m-ld / m-ld-spec

Platform-independent m-ld specification
https://spec.m-ld.org/
MIT License

Use of vocabulary-scoped identifiers in `@values` #77

Closed Peeja closed 2 years ago

Peeja commented 3 years ago

This looks like a bug to me, but I also could easily be misunderstanding the intended semantics. I'm not sure how to characterize it, so by all means, please retitle this issue if you can put it more usefully. 😃


Given the following data:

[
  {
    "@id": "http://www.example.com/foo",
    "aProperty": "existing-value"
  },
  {
    "@id": "http://www.example.com/bar",
    "aProperty": "another-existing-value"
  }
]

I expected the query:

{
  "@construct": {
    "@id": "?id",
    "?property": "?value"
  },
  "@where": {
    "@graph": {
      "@id": "?id",
      "?property": "?value"
    },
    "@values": [
      {
        "?id": "http://www.example.com/foo"
      }
    ]
  }
}

to return:

[
  {
    "@id": "http://www.example.com/foo",
    "aProperty": "existing-value"
  }
]

but instead it returns []. (playground)

A similar @select query also yields []. (playground)

For what it's worth, an empty @values successfully returns all of the data. (playground)


The use case here is that I'm trying to read all of the existing triples for properties that I'm about to write, so that I can @delete the old values when I @insert the new values. I should be able to do the same thing with a bunch of reads and put them together, but that seems odd (but might be what I'll do for now, at least).

gsvarovsky commented 3 years ago

The problem is that a plain string like "http://www.example.com/foo" is interpreted to be a literal, which will never match an @id. Instead, you need to use an explicit Reference, like this:

...
    "@values": [{
      "?id": { "@id": "http://www.example.com/foo" }
    }]
...
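
For anyone landing here later, this is a minimal sketch of the full query from the original report with that change applied, written as a TypeScript object. The `ref` helper and the `Reference` type alias are hypothetical names used for illustration; they are not part of the m-ld API.

```typescript
// A Reference is a JSON object with a single "@id" key, as described above.
type Reference = { "@id": string };

// Hypothetical helper (not an m-ld function): wrap an IRI so that it is
// matched as an identifier rather than as a plain string literal.
function ref(iri: string): Reference {
  return { "@id": iri };
}

// The query from the original report, with the "?id" binding in @values
// expressed as a Reference instead of a plain string.
const query = {
  "@construct": { "@id": "?id", "?property": "?value" },
  "@where": {
    "@graph": { "@id": "?id", "?property": "?value" },
    "@values": [{ "?id": ref("http://www.example.com/foo") }],
  },
};
```

The only difference from the failing query is the value bound to `?id`.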

There are only a few contexts in which a plain string will be interpreted as an IRI:

In some cases you can also use a @context to make something an IRI when it's within a Subject. But that never applies to @values, because it supports the bound variables appearing anywhere else in the rest of the @where clause.

Let me know if this makes sense and fixes the issue for you. In the meantime let's leave the ticket open while we decide the best way to emphasise this in the documentation.

Peeja commented 3 years ago

Ah, of course. That works great.

Okay, round 2: I'll need to also match the property name with @values, and those are also IRIs. But where a string aSubject given as an @id is coerced to http://the.domain.example/aSubject, and therefore can be seen and used in relative form as aSubject, a string value aProperty given as a property (that is, as an object key) is coerced to http://the.domain.example/#aProperty, which means its relative form is #aProperty. That makes it hard to work with, because I can't just put { "@id": theKeyFromTheObject } in the @values: it doesn't have the #. But I also can't say { "@id": `#${theKeyFromTheObject}` }, because that would break if theKeyFromTheObject were already an explicit, absolute IRI.

Is there a facility for normalizing an object by applying the context that m-ld will be using? That would probably be easier to work with.
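
To make the mismatch concrete, here is a rough sketch of the two resolution rules as described above. It assumes the default context for a domain named the.domain.example, with identifiers resolved against `http://the.domain.example/` and property keys against `http://the.domain.example/#`. All function names are hypothetical, and the absolute-IRI check is a simplification, not the logic m-ld or a JSON-LD processor actually uses.

```typescript
// Assumed default context values for the domain "the.domain.example".
const BASE = "http://the.domain.example/";
const VOCAB = "http://the.domain.example/#";

// Rough check for "already an absolute IRI": the term starts with a scheme.
// Simplification: this also matches compact IRIs such as "ex:name".
function isAbsoluteIri(term: string): boolean {
  return /^[A-Za-z][A-Za-z0-9+.-]*:/.test(term);
}

// Hypothetical helper: expand a property key by prefixing the vocabulary,
// unless the key is already absolute.
function expandProperty(key: string): string {
  return isAbsoluteIri(key) ? key : VOCAB + key;
}

// Hypothetical helper: rewrite a property key into the form that works
// inside { "@id": ... }, that is, relative to BASE where possible.
function propertyAsDocumentRelative(key: string): string {
  const iri = expandProperty(key);
  return iri.indexOf(BASE) === 0 ? iri.slice(BASE.length) : iri;
}
```

Under these assumptions, `propertyAsDocumentRelative("aProperty")` gives `"#aProperty"`, while a key that is already absolute, such as `"http://schema.org/name"`, passes through unchanged. This only holds for the assumed default context, which is why a facility that uses the real context would be preferable.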

gsvarovsky commented 3 years ago

Ah, that's awkward. Leakage of the # into the API is definitely a Bad Thing, especially for users without prior RDF experience. I would even prefer that it's not necessary to know the difference between document- and vocabulary-scoped identifier positions. Ideas:

  1. Facility to normalise (expand) a Subject. This would be a double-spend, as m-ld is already denormalising (compacting) it, so:
  2. Facility to disable context-based compaction for read results. This would help in your scenario but would hurt anyone who was using theKeyFromTheObject to do anything in the application, like adjusting an in-memory representation; especially if they didn't generally care about IRIs.
  3. A syntax for specifying vocabulary expansion in @values and other references, for example using { "@vocab": theKeyFromTheObject }. This would be kinda neat. It violates my second API principle but it's relatively easy to explain: "do this for properties and types". It does leave some other awkward edge cases though, like matching a variable in both a document and a vocabulary position. Edit: Wait, no, that would be fine, I think

Hmm

Peeja commented 3 years ago

Ah, okay. I think what I was missing was the different meanings of @base and @vocab in JSON-LD.

What I'm working on here is a function upsert() which takes Subject[] and returns a StateProc which performs an upsert for properties which should be single-valued, deleting any existing values when the new ones are inserted. To do that, I have to take property names from the property position and use them in the object position in @values. Now that I get how to do that, I see that I can actually get it done in a single write:

await state.write({
  "@insert": subjects,
  "@delete": { "@id": "?id", "?property": "?value" },
  "@where": {
    "@graph": {
      "@id": "?id",
      "?property": "?value",
    },
    "@values": [
      {},
      ...subjects.flatMap((subject) =>
        Object.keys(subject)
          .filter((key) => key !== "@id")
          .map((key) => ({
            "?id": { "@id": subject["@id"] },
            "?property": { "@id": `#${key}` },
          })),
      ),
    ],
  },
});

But I still need to translate from @vocab-relative to @base-relative (which I've temporarily done here by assuming the default context). Ideally, I'd be able to use the same logic m-ld will be using to translate that name. I think ultimately that means (essentially) expanding the subjects, no? I'm not sure how idea 2 would solve that, and I don't quite follow idea 3.

Edit: Actually, it looks like the single write isn't working as well as I thought it would, so I'm back to a read and then a write. But in any case, the relevant point stands.

gsvarovsky commented 3 years ago

Option 3 replaces

"?property": { "@id": `#${key}` },

with

"?property": { "@vocab": key },

The @vocab key replaces the @id key of a Reference but tells the processor to use vocabulary resolution for it.
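
As an illustration, the `@values` bindings for a set of subjects could then be built without any knowledge of `#` or of the context. This is a sketch only: it assumes the `{ "@vocab": ... }` Reference form proposed in this thread is supported by the engine in use, and the `Subject` and `Binding` types and the `bindingsFor` function are hypothetical names, not m-ld exports.

```typescript
// Simplified shapes for illustration; not the m-ld type definitions.
type Subject = { "@id": string; [key: string]: unknown };
type Binding = {
  "?id": { "@id": string };
  "?property": { "@vocab": string };
};

// Hypothetical helper: one binding per (subject, property key) pair.
// Property keys are passed through as-is and left to vocabulary resolution.
function bindingsFor(subjects: Subject[]): Binding[] {
  const bindings: Binding[] = [];
  for (const subject of subjects) {
    for (const key of Object.keys(subject)) {
      if (key === "@id") continue; // the identifier is not a property
      bindings.push({
        "?id": { "@id": subject["@id"] },
        "?property": { "@vocab": key },
      });
    }
  }
  return bindings;
}
```

The result can be used as the `@values` array of a `@where` clause. Like the snippet above it, this only excludes `@id`; other keywords such as `@type` would need their own handling.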

This actually offers a fix for the equivalent problem in JSON-LD.

gsvarovsky commented 3 years ago

> single write isn't working as well as I thought it would

Somewhere along the way this query has lost its @union, which means the @graph results (everything in the domain!) are being joined with the @values, which will drop the empty binding {}.

For casual observers, see https://github.com/m-ld/m-ld-spec/issues/76#issuecomment-924669573

gsvarovsky commented 3 years ago

Suggested option https://github.com/m-ld/m-ld-spec/issues/77#issuecomment-929904927 is now available on the edge:

Peeja commented 3 years ago

> "?property": { "@vocab": key },

Ah! I'm following now. Yep, I like that too!

> Somewhere along the way this query has lost its @union, which means the @graph results (everything in the domain!) are being joined with the @values, which will drop the empty binding {}.

Yes, sorry, I should have been clearer there: I'm specifically trying to avoid this @union, because it grows with the size of the input. sparqlalgebrajs then turns it into a linked list, and something (I'm not sure where) traverses that list recursively, which blows the stack for large inputs (such as the initial data import in my first write()). I switched to a read() followed by a write() to avoid that, and then thought I could still avoid it with the write() above, but I was looking at the wrong tests when I thought it was working.

gsvarovsky commented 3 years ago

> ... which blows the stack for large inputs

I created another issue for this. In the meantime, hopefully the change of behaviour for @delete-with-variables plus @insert-without-variables reduces the need for the @union or additional reads: https://github.com/m-ld/m-ld-spec/issues/76#issuecomment-930051196