w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Define rdf:type for nested objects #403

Closed weissjoh closed 1 year ago

weissjoh commented 1 year ago

Dear JSON-LD community,

I am trying to provide a JSON-LD context for a given JSON document whose structure I cannot change. The document contains a nested object structure. For further processing, I need to define the rdf:type of a nested object in that document. I haven't found any strategy that allows me to define the type of the nested object within a JSON-LD context.

Example:

{
  "property1": "first",
  "property2": "second",
  "property3": {
    "propertyA": "A",
    "propertyB": "B"
  }
}

For the given JSON document (without any specialized processing upfront), I am only supposed to provide a JSON-LD context in advance. Combined with the document, this may look like this:

{
  "@context": {
    "property1": "myvocab:prop1",
    "property2": "myvocab:prop2",
    "property3": {
      "@id": "myvocab:hasBar",
      "@type": "myvocab:Bar",
      "@context": {
        "propertyA": "myvocab:valueOfA",
        "propertyB": "myvocab:valueOfB"
      }
    }
  },
  "@type": "myvocab:Foo",
  "property1": "first",
  "property2": "second",
  "property3": {
    "propertyA": "A",
    "propertyB": "B"
  }
}

See also JSON-LD Playground-Link here.

As a result, I get the following graph back:

_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <myvocab:Foo> .
_:b0 <myvocab:hasBar> _:b1 .
_:b0 <myvocab:prop1> "first" .
_:b0 <myvocab:prop2> "second" .
_:b1 <myvocab:valueOfA> "A" .
_:b1 <myvocab:valueOfB> "B" .

I also want the following triple to be defined in advance:

_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <myvocab:Bar> .

It seems that I am using the "@type": "myvocab:Bar" definition incorrectly in the nested term definition. According to the specification, @type is only allowed there to define the concrete datatype of data properties. So far I haven't found a way to define the @type for object properties, which is what I have in my case.

Does anyone have an idea how to define a JSON-LD context for the given JSON document so that I get back the following RDF graph as a result?

Expected result:

_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <myvocab:Foo> .
_:b0 <myvocab:hasBar> _:b1 .
_:b0 <myvocab:prop1> "first" .
_:b0 <myvocab:prop2> "second" .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <myvocab:Bar> .
_:b1 <myvocab:valueOfA> "A" .
_:b1 <myvocab:valueOfB> "B" .

Any help is very much appreciated.

AtesComp commented 1 year ago

This use case has been brought up many times to the JSON-LD designers. The consensus is that JSON-LD itself cannot "add information" to a graph through a context. Otherwise, ....unspecified security concerns... in other use cases.

However, they were to address such issues with the JSON-LD Framing specification. I've had very little success in the past using that spec to actually accomplish that task, though. The spec claims that it is possible (or did at one point), but the playground would not produce the proper results at the time.

Actually, since you will likely need to craft a preprocessor to add the context and process the JSON-LD into your store anyway, it's easy to craft that preprocessor to type the structures directly as you identify them, using the same pattern you're attempting to set up in the context. This is easily done using a temp store to hold the raw (or partially transformed) graph data and, depending on your complexities, one or more SPARQL CONSTRUCT queries to transform the data into the final form.

I find this useful since it can serve as a data cleaning and validation stage to report on any structures that cannot be processed.
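A simpler variant of such a preprocessor can be sketched in Python. It works on the plain JSON before any JSON-LD processing, rather than with a temp store and SPARQL, and it only illustrates the idea. The `NESTED_TYPES` mapping and the `add_nested_types` helper are hypothetical names; the property and type names are taken from the example in this thread.

```python
import json

# Hypothetical mapping: JSON property name -> type for the nested object(s)
# found under that property. Names mirror the example in this thread.
NESTED_TYPES = {"property3": "myvocab:Bar"}


def add_nested_types(node, nested_types):
    """Return a copy of node in which mapped nested objects carry an @type."""
    if isinstance(node, list):
        return [add_nested_types(item, nested_types) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == "@context":
            # Never touch term definitions inside a context.
            result[key] = value
            continue
        value = add_nested_types(value, nested_types)  # fresh copy
        if key in nested_types:
            targets = value if isinstance(value, list) else [value]
            for target in targets:
                if isinstance(target, dict):
                    # Keep an @type that the source already provides.
                    target.setdefault("@type", nested_types[key])
        result[key] = value
    return result


document = {
    "property1": "first",
    "property2": "second",
    "property3": {"propertyA": "A", "propertyB": "B"},
}

typed = add_nested_types(document, NESTED_TYPES)
print(json.dumps(typed, indent=2))
```

The original document is left unchanged. The typed copy can then be combined with the context and handed to a JSON-LD processor, which should see an explicit "@type" on the nested object.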

weissjoh commented 1 year ago

Thanks for the quick answer on this, @AtesComp.

The consensus is that JSON-LD itself cannot "add information" to a graph through a context. Otherwise, ....unspecified security concerns... in other use cases.

I have seen this statement several times, that the context should not add information. I cannot really understand it. By defining a sub-context you already say whether some property of a JSON document is a data property or an object property. If it is an object property, you can define with @id the predicate to be used for this relation. Hence this already "adds information". Being able to define the rdf:type of the object in the triple is no different. (If the vocabulary is maintained well, you could even use rdfs:range to find out the type implicitly; however, this requires some post-processing.)
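The rdfs:range post-processing mentioned above could look roughly like the following Python sketch. It is an illustration under assumptions: triples are modelled as plain string tuples, the `RANGES` table is a hypothetical stand-in for rdfs:range declarations read from the vocabulary, and it is assumed to list object properties only (so literals never receive a type).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical rdfs:range declarations, e.g. loaded from the vocabulary.
# Assumed to contain object properties only.
RANGES = {"myvocab:hasBar": "myvocab:Bar"}


def infer_types_from_range(triples, ranges):
    """Return the triples plus an rdf:type triple for each object whose
    predicate has a declared range."""
    inferred = set(triples)
    for _subject, predicate, obj in triples:
        if predicate in ranges:
            inferred.add((obj, RDF_TYPE, ranges[predicate]))
    return inferred


# The graph produced by the context in this thread, as tuples.
triples = {
    ("_:b0", RDF_TYPE, "myvocab:Foo"),
    ("_:b0", "myvocab:hasBar", "_:b1"),
    ("_:b0", "myvocab:prop1", "first"),
    ("_:b0", "myvocab:prop2", "second"),
    ("_:b1", "myvocab:valueOfA", "A"),
    ("_:b1", "myvocab:valueOfB", "B"),
}

result = infer_types_from_range(triples, RANGES)
for triple in sorted(result):
    print(triple)
```

In a real setup, an RDF library or an RDFS reasoner would take the place of the tuple set; the sketch only shows where the extra triple comes from.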

So where do we draw the line between "added information" that is acceptable and that which is not? I think this needs to be rethought in some way.

However, they were to address such issues with the JSON-LD Framing specification...

Do you have a reference for me so that I can take a look at that? If there is a standard way to define this, I can adapt my tooling to it. As you said, I need to provide some pre- or post-processing anyway, so doing it in a standardized way would help in the mid to long term.

gkellogg commented 1 year ago

The strategy for doing this using framing was described here https://github.com/w3c/json-ld-syntax/issues/76#issuecomment-569367624. If it doesn’t work in the playground, it may be due to a bug in jsonld.js, which could certainly be addressed by the implementers, or through a PR.

simonstey commented 1 year ago

If it doesn’t work in the playground, it may be due to a bug in jsonld.js, which could certainly be addressed by the implementers, or through a PR.

https://tinyurl.com/2ny68jnm seems to work in the playground now

weissjoh commented 1 year ago

I was able to provide the rdf:type myvocab:Bar using the framing approach. Please see the example on the JSON-LD Playground.

Can somebody tell me why I need to define "@id": "myvocab:hasBar" both in the context of the JSON-LD input AND within the frame?

Having this definition in two places makes it more difficult to keep both documents (context and frame) in sync. If there is one typo on one side, the whole processing will fail.

TallTed commented 1 year ago

If there is one typo on one side, the whole processing will fail.

That would seem to be desirable...

Typos are undesirable, and should be flagged and corrected as early as possible, which might well be through this processing failure.

gkellogg commented 1 year ago

The frame document is serving two purposes. One, to provide the template for actually framing the document, and two, for compacting the resulting document using the contained context (which is also used for expanding the framing document).

In this case, you have a term defined for "property3", which is how it is interpreted in the frame, and compacted in the result.

Note that contexts can be remote, so if the context in your input example were at another location, it could be referenced in both the input document and frame using something like {"@context": "location-of-shared-context"}, which would avoid needing to repeat the context in the frame document, itself. Depending on how you construct your frame, this could also be done programmatically in line.
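Building both documents programmatically from one context could be sketched in Python as follows. This is a minimal illustration, not the setup used in the playground example: `SHARED_CONTEXT` and `with_context` are hypothetical names, the term-level "@type" from the original attempt is left out, and the frame body is only a placeholder that matches nodes of type myvocab:Foo.

```python
import json

# The context from this thread, defined exactly once.
SHARED_CONTEXT = {
    "property1": "myvocab:prop1",
    "property2": "myvocab:prop2",
    "property3": {
        "@id": "myvocab:hasBar",
        "@context": {
            "propertyA": "myvocab:valueOfA",
            "propertyB": "myvocab:valueOfB",
        },
    },
}


def with_context(body, context=SHARED_CONTEXT):
    """Return a new document pairing body with the given context.

    context may also be a URL string when the context is hosted remotely.
    """
    return {"@context": context, **body}


input_doc = with_context({
    "@type": "myvocab:Foo",
    "property1": "first",
    "property2": "second",
    "property3": {"propertyA": "A", "propertyB": "B"},
})

# Placeholder frame body; the real frame would add its matching rules here.
frame = with_context({"@type": "myvocab:Foo"})

print(json.dumps(frame, indent=2))
```

Because both documents reference the same object, a typo in a term definition can no longer appear on one side only.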

weissjoh commented 1 year ago

Note that contexts can be remote, so if the context in your input example were at another location, it could be referenced in both the input document and frame ...

Thank you very much for pointing this out, @gkellogg. This is a solution I can live with 👍

weissjoh commented 1 year ago

I will close this issue as the questions have been answered. The JSON-LD community may take this conversation as feedback for the development of future versions of JSON-LD. For now, there is nothing left open in this thread.