json-ld / ndjson-ld

Blank Node Scope #4

Open gkellogg opened 1 year ago

gkellogg commented 1 year ago

On the call, I think we discussed scoping blank nodes to each individual record. However, that would leave the format unable to represent shared blank nodes when it is used in a manner similar to N-Quads.

Most serializations scope blank node identifiers to the file. In this reckoning:

{"@id": "foo", "prop": {"@id": "_:bar"}}
{"@id": "_:bar", "prop": {"@id": "baz"}}

could be semantically equivalent to:

{"@id": "foo", "prop": {"@id": "_:bar", "prop": {"@id": "baz"}}}

Otherwise, there would be no way to state this using independent records.

Proposal: blank node labels are shared across all documents within a stream.
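To make the two scoping options concrete, here is a minimal sketch in plain Python (no JSON-LD library). The helper names `relabel` and `read_stream` are hypothetical, and the sketch assumes blank node labels appear only as `@id` values; it does not handle labels in `@type` or in values coerced to IRIs by a context. Under stream scope the labels are left untouched, so `_:bar` in both records names the same node. Under per-record scope each label is rewritten so that it cannot collide with a label from another record.

```python
import json


def relabel(node, record_index):
    """Return a copy of node with blank node @id values made unique to one record."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "@id" and isinstance(value, str) and value.startswith("_:"):
                # Prefix the label with the record index (illustrative scheme only).
                out[key] = f"_:r{record_index}_{value[2:]}"
            else:
                out[key] = relabel(value, record_index)
        return out
    if isinstance(node, list):
        return [relabel(item, record_index) for item in node]
    return node


def read_stream(text, shared=True):
    """Parse newline-delimited JSON; optionally isolate blank node labels per record."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if shared:
        return records  # stream scope: labels keep their meaning across records
    return [relabel(record, i) for i, record in enumerate(records)]


STREAM = """{"@id": "foo", "prop": {"@id": "_:bar"}}
{"@id": "_:bar", "prop": {"@id": "baz"}}"""

shared = read_stream(STREAM, shared=True)
disjoint = read_stream(STREAM, shared=False)
```

With `shared=True`, the object of `foo`'s `prop` and the subject of the second record carry the same label, so the two records can be merged into the nested form shown above. With `shared=False` the labels differ, and the link between the records is lost.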

gkellogg commented 1 year ago

This issue was discussed in the 2022-10-12 meeting.

Subtopic: Blank Node Scope

Gregg Kellogg: If every record is a JSON-LD document, it might have its own context. Or it can be provided externally — as an HTTP header, or an API parameter
... Now, regarding the blank nodes scope. We've discussed it for YAML-LD before.
Gregg Kellogg: Tag definitions and blank node names might be independent between documents. For instance, if we're collecting random documents we might have unexpected consequences if the labels or tags overlap. In the streaming applications though, we might want to share labels (for computing differences, etc) but we won't be able to.
Gregg Kellogg: Next steps on NDJSON-LD — Niklas?
Niklas Lindström: I do not lack anything other than time. I should write the simplest thing imaginable first, probably... Blank nodes: I do not have a strong opinion on this, we're not using blank node identifiers. A JSON-LD document can be an RDF dataset, and that is an interesting complication. So every line can represent multiple datasets/graphs.
Niklas Lindström: This does not influence blank nodes question though.
Niklas Lindström: In TriG documents, afaik blank node ids are shared throughout the whole document.
Gregg Kellogg: Different use cases might drive conflicting requirements.
Leonard Rosenthol: There is a difference between addressing a homogeneous case (all documents share a context or grammar) and a heterogeneous case (documents are potentially unrelated to each other but happen to share one data stream). Do we want to solve both cases? What are we gaining or losing by doing so?
Gregg Kellogg: NDJSON-LD should be an extension of JSON-LD API because YAML-LD calls upon it. Algorithms operate to transform → JSON-LD internal representation. Consequently, we rely upon that. Regarding the purpose of all this: if we presuppose an API and one of this API's entry points relates to RDF transformation, then doing something multi-dataset becomes really challenging.
Gregg Kellogg: Especially when we think we're collecting unrelated and maybe conflicting records. It would simplify the problem if all the records relate to a single dataset. Unless we have compelling use cases which suggest otherwise.
Gregg Kellogg: We might introduce a "meta" record to specify meta parameters. Like Turtle with their `"@prefix"` parameters in the header.
Niklas Lindström: +1 For "meta-records"; could e.g. set the context initially (corollary: first row of a TSV as columns)
Niklas Lindström: Going to continue the work on NDJSON-LD.

gkellogg commented 1 year ago

In the meeting, we discussed the possibility of some kind of meta-record that could better control blank-node scope. This could be some kind of record with a directive that resets the blank node namespace between records, or records that form a header to the stream document itself that control such behavior for all records.

%meta bnode-space disjoint
{"@id": "_:b0", "p1": "v1"}
{"@id": "_:b0", "p2": "v2"}

In which the two records do not share the same blank node, even though they use the same identifier, or:

%meta bnode-space shared
{"@id": "_:b0", "p1": "v1"}
{"@id": "_:b0", "p2": "v2"}

In which both records share the same blank node. Perhaps something like %meta bnode-space reset might be used between records to sever the shared space.
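As a rough illustration of how a reader might interpret such directives, the sketch below tags each record with a scope number; records with equal scope numbers would share blank node labels. Everything here is an assumption built on the examples above: the `%meta bnode-space` syntax is only a suggestion in this thread, the function name `scoped_records` is hypothetical, `shared` is assumed to be the default, and changing the mode is assumed to start a fresh scope.

```python
import json


def scoped_records(lines):
    """Pair each JSON record with the blank node scope it belongs to."""
    mode = "shared"  # assumed default when no directive has been seen
    scope = 0
    out = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%meta"):
            parts = line.split()
            if len(parts) != 3 or parts[1] != "bnode-space":
                raise ValueError(f"unsupported directive: {line!r}")
            value = parts[2]
            if value in ("shared", "disjoint"):
                mode = value
                scope += 1  # assumption: a mode change starts a fresh scope
            elif value == "reset":
                scope += 1  # sever the shared space between records
            else:
                raise ValueError(f"unknown bnode-space value: {value!r}")
            continue
        out.append((scope, json.loads(line)))
        if mode == "disjoint":
            scope += 1  # every record gets its own scope
    return out


RECORDS = ['{"@id": "_:b0", "p1": "v1"}', '{"@id": "_:b0", "p2": "v2"}']

disjoint = scoped_records(["%meta bnode-space disjoint", *RECORDS])
shared = scoped_records(["%meta bnode-space shared", *RECORDS])
reset = scoped_records(
    ["%meta bnode-space shared", RECORDS[0], "%meta bnode-space reset", RECORDS[1]]
)
```

In this sketch the two `_:b0` records end up in different scopes for `disjoint` and for `reset`, and in the same scope for `shared`.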

For JSON-LD, this would only really come into play when generating RDF, or flattening when blank nodes are renamed, and would be difficult to communicate through the API.

Similar considerations apply to whether each document describes its own disjoint dataset or all documents are considered subsets of a single dataset.

The easiest course would be to say that the stream format is intended for documents that all relate to the same dataset, and that blank node identifiers are shared across all documents within a stream.

TallTed commented 1 year ago

@gkellogg, I think you intended for %meta bnode-space disjoint to mean "the two records use the same label for different blank nodes", while %meta bnode-space shared means "the two records use the same label for the same blank node"? If not, I'm entirely confused by what the two codeblocks are meant to mean.

gkellogg commented 1 year ago

Yes, something got lost in the original, I’ve updated the comment.

anatoly-scherbakov commented 1 year ago

This format would require a separate definition for the meta record grammar.

Can this be a JSON record? Even more, can we have a JSON-LD context to define its meaning?

{"@context": "https://ndjson-ld.org/meta", "blank-nodes-scope": "global"}
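A minimal sketch of how a reader could pick out such a JSON meta record, using only the standard library. The context URL and the `blank-nodes-scope` key are taken from the example above and are not a published vocabulary; the function name `split_meta` and the rule that only the first record may be a meta record are assumptions made for illustration.

```python
import json

# URL copied from the example above; assumed, not a published context.
META_CONTEXT = "https://ndjson-ld.org/meta"


def split_meta(lines):
    """Separate an optional leading meta record from the data records."""
    records = [json.loads(line) for line in lines if line.strip()]
    options = {}
    first = records[0] if records else None
    if isinstance(first, dict) and first.get("@context") == META_CONTEXT:
        header = records.pop(0)
        # Everything except @context is treated as a stream option.
        options = {key: value for key, value in header.items() if key != "@context"}
    return options, records


options, records = split_meta(
    [
        '{"@context": "https://ndjson-ld.org/meta", "blank-nodes-scope": "global"}',
        '{"@id": "_:b0", "p1": "v1"}',
    ]
)
```

Because the meta record is ordinary JSON, no separate grammar is needed; the cost is that a reader has to inspect `@context` to tell a meta record from data.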