json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Define anchor usage in yaml-ld #13

Open ioggstream opened 2 years ago

ioggstream commented 2 years ago

As a JSON-LD editor … WHO I want to use YAML anchors … WHAT So that I can easily reuse content … WHY

Note

The specification should define:

example 1

---
- "@id": &homer http://example.org/#homer  # Anchor the homer url
  http://example.com/vocab#name:
  - "@value": Homer
- "@id": http://example.org/#bart
  http://example.com/vocab#name:
  - "@value": Bart
  http://example.com/vocab#parent:
  - "@id": *homer                               # reuse the anchor instead of re-typing the homer url
- "@id": http://example.org/#lisa
  http://example.com/vocab#name:
  - "@value": Lisa
  http://example.com/vocab#parent:
  - "@id": *homer

example 2

Using anchor and alias nodes https://gist.github.com/ioggstream/31f3226fa9976b3baf0800f44bc19c98

VladimirAlexiev commented 2 years ago
pchampin commented 2 years ago

One point where I believe YAML anchors can help is the description of complex contexts. E.g.

{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://example.com/ns/Company/",
    "founder": { "@context": {
        "@vocab": "http://example.com/ns/Person/",
        "birthDate": { "@type": "xsd:date" }
    }},
    "employee": { "@context": {
        "@vocab": "http://example.com/ns/Person/",
        "birthDate": { "@type": "xsd:date" }
    }}
  }
}

Notice that the scoped contexts of founder and employee are exactly the same (a "person" context). With YAML anchors, this redundancy could be eliminated.

NB: there are other means to get rid of this redundancy in pure JSON-LD:

but they have their drawbacks that are not always acceptable.

ioggstream commented 2 years ago

That's exactly the kind of discussions and examples we need :)

"@context":
  xsd: http://www.w3.org/2001/XMLSchema#
  "@vocab": http://example.com/ns/Company/
  founder:
    "@context": &person-context
      "@vocab": http://example.com/ns/Person/
      birthDate:
        "@type": xsd:date
  employee:
    "@context": *person-context

VladimirAlexiev commented 2 years ago

How Anchors and Aliases could mesh with JSON-LD Frames

Frames specify which nodes to expand, and which nodes to merely refer to by URI. So in some sense they tackle the "graph vs tree" problem.

Anchors and Aliases tackle the same problem; intuitively I feel in a more general way.

So: what can be the connection between them?

anatoly-scherbakov commented 2 years ago

I am not entirely clear on how anchors would actually affect the LD part of the picture. Having a YAML document with anchors, we're going to convert it to JSON — and in that conversion, the anchors will be resolved. Thus, a JSON-LD processor that we will subsequently use won't know anything about those anchors.

This is similar to C preprocessor directives which are resolved before the source file is consumed by the compiler itself.

Is that right? If yes, can't we safely ignore these particular YAML features relying upon YAML spec to describe them?
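
A small Python sketch of this point, using only the standard library. It assumes, purely as an illustration, that a YAML loader represents an anchor and its alias as two references to one object (the names `shared` and `loaded` are hypothetical). After a trip through JSON the data is equal but no longer shared, so a downstream JSON-LD processor has nothing left to detect.

```python
import json

# Stand-in for a loaded YAML document: one node reachable under two keys,
# as if "a" carried an anchor and "b" an alias to it.
shared = {"value": 100, "unit": "degC"}
loaded = {"a": shared, "b": shared}
assert loaded["a"] is loaded["b"]  # sharing is visible in memory

# Converting to JSON text writes the node twice; parsing it back gives two copies.
round_tripped = json.loads(json.dumps(loaded))
assert round_tripped["a"] == round_tripped["b"]      # same content
assert round_tripped["a"] is not round_tripped["b"]  # sharing is gone
```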

gkellogg commented 2 years ago

Of course, JSON-LD does encode a graph in JSON; what used to be called a node reference is of the form {"@id": "..."}. Framing has an @embed keyword that can control how this works with one or all instances of a node referenced either fully or as a reference.
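
To make the node-reference form concrete, here is a rough Python sketch. It is not the JSON-LD Framing algorithm; `embed_once` is a hypothetical helper that only mimics the idea of embedding a node in full the first time it is met and emitting `{"@id": ...}` afterwards. The property names `name` and `parent` are shortened from example 1 for readability.

```python
def embed_once(node, seen=None):
    """Copy `node`, embedding each identified node once and referencing it afterwards."""
    if seen is None:
        seen = set()
    if isinstance(node, list):
        return [embed_once(item, seen) for item in node]
    if not isinstance(node, dict):
        return node  # plain values (strings, numbers) are kept as-is
    node_id = node.get("@id")
    if node_id is not None:
        if node_id in seen:
            return {"@id": node_id}  # already embedded: emit a node reference
        seen.add(node_id)
    return {key: embed_once(value, seen) for key, value in node.items()}


homer = {"@id": "http://example.org/#homer", "name": "Homer"}
graph = [
    {"@id": "http://example.org/#bart", "name": "Bart", "parent": homer},
    {"@id": "http://example.org/#lisa", "name": "Lisa", "parent": homer},
]
result = embed_once(graph)
print(result[0]["parent"])  # full node, embedded on first use
print(result[1]["parent"])  # reference only
```

In this sketch the choice of which occurrence is embedded simply follows document order; a real frame gives much finer control.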

The YAML anchor/alias mechanism is similar to the framing keys, and also similar in concept to the @included keyword.

For now, I think we need to be cautious about depending on any YAML features beyond JSON re-serialization until we understand the requirements for round-tripping. A YAML-LD extended profile could allow us to move beyond what can easily be represented in JSON-LD, and we need to tread carefully.

VladimirAlexiev commented 2 years ago

Anchors can be used to define fragment IDs inside YAML instance data, like attributes @id and href/@name do in HTML.

@ioggstream where was your proposal for such fragments? In addition to anchors, it used JSON Path to address any element in the JSON/YAML structure.

Eg if at https://example.com/TheSimpsons.yaml we have:

*Bart:
  name: Bart Simpsons
  gender: male

Then the alias would be resolved to https://example.com/TheSimpsons.yaml#Bart

The same in plain YAML-LD would look like this:

- "@id": Bart
  name: Bart Simpsons
  gender: male
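
How a relative `@id` turns into a full IRI depends on ordinary relative-reference resolution against the document's base. The sketch below uses Python's `urllib.parse.urljoin` and assumes the base IRI is simply the file URL from the example above. It suggests that the fragment form `#Bart` is what yields the `...TheSimpsons.yaml#Bart` IRI, while a bare `Bart` resolves to a sibling path instead.

```python
from urllib.parse import urljoin

base = "https://example.com/TheSimpsons.yaml"  # assumed base IRI of the document

fragment_id = urljoin(base, "#Bart")  # fragment-only reference
bare_id = urljoin(base, "Bart")       # bare relative reference

print(fragment_id)  # https://example.com/TheSimpsons.yaml#Bart
print(bare_id)      # https://example.com/Bart
```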

--

@anatoly-scherbakov basically says that anchors/aliases must be resolved by the YAML processor and elided, i.e. anchors can only be used locally inside one file. Furthermore, the shared info must be copied out during the resolution. I like @pchampin's concrete example of using aliases to express a context more economically. But being a graph person, I dislike expanding shared graph structures by copying them out.

--

If anchor-based data sharing is necessarily local (limited to one file), then perhaps we can use it at least for blank nodes and avoid copying? Eg

valve1:
  temperature: &temp100C
    value: 100
    unit: degC
valve2:
  temperature: *temp100C

Should result in this turtle

<valve1> :temperature _:temp100C.
<valve2> :temperature _:temp100C.
_:temp100C :value 100; :unit <degC>.

and NOT this one:

<valve1> :temperature [:value 100; :unit <degC>].
<valve2> :temperature [:value 100; :unit <degC>].
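
One way this could work, sketched in Python with the standard library only. It assumes the YAML loader hands back the anchored node and its alias as the same object, and it uses a hypothetical `to_triples` helper that labels blank nodes by object identity. Nested structures inside a blank node are not handled; this is only meant to show the shared versus copied outcome.

```python
from itertools import count


def to_triples(doc):
    """Emit (subject, predicate, object) triples; a shared dict becomes one blank node."""
    labels = {}  # id(dict) -> blank node label
    counter = count(1)
    triples = []

    def bnode(node):
        key = id(node)
        if key not in labels:
            labels[key] = f"_:b{next(counter)}"
            # Describe the blank node only the first time it is met.
            for pred, obj in node.items():
                triples.append((labels[key], pred, obj))
        return labels[key]

    for subject, props in doc.items():
        for pred, obj in props.items():
            value = bnode(obj) if isinstance(obj, dict) else obj
            triples.append((subject, pred, value))
    return triples


temp = {"value": 100, "unit": "degC"}

# Anchor/alias kept as sharing: both valves point at the same object.
shared_doc = {"valve1": {"temperature": temp}, "valve2": {"temperature": temp}}
shared_triples = to_triples(shared_doc)

# Anchor/alias expanded by copying: each valve gets its own object.
copied_doc = {"valve1": {"temperature": dict(temp)}, "valve2": {"temperature": dict(temp)}}
copied_triples = to_triples(copied_doc)

print(len(shared_triples), len(copied_triples))  # 4 6
```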

ioggstream commented 2 years ago

@VladimirAlexiev let me try to clarify your examples:

Syntax tweak. A key cannot start with *; an anchor is attached to a node.

Bart: &BartSimpsons  #  create an anchor to this node.
  name: Bart Simpsons
  gender: male

I don't think that this can implicitly map to an @id: Bart because anchors are a serialization detail. The above document can legitimately be serialized as

Bart: &anchor001  #  create an anchor to this node.
  name: Bart Simpsons
  gender: male

Homer:
  children:
  - *anchor001  # An Alias references an anchor.

Representation graph

iiuc the yaml below

t100: &t100 100
valve1:
  temperature: &temp100C
    value: *t100
    unit: degC
valve2:
  temperature: *temp100C

maps to the following YAML representation graph

graph LR;
  root --> t100 & valve1 & valve2
  t100 --> 100
  valve1 --> temperature1[temperature] -->temp100C --> value & unit
  value --> t100
  unit --> degC
  valve2 --> temperature2[temperature] -->temp100C

The first question I asked myself is: how does PyYAML process this information?

PyYAML preserves references when it loads mutable structures such as a dict()

import yaml  # PyYAML

# temperature_yaml holds the valve document shown above, as a string
temperature = yaml.safe_load(temperature_yaml)
assert temperature['valve1']['temperature']['value'] == 100
assert temperature['valve2']['temperature']['value'] == 100
# assign a new temperature through valve1
temperature['valve1']['temperature']['value'] = 200
assert temperature['valve2']['temperature']['value'] == 200  # Changed: both keys share one dict.

but with an immutable value, things change

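temperature = yaml.safe_load(temperature_yaml)  # start again from a fresh load
assert temperature["t100"] == 100
assert temperature['valve2']['temperature']['value'] == 100
temperature["t100"] = 200  # rebinds the key; the int itself cannot be mutated
assert temperature['valve2']['temperature']['value'] == 100  # Not changed.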

VladimirAlexiev commented 2 years ago

Sharing and Cycles (Frames)

Frames are key because they define which part of an RDF graph to unroll into a JSON tree, and how.

@gkellogg in #44

The JSON-LD Framing algorithm is quite complicated as it is.

Agreed, and I don't even know it properly. Of course, we'll use it whole-cloth without modification.

But I intuitively feel that anchors may have something to do with Frames because both address (to some degree) the problem "given a graph, how to serialize part of it as a tree". Both allow sharing objects and handling cycles (to avoid infinite embedding), but:

Modularity/Structuring

@pchampin

anchors can help in the description of complex contexts

JSON Schema has special modularity/structuring facilities, see https://json-schema.org/understanding-json-schema/structuring.html

So the question of YAML fragments and pointers, and how they relate to Schema fragments and JSON Pointers, is key. @ioggstream has been struggling with this problem: please take charge of this, keep up the fight, and we'll help as much as we can!

Syntax tweak

Thanks!

Representation graph

Yes, but the alias "nodes" t100, temp100C are quite different from the others because they carry no info and instead are just redirection pointers (so maybe use a different color).

gkellogg commented 2 years ago

This issue was discussed at the Aug 03 meeting.