json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Round-trip safe json-ld -> yaml-ld -> json-ld #8

Closed. ioggstream closed this issue 2 years ago.

ioggstream commented 2 years ago

As a user with JSON-LD files (who), I want to convert them to YAML-LD (what), so that they are round-trip safe (why).

Note: IMHO, any other behavior hinders interoperability.

nichtich commented 2 years ago

What do you mean by round-trip safety? Given a YAML-LD document Y1, the YAML-LD specification will define a transformation to a corresponding JSON-LD document J1. The transformation is unlikely to be bijective, so another document Y2 may exist that is also transformed to J1.

I suppose transformation from JSON-LD to YAML-LD is out of the scope of YAML-LD specification anyway, isn't it? J1 could be expressed as Y1, Y2, ... as you like, as long as these transform to J1. There may also be JSON-LD features not supported by YAML-LD (to be discussed) because they are semantically irrelevant; should these be preserved as well?

As far as I understand round-trip safety, the only way to formally tackle it is to define canonical document forms.
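To make the non-bijectivity concrete, here is a minimal Ruby sketch using the standard json and yaml libraries. The two documents and the helpers `sort_keys` and `canonical_json` are hypothetical: Y1 and Y2 differ in style and key order, yet both load to the same data, and a toy canonical form (keys sorted recursively, compact JSON) maps them to one string. This is only an illustration of the idea; it is not the RDF Dataset Canonicalization algorithm, and it assumes string keys.

```ruby
require "json"
require "yaml"

# Hypothetical helper: recursively sort mapping keys (assumes string keys).
def sort_keys(node)
  case node
  when Hash  then node.keys.sort.to_h { |key| [key, sort_keys(node[key])] }
  when Array then node.map { |item| sort_keys(item) }
  else node
  end
end

# Hypothetical toy canonical form: sorted keys, compact JSON.
def canonical_json(node)
  JSON.generate(sort_keys(node))
end

# Y1: block style.
y1 = <<~YAML
  "@id": "http://example.org/alice"
  name: Alice
  knows: ["http://example.org/bob"]
YAML

# Y2: flow style, different key order.
y2 = <<~YAML
  { knows: [ "http://example.org/bob" ], name: "Alice", "@id": "http://example.org/alice" }
YAML

c1 = canonical_json(YAML.safe_load(y1))
c2 = canonical_json(YAML.safe_load(y2))
puts c1
puts c1 == c2   # prints true
```

Sorting keys is the simplest choice that removes ordering differences; a real canonical form would also have to pin down number and string formatting.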

gkellogg commented 2 years ago

If it's any different from JSON.parse(File.read("file.jsonld")).to_yaml or YAML.load(File.read("file.ymld")).to_json, then we've probably overcomplicated it, aside from some potential keyword transformations and magic-key insertion.
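The default round trip can be sketched with Ruby's standard json and yaml (Psych) libraries. The sample document and variable names are assumptions, and strings stand in for the files. The sketch uses `YAML.safe_load` rather than `YAML.load`, because before Psych 4 `YAML.load` could instantiate arbitrary Ruby objects.

```ruby
require "json"
require "yaml"

# Stand-in for File.read("file.jsonld"); the document is made up.
json_ld = <<~JSON
  {
    "@context": {"name": "http://schema.org/name"},
    "@id": "http://example.org/alice",
    "name": "Alice"
  }
JSON

# JSON-LD -> YAML-LD: parse the JSON, then emit YAML.
yaml_ld = JSON.parse(json_ld).to_yaml

# YAML-LD -> JSON-LD: parse the YAML, then emit JSON.
json_again = YAML.safe_load(yaml_ld).to_json

puts yaml_ld
puts json_again

# The JSON text is reformatted, but the parsed data is unchanged.
puts JSON.parse(json_again) == JSON.parse(json_ld)
```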

Canonical forms require the use of a canonicalization algorithm, as is defined for JSON-LD in the spec. I'm not aware of a similar algorithm for YAML, but it could likely use the same logic.

I think the way to look at round-tripping is that parsing YAML-LD or JSON-LD documents to the internal representation should produce equivalent internal representations, which leaves out canonical serialized forms, document ordering (except as required), and keyword transformations.

ioggstream commented 2 years ago

> There may also be JSON-LD features not supported by YAML-LD (to be discussed) because they are semantically irrelevant; should these be preserved as well?

Not a YAML expert here, but since YAML data types are wider than JSON ones, and the YAML representation graph is a directed graph, potentially with cycles, while JSON is just a tree, I fail at identifying a JSON-LD feature that is not supported in YAML-LD.
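One concrete way YAML's types are wider than JSON's, as a Ruby sketch with a hypothetical document: Psych, which follows YAML 1.1 type resolution, turns an unquoted `2022-06-22` into a `Date` (and `safe_load` only allows this when `Date` is listed in `permitted_classes`). JSON has no date type, so once the data is written as JSON the value is just a string. Other parsers may differ here, since the YAML 1.2 core schema does not define a timestamp type.

```ruby
require "date"
require "json"
require "yaml"

# Hypothetical YAML-LD document with an unquoted date.
yaml_ld = <<~YAML
  "@id": "http://example.org/post"
  published: 2022-06-22
YAML

# Psych resolves the unquoted scalar to a Date (YAML 1.1 timestamp);
# safe_load only allows it when Date is listed in permitted_classes.
doc = YAML.safe_load(yaml_ld, permitted_classes: [Date])
puts doc["published"].class   # prints Date

# JSON has no date type: the value is serialized as a plain string.
json_ld = JSON.generate(doc)
puts JSON.parse(json_ld)["published"].class   # prints String
```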

I think we should anyway state that yaml-ld MUST extend json-ld features.

Agree with @gkellogg:

  1. yaml-ld must support yaml.dump(json.loads(json_text))

  2. If the YAML representation graph is acyclic, JSON.dump(YAML.load(yaml_string)) MUST be a valid JSON-LD document, equivalent to the original document modulo a well-defined relation.
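Point 2 hinges on acyclicity. A minimal Ruby sketch with a hypothetical `acyclic?` helper: YAML anchors and aliases can make two keys share one node, which still has a JSON equivalent (the shared node is simply written out twice), whereas a node that reaches itself cannot be written as a JSON tree. The helper only walks Hash values and Array items, so it assumes plain string keys.

```ruby
require "json"
require "yaml"

# Hypothetical helper: true if no Hash/Array node can reach itself.
# Only Hash values and Array items are walked (assumes string keys).
def acyclic?(node, on_path = {}.compare_by_identity)
  return true unless node.is_a?(Hash) || node.is_a?(Array)
  return false if on_path.key?(node)

  on_path[node] = true
  children = node.is_a?(Hash) ? node.values : node
  result = children.all? { |child| acyclic?(child, on_path) }
  on_path.delete(node)
  result
end

# An anchor (&alice) reused through an alias (*alice): shared, but acyclic.
yaml_ld = <<~YAML
  author: &alice
    name: Alice
  editor: *alice
YAML

doc = YAML.safe_load(yaml_ld, aliases: true)
puts acyclic?(doc)        # prints true
puts JSON.generate(doc)   # the shared node is written out twice

# A self-referencing node forms a cycle and has no JSON tree equivalent.
loop_node = {}
loop_node["self"] = loop_node
puts acyclic?(loop_node)  # prints false
```

Tracking only the nodes on the current path (rather than every node seen) is what lets shared nodes pass while true cycles fail.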

anatoly-scherbakov commented 2 years ago

If we use the @/$ conversion (#11), and the user defines aliases like

{
  "@context": {
    "$id": "@id"
  }
}

(idea credit: #9), then after a naive conversion to YAML-LD and back to JSON-LD, we will see

{
  "@context": {
    "@id": "@id"
  }
}

which is not even a valid JSON-LD document, because "keywords cannot be overridden".

Knowing that, we can implement this as an edge case where $ is not replaced by @ if it is a key that is being overridden. I think we can detect such keys as direct children of an @context, though my understanding of the JSON-LD spec is not 100% complete. If there are any other cases, I would be happy to learn about them in the discussion of #11.
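The lossy round trip described here can be reproduced with a short Ruby sketch. The conversion rule is an assumption made for illustration (swap a leading @ for $ in every key and string value on the way to YAML-LD, and the reverse on the way back); the actual rules proposed in #11 may differ, and `swap_sigil` is a hypothetical helper.

```ruby
require "json"

# Hypothetical naive conversion: replace a leading `from` sigil with `to`
# in every mapping key and every string value, recursively.
def swap_sigil(node, from, to)
  case node
  when Hash
    node.to_h { |key, value| [swap_sigil(key, from, to), swap_sigil(value, from, to)] }
  when Array
    node.map { |item| swap_sigil(item, from, to) }
  when String
    node.start_with?(from) ? to + node.delete_prefix(from) : node
  else
    node
  end
end

original = JSON.parse('{"@context": {"$id": "@id"}}')

yaml_side = swap_sigil(original, "@", "$")  # {"$context"=>{"$id"=>"$id"}}
back      = swap_sigil(yaml_side, "$", "@") # {"@context"=>{"@id"=>"@id"}}

puts JSON.generate(back)   # prints {"@context":{"@id":"@id"}}
puts back == original      # prints false: the alias "$id" was lost
```

The information is lost on the first step: the alias key `$id` and the keyword value `@id` both become `$id`, so no reverse rule can tell them apart.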

gkellogg commented 2 years ago

This was discussed during today's call: https://json-ld.org/minutes/2022-06-22/.

VladimirAlexiev commented 2 years ago

@ioggstream Round-trippability can be understood at different levels:

> I fail at identifying a json-ld feature that is not supported in YAML-LD

> we should state that yaml-ld MUST extend json-ld features.

Absolutely. Because YAML is a superset of JSON, we have one round-trip that I call the "default case":

  • JSONLD-YAMLLD-JSONLD

There's no question it's the most important base case, and we should agree on that once and for all, and move on to discussing YAML extensions, options and fringe cases, because that's where the meat is. I.e., the default case is trivial and well-understood.

@gkellogg, does that allay your concerns, or am I underestimating the default case?

> I think the way to look at round-tripping is that parsing YAML-LD or JSON-LD documents to the internal representation should produce equivalent internal representations

Agreed! Does JSON-LD define such an internal representation, and if so, where? Or is RDF that internal representation?

@nichtich, your thoughts on bijectivity etc. are relevant:

> I suppose transformation from JSON-LD to YAML-LD is out of the scope of YAML-LD specification anyway, isn't it?

On the contrary, it's part of the "default case", and thus very important. It's also trivial, since the JSON-to-YAML mapping is well known, so it can be just a paragraph in the spec.

@anatoly-scherbakov

> not even a valid JSON-LD because "keywords cannot be overridden"

That's not right for two reasons:

1: if we accept #51, one can use it to effect uniform keyword aliasing in YAML like this:

"@context":
  $id: @id
  $type: @type
  $value: @value
  # etc

This will result in JSON like this

"@context": {
  "$id": "@id",
  "$type": "@type",
  "$value": "@value"
}

That's not the degenerate form you've shown.
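A quick check of this example in Ruby (standard json and yaml libraries). One assumption differs from the snippet above: `@` is a reserved indicator in YAML and cannot start a plain (unquoted) scalar, so a standard parser such as Psych rejects `$id: @id`; the sketch therefore quotes the values. With that change, the YAML loads to exactly the JSON context shown.

```ruby
require "json"
require "yaml"

yaml_ld = <<~YAML
  "@context":
    $id: "@id"
    $type: "@type"
    $value: "@value"
YAML

doc = YAML.safe_load(yaml_ld)
puts JSON.pretty_generate(doc)

# The unquoted form is a YAML syntax error in Psych.
unquoted_ok =
  begin
    YAML.safe_load("$id: @id\n")
    true
  rescue Psych::SyntaxError
    false
  end
puts unquoted_ok  # prints false
```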

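2: Where do you read "keywords cannot be overridden"? In https://w3c.github.io/json-ld-syntax/#aliasing-keywords I read "Since keywords cannot be redefined, they can also not be aliased to other keywords".

  • So it's forbidden to write e.g. "@id": "@type"
  • But it's OK to write "@id": "@id" (not that I advocate it)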

gkellogg commented 2 years ago

> Because YAML is a superset of JSON, we have one round-trip that I call the "default case":
>
>   • JSONLD-YAMLLD-JSONLD
>
> There's no question it's the most important base case, and we should agree on that once and for all, and move on to discussing YAML extensions, options and fringe cases, because that's where the meat is. I.e., the default case is trivial and well-understood.
>
> @gkellogg, does that allay your concerns, or am I underestimating the default case?

That's pretty much my view.

>> I think the way to look at round-tripping is that parsing YAML-LD or JSON-LD documents to the internal representation should produce equivalent internal representations
>
> Agreed! Does JSON-LD define such an internal representation, and if so, where? Or is RDF that internal representation?

JSON-LD defines the internal representation. While it could potentially be extended (e.g., different types of numbers), we'd need to carefully justify doing so.

> @nichtich, your thoughts on bijectivity etc. are relevant:
>
>> I suppose transformation from JSON-LD to YAML-LD is out of the scope of YAML-LD specification anyway, isn't it?

I would not say so. My view is that round-tripping means there is no semantic loss in turning JSON-LD -> YAML-LD -> JSON-LD, or vice versa. Trying to get back to exact syntactic forms is a needless complication. Having both go through RDF would also be acceptable, and generating the results will of necessity involve applying contexts (compact form) or frames (framed form). It may be possible, in either direction, to reproduce the embedding structure without framing, using a path that involves neither flattening nor toRDF, but this is a nice-to-have property, not a requirement.

> @anatoly-scherbakov
>
>> not even a valid JSON-LD because "keywords cannot be overridden"
>
> That's not right for two reasons:
>
> 1: if we accept #51, one can use it to effect uniform keyword aliasing in YAML like this:
>
> "@context":
>   $id: @id
>   $type: @type
>   $value: @value
>   # etc
>
> This will result in JSON like this
>
> "@context": {
>   "$id": "@id",
>   "$type": "@type",
>   "$value": "@value"
> }
>
> That's not the degenerate form you've shown.

Yes. But we can go overboard in encouraging the use of $ keywords when, in many cases, plain-word versions (e.g., id, type, ...) are preferable.

> 2: Where do you read "keywords cannot be overridden"? In https://w3c.github.io/json-ld-syntax/#aliasing-keywords I read "Since keywords cannot be redefined, they can also not be aliased to other keywords".
>
>   • So it's forbidden to write e.g. "@id": "@type"
>   • But it's OK to write "@id": "@id" (not that I advocate it)

There are actually reasons for doing things like this, e.g., "@type": {"@id": "@type", "@container": "@set"}.

gkellogg commented 2 years ago

This issue was discussed in today's meeting.