json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Downgrading from Extended Internal Representation should use value objects #84

Open gkellogg opened 1 year ago

gkellogg commented 1 year ago

The spec currently downgrades to the standard representation by using strings, numbers, or booleans. In many cases, it should use value objects instead.

Note that in rare cases this could create an unexpected result, such as a language-tagged string being used in a language map, which doesn't allow value objects.
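
A minimal sketch of the idea, not the spec's algorithm: `TaggedScalar` and `downgrade` are hypothetical names invented for illustration, standing in for a scalar in the extended internal representation and for the downgrade step. It shows a typed scalar becoming a value object, and the language map corner case where the same rule puts a value object in a position that JSON-LD 1.1 reserves for strings, arrays of strings, or null.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TaggedScalar:
    """Hypothetical stand-in for a scalar in the extended internal representation."""
    value: Any
    datatype: Optional[str] = None  # datatype IRI carried by a YAML node tag
    language: Optional[str] = None  # language tag, if the scalar has one


def downgrade(node: Any) -> Any:
    """Recursively replace tagged scalars with JSON-LD value objects."""
    if isinstance(node, TaggedScalar):
        obj = {"@value": node.value}
        # Assumption: a scalar carries a datatype or a language, never both,
        # since a JSON-LD value object cannot have both @type and @language.
        if node.datatype is not None:
            obj["@type"] = node.datatype
        if node.language is not None:
            obj["@language"] = node.language
        return obj
    if isinstance(node, dict):
        return {key: downgrade(value) for key, value in node.items()}
    if isinstance(node, list):
        return [downgrade(item) for item in node]
    return node  # plain strings, numbers, booleans, and null pass through


doc = {
    "date": TaggedScalar("2022-10-12", datatype="http://www.w3.org/2001/XMLSchema#date"),
    # Assume "label" is defined with "@container": "@language" in the context.
    "label": {"en": TaggedScalar("Meeting", language="en")},
}
result = downgrade(doc)
print(result["date"])         # a value object, which is fine here
print(result["label"]["en"])  # also a value object, which a language map does not allow
```

In this sketch the downgrade is context-free, which is exactly why the language map case slips through; avoiding it would require knowing the term definition for `label`.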

pchampin commented 1 year ago

I don't quite follow what exactly the problem would be with language maps. Can you provide a small concrete example?

gkellogg commented 1 year ago
This issue was discussed in the 2022-10-12 meeting.

Subtopic: Extended Internal Representation yaml-ld#84

https://github.com/json-ld/yaml-ld/issues/84 -> Issue 84 Downgrading from Extended Internal Representation should use value objects (gkellogg) enhancement, spec
Gregg Kellogg: The current spec describes the extended internal representation. The motivation is that, when parsing YAML-LD in extended mode, node tags can be passed through the JSON-LD algorithms without interpretation. Pierre-Antoine had another idea: add information into JSON objects and force the algorithms to ignore that information. Pierre-Antoine referred to related work by Niklas.
Gregg Kellogg: Niklas's work on the LDTR project: https://github.com/niklasl/ldtr
Pierre-Antoine Champin: I came across that only recently and it just rang a bell. For YAML-LD, another option would be to stick to the existing internal representation, but we would lose the ability to round-trip from and to YAML-specific notations (tags to represent data types, for instance).
Pierre-Antoine Champin: Extending the internal representation aims to convey YAML-specific extended syntactic constructs which don't exist in JSON.
Pierre-Antoine Champin: At TPAC, we toyed with the idea that an extended representation could be extended to support Turtle and other RDF serializations.
Niklas Lindström: That's right. That was the point of my idea, which I did twice. Originally, I wrote a very simple Turtle and TriG parser using a parser generator library for JS. I thought of it as a teaching tool for developers and metadata librarians.
Niklas Lindström: To some extent, it has worked like that. Then, I got carried away and did the same for RDF/XML as well. It's a bit dangerous as an idea because it is not what RDF is about. RDF is about semantics and triples.
Niklas Lindström: The attraction of JSON-LD is to get away from the abstract model and to get into something concrete.
Niklas Lindström: An abstract syntax tree for RDF is a viable idea.
Gregg Kellogg: Is the intermediate representation itself transcribable? It is probably printable for debugging purposes, but does it relate somehow to the extended JSON-LD representation? How does that match?
Niklas Lindström: JSON is a string representation, but it is materialized in (especially dynamic) programming languages very similarly.
Niklas Lindström: I didn't consider a formal internal representation at all when doing that.
Niklas Lindström: The implementation in question: https://github.com/niklasl/ldtr
Gregg Kellogg: For SPARQL and N3, my parser's abstract syntax tree forms S-expressions, which are serializable in a Lisp-like fashion. This might be extended to other formats and be a way of expressing these internal formats. You want similar things everywhere: URIs, prefixes, IRIs, literals.
Gregg Kellogg: Having an S-expression representation of RDF and then of SPARQL might be useful.
Gregg Kellogg: At the RDF level, a triple is the fundamental building block. In SPARQL you also have operators. This doesn't deal with recursive statements, but it still might be worth exploring.
Niklas Lindström: Was thinking of doing something similar for SPARQL.
gkellogg commented 1 year ago

In the call, I discussed my S-Expression representation of, for example, SPARQL. This is essentially the AST that comes from parsing the SPARQL grammar into a form that can be executed. For SPARQL, they're also known as SPARQL S-Expressions (SSE).

For example, the following SPARQL transforms to the subsequent S-Expression.

PREFIX : <http://bigdata.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:  <http://example.org/>

SELECT ?age ?c WHERE {
   ?bob foaf:name "Bob" {| ex:certainty ?c |}.
}

(prefix (
  (: <http://bigdata.com/>)
  (foaf: <http://xmlns.com/foaf/0.1/>)
  (ex: <http://example.org/>))
 (project (?age ?c)
  (bgp
   (triple ?bob foaf:name "Bob")
   (triple (qtriple ?bob foaf:name "Bob") ex:certainty ?c)) ))

For RDF-only representations, this would be limited to the prefix and base definitions, along with graphs/triples or quads.
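
To make the RDF-only case concrete, here is a rough sketch that builds such an S-Expression from prefix definitions and triples. The helper names `render` and `rdf_to_sexp` are hypothetical, and placing the `triple` forms directly inside `prefix` is an assumption made for illustration; it is not taken from any existing SSE implementation.

```python
def render(expr):
    """Serialize nested lists/tuples as a Lisp-style S-Expression string."""
    if isinstance(expr, (list, tuple)):
        return "(" + " ".join(render(item) for item in expr) + ")"
    return str(expr)  # atoms (IRIs, prefixed names, literals) are emitted as-is


def rdf_to_sexp(prefixes, triples):
    """Build a nested-list S-Expression from prefix definitions and triples.

    Assumed layout: (prefix (<declarations>) (triple s p o) ...).
    """
    declarations = [[f"{name}:", f"<{iri}>"] for name, iri in prefixes.items()]
    statements = [["triple", s, p, o] for s, p, o in triples]
    return ["prefix", declarations, *statements]


prefixes = {"foaf": "http://xmlns.com/foaf/0.1/"}
triples = [("_:bob", "foaf:name", '"Bob"')]
text = render(rdf_to_sexp(prefixes, triples))
print(text)
```

Because the structure is just nested lists, a term can itself be a list, so a quoted triple such as `["qtriple", s, p, o]` could appear in subject or object position without changing `render`.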

While it does provide a nice intermediate representation, in the general case it does not really support maintaining the original hierarchy from the documents. We can imagine something that might, if we posit a statement operator whose components can be literals, IRIs, blank nodes, other statements, or components of statements (e.g., a property list or object list). It also does not support any kind of map structure.

It's probably too hypothetical and unknown to serve as an effective replacement for our existing (extended) internal representation.