json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

Should output type of `expand()` be `dict` or `str`? #143

Open anatoly-scherbakov opened 2 months ago

anatoly-scherbakov commented 2 months ago

Context

The spec (in its non-normative part) says:

This specification extends the JSON-LD 1.1 Processing Algorithms and API [JSON-LD11-API] Application Programming Interface and the JSON-LD 1.1 Framing [JSON-LD11-FRAMING] Application Programming Interface to manage the serialization and deserialization of [YAML] and to enable an option for setting the YAML-LD extended profile.

Question

Should expand() output a dict, a native type, or a serialized YAML string?

gkellogg commented 2 months ago

Step 9 of the JSON-LD expand() API entry says the following:

Resolve the promise with expanded output transforming expanded output from the internal representation to a JSON serialization.

It is the internal Expansion Algorithm that returns a map or array; the API is responsible for coordinating internal calls and for serializing (or deserializing) the internal representation.

anatoly-scherbakov commented 2 months ago

I see, I should have looked into the API spec once again before asking. But pyld, say, doesn't do that; it returns a dict.

Does this mean it is not, in this particular aspect, conformant?

gkellogg commented 2 months ago

Something for @davidlehn to comment on. Serialization may be handled just outside the API call. Ruby is probably similar. But, the API definitions between JSON-LD and YAML-LD should be symmetric.

davidlehn commented 2 months ago

I may get lost in the nuances here. Is this asking if expand() and other calls should return a data structure or a serialized string? I see the JSON-LD API algorithms end with "[...] transforming [...] from the internal representation to a JSON serialization." I suppose that could be interpreted strictly as needing to serialize to a JSON string. Do implementations do that? I think pyld and jsonld.js return data structures for everything. I think due to the nature of JSON, people often mix what they mean when talking about "JSON" or the JSON data structures. From a programming view, the intent is usually to further process the data, so you want the data structure out of the calls, and you'd serialize to a JSON string later manually as needed.

I see the YAML-LD spec says to serialize to YAML instead: https://json-ld.github.io/yaml-ld/spec/#jsonldprocessor, https://json-ld.github.io/yaml-ld/spec/#conversion-to-yaml. In that case, it seems implementations would interpret that as needing to serialize into an actual YAML string. I'm a bit behind on understanding how YAML-LD works. Is the internal structure the same as in the JSON-LD case, or is it annotated somehow with more advanced YAML features? If you wanted to process the data structures, what would you do? Call a YAML-LD API call then reparse the plain YAML output into a JSON-like data structure? That seems a bit awkward.

So there may be a difference between how the JSON-LD spec is worded and what is done in practice. And I can see how that is a bit more difficult to handle in the YAML-LD case. I'm not quite sure how to make these symmetric.

anatoly-scherbakov commented 2 months ago

> From a programming view, the intent is usually to further process the data, so you want the data structure out of the calls, and you'd serialize to a JSON string later manually as needed.

I agree with the practicality of this approach.

> I'm a bit behind on understanding how YAML-LD works. Is the internal structure the same as in the JSON-LD case, or is it annotated somehow with more advanced YAML features?

The Extended YAML Profile, which would call for the use of advanced YAML features, is only described in an addendum to the current spec; in this version, it is not normative.

So what an implementation can do (and what my python-yaml-ld.iolanta.tech does) is simply convert YAML into a data structure, pass that to a JSON-LD library (I am building on pyld), and return the processed result.
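A rough sketch of that pipeline, not the actual python-yaml-ld code. The name `make_yaml_ld_expand` is hypothetical, and the parser and the JSON-LD library are passed in as callables so the example runs with the standard library alone. In the demo, `json.loads` stands in for a YAML parser (the sample document is written in JSON syntax, which is also valid YAML), and a trivial lambda stands in for real expansion:

```python
import json
from typing import Any, Callable


def make_yaml_ld_expand(
    parse_yaml: Callable[[str], Any],
    jsonld_expand: Callable[[Any], Any],
) -> Callable[[str], Any]:
    """Build an expand() that takes YAML text and returns a data structure."""

    def expand(yaml_text: str) -> Any:
        document = parse_yaml(yaml_text)  # YAML text -> dict / list
        return jsonld_expand(document)    # delegate to the JSON-LD library, no re-serialization

    return expand


# Stand-ins for illustration only (assumptions, not real YAML or JSON-LD processing).
expand = make_yaml_ld_expand(
    parse_yaml=json.loads,
    jsonld_expand=lambda document: [document],
)

expanded = expand('{"@id": "http://example.org/a"}')
print(type(expanded).__name__)  # list
```

With PyYAML and pyld installed, `yaml.safe_load` and `pyld.jsonld.expand` could presumably be passed as the two callables; the caller then gets a Python data structure back and serializes it to YAML or JSON only when needed.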

For me, the motivation is exactly that:

Can we find a middle ground here between the practicality and our ability to normatively describe the specification?