kg-construct / rml-io

RML-IO: Input/Output declarations for RML
https://w3id.org/rml/io/spec
Creative Commons Attribution 4.0 International

Support YAML data #84

Open idomingu opened 3 months ago

idomingu commented 3 months ago

Hi,

I wonder how RML-IO could cope with datasets encoded in YAML format. YAML can be translated to JSON without losing information, so I think rml:JSONPath could be used as rml:ReferenceFormulation for YAML data.

In that case, the RML-IO spec should mention YAML alongside JSON wherever it discusses JSONPath, and note that RML engines must internally translate YAML to JSON to make this possible.
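To make the idea concrete, here is a minimal Python sketch of what an engine might do: treat the parsed YAML as the same dict/list structure a JSON parser yields, serialise it to JSON, and evaluate a JSONPath-style expression over it. Everything in it is an assumption for illustration: the sample data, the `select` helper, and the tiny path subset it understands (`$`, `.key`, `[index]`, `[*]`). It is not a full JSONPath implementation, and the YAML parsing step itself is left to a YAML library.

```python
import json
import re

# Hypothetical YAML input (shown only as a comment; parsing it would need a
# YAML library, which is assumed here and not used):
#
#   people:
#     - name: Alice
#       age: 30
#     - name: Bob
#       age: 25
#
# A YAML loader would typically hand back plain dicts and lists, i.e. the
# same data model a JSON parser produces:
parsed_yaml = {
    "people": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
    ]
}


def select(data, path):
    """Evaluate a tiny JSONPath subset: $, .key, [index] and [*].

    Illustrative helper only; unsupported syntax is silently ignored.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # Each token is a (key, index) pair; exactly one of the two is non-empty.
    tokens = re.findall(r"\.([A-Za-z_][\w-]*)|\[(\d+|\*)\]", path[1:])
    nodes = [data]
    for key, index in tokens:
        next_nodes = []
        for node in nodes:
            if key:
                if isinstance(node, dict) and key in node:
                    next_nodes.append(node[key])
            elif index == "*":
                if isinstance(node, list):
                    next_nodes.extend(node)
                elif isinstance(node, dict):
                    next_nodes.extend(node.values())
            elif isinstance(node, list) and int(index) < len(node):
                next_nodes.append(node[int(index)])
        nodes = next_nodes
    return nodes


# The YAML-to-JSON translation step: serialise the parsed structure as JSON.
as_json = json.dumps(parsed_yaml)

# An iterator-style expression selects each record, then a reference is
# resolved relative to each record.
records = select(json.loads(as_json), "$.people[*]")
names = [select(record, "$.name")[0] for record in records]
```

The sketch only covers YAML documents that fit the JSON data model; YAML-specific features such as tags or anchors would need extra handling before this translation.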

Any thoughts?

Thanks!

dachafra commented 3 months ago

Maybe this is more of an rml-io-registry issue than an rml-io one. @DylanVanAssche what do you think?

DylanVanAssche commented 3 months ago

While translating YAML to JSON works, isn't there an equivalent of JSONPath/XPath for YAML? If there is, then it is rml-io; if not, then it is rml-io-registry.

idomingu commented 3 months ago

AFAIK there is no such thing as "YAMLPath". The best example I know of is the Kubernetes API, which lets you select data in a YAML manifest using JSONPath (see JSONPath Support). Maybe someone can shed some light here.

What's the purpose of rml-io-registry and how it relates to rml-io? In any case, I still think YAML should be mentioned in the RML-IO spec. Otherwise, it seems you cannot handle YAML data in RML.

DylanVanAssche commented 3 months ago

What's the purpose of rml-io-registry and how it relates to rml-io?

RML-IO focuses on RML's Logical Source/Target and Source/Target. A Source/Target holds the access description, while a Logical Source/Target contains the Source/Target (as its access description) together with the reference formulation and other properties.

rml-io-registry aims to provide a detailed description, for each data format, of how to iterate over the data given a reference formulation and so on. RML-IO is the abstraction, while rml-io-registry puts that abstraction into practice for each data format, e.g. SQL, JSON, XML, ... or YAML. The RML-IO spec aims to refer to this registry, because right now you get the impression that YAML isn't supported. That is what we want to avoid: RML-IO can support any data format, but it requires a different access description, reference formulation, etc. depending on the data format at hand. As we cannot mention all existing and future data formats in a spec like RML-IO, we aim to move that to the registry so that RML does not need to be revised at W3C each time a new format comes up (like YAML here).

So I propose to define YAML, and how to iterate over such data, in the registry; we then add the reference formulation etc. to RML-IO as a possible reference formulation. How to do things in practice is then described in the registry entry for YAML.

VladimirAlexiev commented 2 days ago

I'm glad this is being discussed! There is also YAML-LD. The current spec "dumbs down" YAML to JSON, but we've also discussed leveraging YAML's unique features. E.g., for representing datatypes, issued: !xsd!date 2024-10-03 is much nicer than "issued": {"@type": "xsd:date", "@value": "2024-10-03"}.

Perhaps RML should consider such special constructs.
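As a rough illustration of the datatype example above, the Python sketch below maps a tagged YAML scalar to the longer JSON-LD value object. The function name `tagged_scalar_to_value_object` and the rule "a tag of the form !prefix!name becomes prefix:name" are assumptions made for this sketch, not behaviour defined by YAML-LD or RML; a real engine would get the tag and value from its YAML parser.

```python
def tagged_scalar_to_value_object(tag, value):
    """Turn a YAML scalar tagged like '!xsd!date' into a JSON-LD value object.

    Hypothetical helper: assumes the tag uses a named handle, '!prefix!name'.
    """
    parts = tag.split("!")
    # A named-handle tag such as '!xsd!date' splits into ['', 'xsd', 'date'].
    if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError(f"unsupported tag: {tag!r}")
    prefix, local_name = parts[1], parts[2]
    return {"@type": f"{prefix}:{local_name}", "@value": str(value)}


# The tagged scalar from the comment above, as (tag, value) pairs a parser
# might report for the key 'issued'.
entry = {"issued": tagged_scalar_to_value_object("!xsd!date", "2024-10-03")}
```

Here `entry` holds the same structure as the JSON form quoted above, which suggests that such tags could be expanded during the YAML-to-JSON translation rather than being dropped.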