kg-construct / rml-io

RML-IO: Input/Output declarations for RML
https://w3id.org/rml/io/spec
Creative Commons Attribution 4.0 International

granularity of target attachment #66

Open VladimirAlexiev opened 6 months ago

VladimirAlexiev commented 6 months ago

Are there real use cases where different triples need to be directed to different targets? https://kg-construct.github.io/rml-resources/portal/requirements/requirements-io.html doesn't describe such. What is the value of saving some triples of a subject to a file, and other triples to a SPARQL endpoint?

https://kg-construct.github.io/rml-io/spec/docs/#multiple-targets describes endless combinations of targets attached to different levels, but are they useful?

Furthermore, while a quad can go into multiple targets, the entirety of a quad must go to one target: you cannot store its components <g,s,p,o> and o=<value,lang,datatype> in different targets.

If it's a valid case to specify target per languageMap, is it not also valid to specify it per datatypeMap?

So I question the accuracy of statements like this: https://kg-construct.github.io/rml-io/spec/docs/#language-and-graph-map

All triples containing the language tag en are exported to TargetDump1 and all triples within the named graph ex:Characters are exported to TargetDump2.

Consider the map:

  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:graphMap [ a rml:GraphMap;
      rml:logicalTarget <#TargetDump2>;
      rml:constant ex:Characters;
    ];
    rml:predicateMap [ a rml:PredicateMap;
      rml:constant foaf:name;
    ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name/text()";
      rml:languageMap [
        rml:logicalTarget <#TargetDump1>;
        rml:constant "en";
      ];
    ];
  ];

It produces quads with g=ex:Characters, p=foaf:name, lang=@en. The graphMap and languageMap set these quad components; they don't test them. So all these quads go to both TargetDump1 and TargetDump2. The quoted sentence is confusing because it implies that different sets of triples go to different targets.

Then wouldn't it be better to put the targets at the predicateObjectMap level to make this more clear?

  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:logicalTarget <#TargetDump1>, <#TargetDump2>;
    rml:graphMap [ a rml:GraphMap; rml:constant ex:Characters; ];
    rml:predicateMap [ a rml:PredicateMap; rml:constant foaf:name; ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "name/text()";
      rml:languageMap [rml:constant "en"; ];
    ];
  ];

Last but not least, it should be possible to set the target at the level of TriplesMap to cater for the most common case.


In summary, I propose allowing targets at the TriplesMap and predicateObjectMap levels, but not at the subjectMap, predicateMap, objectMap, graphMap, or languageMap levels.

DylanVanAssche commented 6 months ago

Thanks for the interesting issue! This is the feedback we really want to see :)

Are there real use cases where different triples need to be directed to different targets?

Yes, you can then make materialized views of the RDF graph for the different purposes you want to use it for. Examples: separating by language, storing sensitive data in a separate target with stricter access requirements, etc.
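
As a rough sketch of the sensitive-data example: the mapping below attaches a different target to each object map, so that only the triples built from the sensitive column reach the restricted target. All names here (`<#PersonMap>`, `<#PublicDump>`, `<#RestrictedDump>`, `ex:socialSecurityNumber`, the `id`, `name` and `ssn` references) are hypothetical, and the logical source and the target definitions are left out. The subject map deliberately carries no target: as discussed in this issue, a triple is exported to every target attached to any of the term maps that produced it, so a target on the subject map would also receive the sensitive triples.

```turtle
@prefix rml:  <http://w3id.org/rml/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .
@base <http://example.org/mapping> .

# Hypothetical triples map; rml:logicalSource and the two
# rml:LogicalTarget definitions are omitted for brevity.
<#PersonMap> a rml:TriplesMap;
  rml:subjectMap [
    rml:template "http://example.org/person/{id}";
  ];
  rml:predicateObjectMap [
    rml:predicateMap [ rml:constant foaf:name ];
    rml:objectMap [
      rml:reference "name";
      rml:logicalTarget <#PublicDump>;      # non-sensitive triples
    ];
  ];
  rml:predicateObjectMap [
    rml:predicateMap [ rml:constant ex:socialSecurityNumber ];
    rml:objectMap [
      rml:reference "ssn";
      rml:logicalTarget <#RestrictedDump>;  # sensitive triples only
    ];
  ] .
```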

https://kg-construct.github.io/rml-resources/portal/requirements/requirements-io.html doesn't describe such.

We should make that clearer :+1:

What is the value of saving some triples of a subject to a file, and other triples to a SPARQL endpoint?

One common use case is keeping backups: your SPARQL endpoint can be queried live, while the file is a backup of the current version in case your infrastructure goes down. The file also lets you exchange your RDF graph as a dump with other parties, without putting pressure on your SPARQL endpoint to dump everything each time another party needs a new version.
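
A minimal sketch of that backup setup, assuming a hypothetical file path, endpoint URL and triples map name. The target descriptions follow the VoID and SPARQL Service Description style used for file dumps and endpoints as far as I recall it, so check the exact properties against the spec before relying on them. Attaching both targets to the subject map sends every triple of that subject to the dump and to the endpoint.

```turtle
@prefix rml:     <http://w3id.org/rml/> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix sd:      <http://www.w3.org/ns/sparql-service-description#> .
@prefix formats: <http://www.w3.org/ns/formats/> .
@base <http://example.org/mapping> .

# Backup: an N-Quads file dump (hypothetical path).
<#BackupDump> a rml:LogicalTarget;
  rml:target [ a rml:Target, void:Dataset;
    void:dataDump <file:///data/backup.nq>;
  ];
  rml:serialization formats:N-Quads .

# Live copy: a SPARQL Update endpoint (hypothetical URL).
<#LiveEndpoint> a rml:LogicalTarget;
  rml:target [ a rml:Target, sd:Service;
    sd:endpoint <http://example.org/sparql>;
    sd:supportedLanguage sd:SPARQL11Update;
  ] .

# Logical source and predicate-object maps omitted for brevity.
<#CharacterMap> a rml:TriplesMap;
  rml:subjectMap [
    rml:template "http://example.org/character/{@id}";
    rml:logicalTarget <#BackupDump>, <#LiveEndpoint>;
  ] .
```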

https://kg-construct.github.io/rml-io/spec/docs/#multiple-targets describes endless combinations of targets attached to different levels, but are they useful?

We could shorten that part of the spec by combining some examples. But IMO a spec should list all possible combinations to be complete, so that developers have no doubt about what should happen in a given combination.

Furthermore, while a quad can go into multiple targets, the entirety of a quad must go to one target: you cannot store its components <g,s,p,o> and o=<value,lang,datatype> in different targets.

Yes, you always store full RDF quads or triples; otherwise you cannot query them later or parse them with existing tools & libraries.

If it's a valid case to specify target per languageMap, is it not also valid to specify it per datatypeMap?

A target on a datatype map is also allowed. This map did not exist before, but it does now in Core, so we should adjust RML-IO to explicitly allow it as well.
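
For illustration only, this is how a target on a datatype map might look once RML-IO allows it explicitly, mirroring the languageMap example above. `<#TargetDump3>`, `<#HeightMap>`, `ex:height` and the `height/text()` reference are made up for this sketch. As with the language map, the datatype map sets the datatype of the literal rather than filtering on it.

```turtle
@prefix rml: <http://w3id.org/rml/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .
@base <http://example.org/mapping> .

# Hypothetical triples map; logical source and the
# <#TargetDump3> definition are omitted for brevity.
<#HeightMap> a rml:TriplesMap;
  rml:subjectMap [ rml:template "http://example.org/character/{@id}" ];
  rml:predicateObjectMap [ a rml:PredicateObjectMap;
    rml:predicateMap [ a rml:PredicateMap; rml:constant ex:height ];
    rml:objectMap [ a rml:ObjectMap;
      rml:reference "height/text()";
      rml:datatypeMap [
        rml:logicalTarget <#TargetDump3>;  # assumed to work like a languageMap target
        rml:constant xsd:decimal;
      ];
    ];
  ] .
```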

the quoted sentence is confusing since it implies that different sets of triples go to different targets.

Okay, good point. We need to improve that sentence since it is confusing.

Then wouldn't it be better to put the targets at the predicateObjectMap level to make this more clear?

Then we would only have granularity at the graph level: you could only direct triples to a target at the same level as a named graph.

Last but not least, it should be possible to set the target at the level of TriplesMap to cater for the most common case.

Why should we have this? Adding it in the Subject Map does exactly this.

VladimirAlexiev commented 6 months ago

Examples: separate by language

A languageMap sets a language (constant or from source data), and it cannot separate by language unless you use some complex conditionals. Is it worth the trouble?

Or consider this real case of graphMap: our CrunchBase RDFization saves the triples from each row of each table into a separate graph to enable SPARQL Update scenarios. That's 12M graphs that come from 18 tables that are RDFized to 15 nodes (Orgs are fed from 3 tables, Persons from 2 tables). What is a reasonable split into files, and how would you describe it? Target URLs are always constant, they cannot come from the data.

(not TriplesMap) Adding it in the Subject Map does exactly this.