zazuko / barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

Raw RDF Loader #325

Open tpluscode opened 2 weeks ago

tpluscode commented 2 weeks ago

In issue #31 we added a p:FileContents loader, which reads a file as text and returns its contents.

I propose a similar loader which would load, parse, and return a dataset, optionally with code:imports resolved.

[
  a code:RDFDocument ;
  code:link <file:../shapes.ttl> ;
  dcterms:format "text/turtle" ; # optional
  code:imports true ; # optional
]
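
For a sense of what the implementation could look like, here's a minimal sketch in JavaScript. To be clear, this is not the actual barnard59 API: the loader signature (a clownface pointer in, a dataset out) only loosely mirrors the existing loaders, it handles only text/turtle, and resolveImports is a hypothetical stand-in for whatever ends up resolving code:imports (today that's the rdf:transformCodeImports step).

import { createReadStream } from 'node:fs'
import { fileURLToPath } from 'node:url'
import rdf from 'rdf-ext'
import ParserN3 from '@rdfjs/parser-n3'

const code = {
  link: rdf.namedNode('https://code.described.at/link'),
  imports: rdf.namedNode('https://code.described.at/imports'),
}
const dctermsFormat = rdf.namedNode('http://purl.org/dc/terms/format')

// ptr: clownface pointer at the code:RDFDocument blank node
export default async function rdfDocumentLoader (ptr) {
  // assumes code:link has already been resolved to an absolute file: URL
  const link = ptr.out(code.link).term
  const format = ptr.out(dctermsFormat).value || 'text/turtle'
  const imports = ptr.out(code.imports).value === 'true'

  // only Turtle is sketched; a real loader would pick a parser based on
  // `format` and also support "…"^^p:VariableName links
  if (format !== 'text/turtle') {
    throw new Error(`format ${format} not handled in this sketch`)
  }

  const parser = new ParserN3({ baseIRI: link.value })
  const quads = parser.import(createReadStream(fileURLToPath(link.value)))
  const dataset = await rdf.dataset().import(quads)

  return imports ? resolveImports(dataset) : dataset
}

// hypothetical helper: would fetch every code:imports target and merge
// it into the dataset, like the rdf:transformCodeImports step does today
async function resolveImports (dataset) {
  return dataset
}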

This somewhat overloads the semantics of code:imports, but I find it quite understandable this way.

Such a loader could already be useful in conjunction with the SHACL step. For example, here's how the b59-cube check-metadata pipeline would change:

<check-metadata> a p:Pipeline , p:Readable ;
  p:variables [ p:variable _:profile, _:profileFormat ] ;
  p:steps
    [
      p:stepList (
        [ base:stdin () ]
        [ n3:parse () ]
        [ rdf:getDataset () ]
-       [ shacl:report (_:getProfile) ]
+       [ shacl:report ( [
+         a code:RDFDocument ;
+         code:link "profile"^^p:VariableName ;
+         dcterms:format "profileFormat"^^p:VariableName ;
+         code:imports true ;
+       ] ) ]
        [ base:flatten () ]
        [ ntriples:serialize () ]
      )
    ]
.

-_:getProfile a p:Pipeline , p:ReadableObjectMode;
- p:steps
-   [
-     p:stepList
-       (
-         [ rdf:open ( "profile"^^p:VariableName "profileFormat"^^p:VariableName ) ]
-         [ rdf:transformCodeImports () ]
-       )
-   ]
- .

While the changes may not seem that dramatic, I find the usage much more idiomatic. The obvious difference for the consumer is that a pipeline returns a stream, whereas here we'd get a "ready to use" dataset. A small difference maybe, but still an improvement IMO.
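
To illustrate that last point from the step's perspective (profileStream and loadedDataset are placeholders for the two variants' outputs, not real identifiers):

import rdf from 'rdf-ext'

// today: the shacl:report argument is the _:getProfile sub-pipeline,
// which yields a quad stream the step still has to collect itself
const shapesToday = await rdf.dataset().import(profileStream)

// proposed: the loader hands the step a parsed dataset directly,
// with code:imports already resolved
const shapesProposed = loadedDataset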