katsi / rml-generator

This is a prototype for generating an RML file by describing the source data and how it maps to a schema.

Generating RML #2

Open tobiasschweizer opened 9 months ago

tobiasschweizer commented 9 months ago

Hi @katsi,

I was searching for solutions to dynamically generate RML mappings instead of writing them manually and came across your repo. Was this intended as a mere PoC, or do you actually use it in practice?

I am asking because we have an RML pipeline and have written the mappings ourselves so far. In the future, we would like to (somehow) generate them from crosswalk registries. However, the source format will look different from https://github.com/katsi/rml-generator/blob/main/prototype/mapping.ttl

I think I would also have used rdflib because of the flexibility. Have you considered using a SPARQL CONSTRUCT query?

Best,

Tobias

PS: I did an Erasmus term in Helsinki, but that was quite a long time ago :-)

katsi commented 9 months ago

Hi Tobias,

I appreciate your interest in my project. It is meant as a PoC, and at Inter IKEA Systems B.V. we have built a version based on it that runs in production and drives all of our mappings to the Knowledge Graph.

You can also work with SPARQL CONSTRUCT queries, which are an industry standard. For me, the important thing is to move away from blank nodes and give data sources and individual parts of a data schema (e.g. a column in an Excel sheet) identifiers (URIs/IRIs), so that we can attach more metadata to them and make them reusable.
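To illustrate the point about identifiers, here is a minimal sketch of minting IRIs for a data source and one of its columns and then attaching metadata to the column. It is not the production code mentioned above. The `https://example.org/source/` namespace, the function names, and the choice of `rdfs:label` and `dcterms:isPartOf` as metadata properties are all assumptions made for illustration.

```python
import json
from urllib.parse import quote

# Hypothetical namespace for data sources; replace with your own.
BASE = "https://example.org/source/"


def source_iri(source_name: str) -> str:
    """Mint a stable IRI for a data source, e.g. an Excel file."""
    return BASE + quote(source_name, safe="")


def column_iri(source_name: str, column_name: str) -> str:
    """Mint a stable IRI for one column of that data source."""
    return f"{source_iri(source_name)}/column/{quote(column_name, safe='')}"


def describe_column(source_name: str, column_name: str, label: str) -> str:
    """Return Turtle that attaches metadata to the column IRI.

    Because the column is an IRI and not a blank node, other documents
    can refer to it and add further statements about it.
    json.dumps is used as an approximation of Turtle string escaping.
    """
    src = source_iri(source_name)
    col = column_iri(source_name, column_name)
    literal = json.dumps(label, ensure_ascii=False)
    return (
        f"<{col}> <http://www.w3.org/2000/01/rdf-schema#label> {literal} ;\n"
        f"    <http://purl.org/dc/terms/isPartOf> <{src}> .\n"
    )


if __name__ == "__main__":
    print(describe_column("sales 2023.xlsx", "Full Name", "Customer full name"))
```

The same column IRI can then be referenced from a mapping, from documentation, or from data-quality rules, which is not possible with a blank node.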

We are currently working on making our solution open source, but it will take some time to clear all the details.

I am fluent in German, in case you would also like to write in German.

Katariina


In reply to the message from Tobias Schweizer, sent 11 January 2024 16:29, subject: [katsi/rml-generator] Generating RML (Issue #2)

tobiasschweizer commented 9 months ago

Hello Katariina :-)

Yes, identifiers are crucial. We deal with a lot of data sources where we do not have unique elements that we could use in templates for subject maps or predicate object maps. We use blank nodes in rare cases where we model complex values as resources that needn't be identified.

I would be interested in having a look at your code once it is open source.

The registry I initially mentioned is actually a project with involvement of the CSC in Helsinki: https://faircore4eosc.eu/eosc-core-components/metadata-schema-and-crosswalk-registry-mscr

For now, I am aiming at simple 1:1 correspondences, for which predicate object maps can be generated easily. However, our existing hand-written mappings contain many cases where pre-processing was necessary. I am interested in RML functions, but different engines handle them differently. Still, RML functions could be very valuable, for example when the source model has a full name as a single property and the target model has separate family name and given name properties.
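As a rough sketch of the 1:1 case, the following turns a dictionary of source field to target property pairs into a triples map with one predicate object map per pair. It assumes a CSV source and the legacy RML vocabulary (`rml:`, `rr:`, `ql:`). The function names, the example IRIs, and the shape of the input dictionary are made up for illustration and do not reflect the crosswalk registry's actual format. Plain string building keeps the sketch dependency-free; rdflib would work just as well.

```python
import json

PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix rml: <http://semweb.mmlab.be/ns/rml#> .\n"
    "@prefix ql: <http://semweb.mmlab.be/ns/ql#> .\n"
)


def literal(value: str) -> str:
    """Quote a string as a Turtle literal (json.dumps as an approximation)."""
    return json.dumps(value, ensure_ascii=False)


def predicate_object_map(reference: str, predicate_iri: str) -> str:
    """One 1:1 correspondence: copy the source field to the target property."""
    return (
        "    rr:predicateObjectMap [\n"
        f"        rr:predicate <{predicate_iri}> ;\n"
        f"        rr:objectMap [ rml:reference {literal(reference)} ]\n"
        "    ]"
    )


def triples_map(map_iri: str, source_file: str, subject_template: str,
                correspondences: dict) -> str:
    """Build a complete triples map from {source field: target property IRI}."""
    if not correspondences:
        raise ValueError("at least one correspondence is required")
    poms = " ;\n".join(
        predicate_object_map(ref, pred) for ref, pred in correspondences.items()
    )
    return (
        f"{PREFIXES}\n"
        f"<{map_iri}> a rr:TriplesMap ;\n"
        "    rml:logicalSource [\n"
        f"        rml:source {literal(source_file)} ;\n"
        "        rml:referenceFormulation ql:CSV\n"
        "    ] ;\n"
        f"    rr:subjectMap [ rr:template {literal(subject_template)} ] ;\n"
        f"{poms} .\n"
    )


if __name__ == "__main__":
    # Hypothetical crosswalk with two 1:1 correspondences.
    print(triples_map(
        "https://example.org/map/person",
        "people.csv",
        "https://example.org/person/{id}",
        {
            "name": "http://xmlns.com/foaf/0.1/name",
            "mail": "http://xmlns.com/foaf/0.1/mbox",
        },
    ))
```

Cases that need pre-processing, such as splitting a full name, do not fit this shape and would still need a function or a preparation step outside the mapping.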

Best,

Tobias