pldn / LDWizard

🧙 LDWizard: A generic framework for simplifying the creation of linked data. Supported by the PLDN community.
European Union Public License 1.2

Add support for RML import format #60

Closed pietervaneverdingen closed 7 months ago

pietervaneverdingen commented 2 years ago

User story "Pick up where somebody left off"

Background

Acceptance criteria


A bounty has been placed on this issue by PLDN for € 2000

Click here to learn more if you're interested in claiming this bounty by resolving this issue.

pietervaneverdingen commented 1 year ago

The scope of this issue will be enlarged to also include "Add support for YARRRML import format".

And the bounty sum for this issue will be increased by € 500 to € 2500.

vemonet commented 1 year ago

Hi @pietervaneverdingen, I would be interested in looking into implementing this feature, but there is a blocking issue with how LDWizard handles "column refinements".

Problem

Column refinements are small functions, defined by an LDWizard owner, that let users apply an additional transformation to the content of a cell (e.g. adding a prefix to create a URI from a plain string). But they have been implemented to be applied before the RML transformation, which means the information about column refinements is not present in the RML mappings.

So for a transformation with column refinements, if you provide your old CSV plus the RML that was previously generated, we will not be able to figure out which column refinements have been applied, because the information is just not there.
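To make the problem concrete, here is a minimal TypeScript sketch. The `ColumnRefinement` type, the `refineColumn` helper, the `SimpleMapping` shape and the `https://example.org/` prefix are hypothetical illustrations, not LDWizard's actual configuration API. The refinement runs on the CSV values before the mapping is produced, so the mapping only references column names and records nothing about the transformation.

```typescript
// Hypothetical shape of a column refinement: a named function applied to
// each cell of a column before the RML mapping is generated.
type ColumnRefinement = {
  label: string;
  transform: (cell: string) => string;
};

type Row = Record<string, string>;

// Example refinement: turn a plain string into an IRI by adding a prefix
// (the prefix is an illustrative assumption).
const toIri: ColumnRefinement = {
  label: "Add IRI prefix",
  transform: (cell) => `https://example.org/id/${encodeURIComponent(cell)}`,
};

// Apply the refinement to one column of every row, returning new rows
// (the input rows are left untouched).
function refineColumn(rows: Row[], column: string, refinement: ColumnRefinement): Row[] {
  return rows.map((row) => ({ ...row, [column]: refinement.transform(row[column]) }));
}

// A simplified stand-in for a mapping: it only stores column references.
type SimpleMapping = { subjectColumn: string; predicate: string; objectColumn: string };

const mapping: SimpleMapping = {
  subjectColumn: "id",
  predicate: "https://schema.org/name",
  objectColumn: "name",
};

const original: Row[] = [{ id: "alice", name: "Alice" }];
const refined = refineColumn(original, "id", toIri);

// The refined rows hold the IRI, but `mapping` has no trace of `toIri`:
// given only `original` + `mapping`, the refinement cannot be recovered.
console.log(refined[0].id); // https://example.org/id/alice
console.log(JSON.stringify(mapping).includes("example.org")); // false
```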

Solutions?

  1. The ideal solution would be to migrate from the old "column refinements" system to custom RML functions. RocketRML makes these easy to define, and each LDWizard owner could define them in the ldwizard_config.ts file, similar to how column refinements are defined at the moment.

Then, instead of running the column refinements ourselves, we would just add the custom function to the RML, and RocketRML would take care of running it for us (less code to maintain in LDWizard 🍾).

Imo that is the most logical and standards-compliant solution. But it is a major change to the whole LDWizard mechanism, and it might open a bit of a Pandora's box... It should be relatively easy (99% of the job is already done by RocketRML), but I am not sure how much work it will take to get to a better state than the existing solution (will RocketRML custom functions cover all the use cases covered by column refinements?)

  2. The quick solution would be to implement the RML import feature and ignore column refinements. But then the feature would be incomplete (and completely broken for all use cases with column refinements).
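The first option moves the transformation into the mapping itself. If I read the RocketRML README correctly, it accepts a `functions` option that maps a function IRI to a JavaScript function receiving the parameter values as an array. The self-contained sketch below does not call RocketRML; it only imitates that idea with a hypothetical registry and a simplified `TermMap` type, so that the function IRI travels with the mapping instead of being applied beforehand. The IRI `https://example.org/fn#addPrefix` is made up for illustration.

```typescript
// Hypothetical function registry: function IRI -> implementation.
// Parameter values arrive as an array of strings, in mapping order.
type MappingFunction = (params: string[]) => string;
type FunctionRegistry = Record<string, MappingFunction>;

// A term map either references a column directly or calls a function on columns.
type TermMap =
  | { kind: "reference"; column: string }
  | { kind: "function"; fn: string; columns: string[] };

type Row = Record<string, string>;

const ADD_PREFIX = "https://example.org/fn#addPrefix"; // illustrative IRI

const registry: FunctionRegistry = {
  [ADD_PREFIX]: ([value]) => `https://example.org/id/${value}`,
};

// Evaluate a term map for one row, the way an engine would at mapping time.
function evaluate(term: TermMap, row: Row, functions: FunctionRegistry): string {
  if (term.kind === "reference") {
    return row[term.column];
  }
  const impl = functions[term.fn];
  if (!impl) {
    throw new Error(`No implementation registered for ${term.fn}`);
  }
  return impl(term.columns.map((c) => row[c]));
}

// The subject term records *which* function is applied, so an imported
// mapping still carries the transformation.
const subject: TermMap = { kind: "function", fn: ADD_PREFIX, columns: ["id"] };

console.log(evaluate(subject, { id: "alice" }, registry)); // https://example.org/id/alice
```

The catch is the `throw`: any engine whose registry lacks `ADD_PREFIX` cannot evaluate the mapping at all.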

Conclusion

In my opinion, if you want to be serious about using RML as a shareable mapping format, then it needs to be fully supported, not partially. So LDWizard should work directly with RML custom functions instead of maintaining its own column refinement system in parallel.

What are your thoughts on this? Are there specific needs that make the current column refinement system necessary, and prevent us from using RML custom functions?

vemonet commented 1 year ago

I looked into it a bit more and need to amend my previous message!

  1. I forgot that LDWizard provides the CSV after applying column refinements at the last step, so if the user reuses the RML mappings with the refined CSV, the mappings can be imported without issue.
  2. RML mappings with custom functions only work on an RML engine that implements those custom functions. So only the RocketRML engine of the LDWizard instance used to create the mappings would be able to run the RDF generation (unless you manually reimplement those custom functions for other RML engines).
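A minimal sketch of the trade-off between these two points, using a hypothetical mini-engine (`runMapping`, `ObjectMap` and the `https://example.org/fn#toUpperCase` IRI are illustrative assumptions, not RML or RocketRML APIs). Feeding the refined CSV to a plain reference mapping yields the same triple on any engine, while the function-based mapping only runs where that function IRI is implemented.

```typescript
type Row = Record<string, string>;
type Functions = Record<string, (params: string[]) => string>;

// One predicate-object map: the object comes from a CSV column,
// optionally passed through a function identified by an IRI.
type ObjectMap = { predicate: string; column: string; fn?: string };

// Hypothetical mini-engine: subject from the "id" column, object as a literal.
function runMapping(rows: Row[], maps: ObjectMap[], functions: Functions): string[] {
  const triples: string[] = [];
  for (const row of rows) {
    for (const m of maps) {
      let value = row[m.column];
      if (m.fn !== undefined) {
        const impl = functions[m.fn];
        if (!impl) throw new Error(`Engine does not implement ${m.fn}`);
        value = impl([value]);
      }
      triples.push(`<${row.id}> <${m.predicate}> "${value}" .`);
    }
  }
  return triples;
}

const UPPER = "https://example.org/fn#toUpperCase"; // illustrative IRI
const ldwizardFunctions: Functions = { [UPPER]: ([v]) => v.toUpperCase() };

const rawCsv: Row[] = [{ id: "https://example.org/id/alice", name: "alice" }];
// What is handed back at the last step: the CSV *after* refinement.
const refinedCsv: Row[] = [{ id: "https://example.org/id/alice", name: "ALICE" }];

const plainMapping: ObjectMap[] = [{ predicate: "https://schema.org/name", column: "name" }];
const functionMapping: ObjectMap[] = [{ predicate: "https://schema.org/name", column: "name", fn: UPPER }];

// Refined CSV + plain mapping: works on an engine with no custom functions.
console.log(runMapping(refinedCsv, plainMapping, {})[0]);
// <https://example.org/id/alice> <https://schema.org/name> "ALICE" .

// Raw CSV + function mapping: same triple, but only where UPPER is implemented;
// runMapping(rawCsv, functionMapping, {}) would throw instead.
console.log(runMapping(rawCsv, functionMapping, ldwizardFunctions)[0]);
```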

Considering those 2 points, continuing to use the current column refinements makes more sense, as it makes the system more reusable, even if it does not follow the RML specifications 100%.

Then I will look into adding import and YARRRML support when I have some time :)

pietervaneverdingen commented 1 year ago

Hi @vemonet, many thanks for your proposal and additional refinement.

It is best to discuss your ideas with LDWizard Gatekeeper @mightymax first to decide on the best-possible scenario.

And then it is a good idea to schedule an online LDWizard session in the short term with the RML experts in our LDWizard working group, to finalize the preferred scenario so that we all have the same understanding of what can be developed, within what timeframe, and how it fits with the other LDWizard development activities.

Then we can assign the bounty to you and finish the development activities with all the relevant stakeholders involved.

vemonet commented 1 year ago

Thanks @pietervaneverdingen, the instructions and features to implement were clear, so I went ahead and implemented them already:

The UI impact is minimal, I just added a "mapping file upload" button under the CSV upload button at the first step (which is optional, users can still just upload a CSV alone)

Importing works well for the different mappings I tried.

There is just one small issue: when the "Row number" option is chosen for the Key Column, generating the YARRRML mappings fails, because we use blank nodes for the SubjectMap but YARRRML expects a URI.
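The two options for a row-number key can be contrasted with a small sketch of how the subject term is built. The functions below and the base `https://example.org/row/` are hypothetical placeholders, not LDWizard or RocketRML code: one mints a blank node label per row (only meaningful inside a single output file), the other mints a URI from a base plus the 1-based row number, which is what a URI-based SubjectMap would need the engine to expose.

```typescript
type Row = Record<string, string>;

// Option 1: a blank node per row.
// Labels like _:row1 are only stable within a single generated file.
function blankNodeSubject(rowIndex: number): string {
  return `_:row${rowIndex + 1}`;
}

// Option 2: a URI built from a base and the 1-based row number.
// In a real mapping this needs the engine to provide the row number.
function iriSubject(base: string, rowIndex: number): string {
  return `<${base}${rowIndex + 1}>`;
}

function subjectsFor(rows: Row[], useIri: boolean, base = "https://example.org/row/"): string[] {
  return rows.map((_, i) => (useIri ? iriSubject(base, i) : blankNodeSubject(i)));
}

const rows: Row[] = [{ name: "Alice" }, { name: "Bob" }];
console.log(subjectsFor(rows, false).join(" ")); // _:row1 _:row2
console.log(subjectsFor(rows, true).join(" "));  // <https://example.org/row/1> <https://example.org/row/2>
```

The URI variant gives subjects that stay the same across runs of the same CSV, at the cost of needing row-number support in the engine.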

This problem has been mentioned in this issue: https://github.com/pldn/LDWizard/issues/144 (where I answered with more details about the problem). I think the LDWizard stakeholders will need a short discussion to decide which way to go for row numbers: do we continue to use blank nodes, or do we make the effort of adding row number support to RocketRML so that we can use URIs as SubjectMap when the key column is the row number?

I missed last week's session, but I can join the next one to present the implementation, discuss whether it works as expected for you, and hear what you would like me to change.

You can find the code in my fork here: https://github.com/vemonet/LDWizard/tree/add-import-and-yarrrml

Try it with:

```shell
git clone https://github.com/vemonet/LDWizard -b add-import-and-yarrrml
cd LDWizard
yarn
yarn dev
```

pietervaneverdingen commented 1 year ago

The session of last week was canceled, since the release including the SHACL support code was not ready yet. We can organize an online session with the RML experts to discuss your implementation and the issue you describe above in more detail. Before that, we can ask @mightymax, @EnnoMeijers and others to have a look at your implementation.

vemonet commented 12 months ago

A bit more information about the implementation, for those who might try it:

philipperenzen commented 11 months ago

Hi @vemonet

We recently released version 3 and have added some new features to LDWizard.

Quite a lot of changes have been made to the RML script and to the navigation buttons during the SHACL integration (#59). I believe these changes might have some impact on your forked repository, although most of the other features should not. Please let me know if you run into any problems when rebasing.

vemonet commented 11 months ago

Thanks for the notification @philipperenzen, I rebased in a new branch: https://github.com/vemonet/LDWizard/tree/add-import-and-yarrrml-rebased

That also reminded me that I took the opportunity to fix some small issues alongside adding the import feature:

I can present it tomorrow during the LDWizard call if you want

pietervaneverdingen commented 11 months ago

Hi @vemonet, thanks for the update! Daniel is also available tomorrow to demo his SHACL-support functionality. I suggest that Daniel demos the functionality that he has built first and after that you can then present and demo your work.

vemonet commented 11 months ago

Perfect for me, I was hoping to discover the new SHACL functionality!