Closed pietervaneverdingen closed 7 months ago
The scope of this issue will be enlarged to also include "Add support for YARRRML import format".
The bounty for this issue will also be increased by € 500, to € 2500.
Hi @pietervaneverdingen, I would be interested in implementing this feature, but there is a blocking issue with how LDWizard handles "column refinements".
Column refinements are small functions defined by an LDWizard owner that let users apply an additional transformation to the content of a cell (e.g. add a prefix to create a URI from a plain string). But they are implemented to be applied before the RML transformation, which means the information about column refinements is not present in the RML mappings.
So for a transformation with column refinements, if you provide your old CSV plus the RML that was previously generated, we will not be able to figure out which column refinements were applied, because the information is just not there.
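To make the problem concrete, here is a minimal sketch. All names (`addPrefix`, `applyRefinement`, `describeMapping`) and the example.org IRIs are hypothetical and not taken from the LDWizard code base; it only illustrates why a mapping generated after the refinement carries no trace of it.

```typescript
// Hypothetical illustration, not LDWizard's actual code.
type Row = Record<string, string>;

// A "column refinement": a small function applied to every cell of a column.
const addPrefix = (value: string): string => `https://example.org/id/${value}`;

// The refinement is applied to the data BEFORE any mapping is involved.
function applyRefinement(
  rows: Row[],
  column: string,
  refine: (value: string) => string
): Row[] {
  return rows.map((row) => ({ ...row, [column]: refine(row[column]) }));
}

// The mapping only records which column feeds which predicate.
// Nothing about the refinement ends up in it.
function describeMapping(column: string, predicate: string): string {
  return `predicate <${predicate}> takes its object from column "${column}"`;
}

const original: Row[] = [{ name: "alice" }];
const refined = applyRefinement(original, "name", addPrefix);
const mapping = describeMapping("name", "https://schema.org/url");

console.log(refined[0].name);
console.log(mapping);
```

Given only `original` and `mapping`, there is no way to recover that `addPrefix` was used, which is the situation described above.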
The custom functions could be defined in the ldwizard_config.ts file, similarly to how we do with column refinements at the moment. Then, instead of running the column refinement ourselves, we would just add the custom function to the RML, and RocketRML would take care of running it for us (less code to maintain in LDWizard 🍾)
Imo that is the most logical and standards-compliant solution. But it is a major change to the whole LDWizard mechanism, and it might be a bit of a Pandora's box... It should be relatively easy (99% of the job is already done by RocketRML), but I am not sure how much work it will need to get to a better state than the existing solution (will the RocketRML custom functions cover all the use cases covered by column refinements?)
In my opinion, if you want to be serious about using RML as a shareable mapping format, then it needs to be fully supported, not partially. So LDWizard should work directly with RML custom functions instead of having its own column refinement system in parallel.
What are your thoughts on this? Are there specific needs that make the current column refinement system required and prevent us from using RML custom functions?
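As a rough illustration of the custom-function idea, here is a sketch in which each transformation is registered under an IRI that a mapping can reference. It does not use RocketRML's actual API; the registry, the `ColumnMapping` shape and the example.org function IRIs are all assumptions made for this example.

```typescript
// Hypothetical sketch: transformations registered under an IRI, so that a
// mapping can name the function it needs and an importer can re-apply it.
type CellFunction = (value: string) => string;

const functionRegistry: Record<string, CellFunction> = {
  "https://example.org/functions#toUri": (v) =>
    `https://example.org/id/${encodeURIComponent(v)}`,
  "https://example.org/functions#upper": (v) => v.toUpperCase(),
};

// Assumed, simplified stand-in for one predicate-object entry of a mapping.
interface ColumnMapping {
  column: string;
  predicate: string;
  functionIri?: string; // present only when a transformation is applied
}

// Resolve the object value for one row, applying the named function if any.
function evaluate(mapping: ColumnMapping, row: Record<string, string>): string {
  const raw = row[mapping.column];
  if (mapping.functionIri === undefined) return raw;
  const fn = functionRegistry[mapping.functionIri];
  if (fn === undefined) {
    throw new Error(`Unknown function: ${mapping.functionIri}`);
  }
  return fn(raw);
}

const nameMapping: ColumnMapping = {
  column: "name",
  predicate: "https://schema.org/url",
  functionIri: "https://example.org/functions#toUri",
};

console.log(evaluate(nameMapping, { name: "a b" }));
```

Because the function IRI travels with the mapping, someone importing the mapping later can see which transformation was used, provided the same function is registered on their side.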
I looked a bit more into it and have some amendments to make to my previous message!
Considering those 2 points, continuing to use the current column refinements makes more sense, as it makes the system more reusable, even if it does not follow the RML specifications 100%.
I will look into adding import and YARRRML support when I have some time then :)
Hi @vemonet, many thanks for your proposal and additional refinement.
It is best to discuss your ideas with LDWizard Gatekeeper @mightymax first to decide on the best-possible scenario.
After that, it would be a good idea to schedule an online LDWizard session in the short term with the RML experts in our LDWizard working group, to finalize the preferred scenario so that we all have the same understanding of what can be developed within what timeframe and how it fits with the other LDWizard development activities.
Then we can assign the bounty to you and finish the development activities with all the relevant stakeholders involved.
Thanks @pietervaneverdingen, the instructions and features to implement were clear, so I went ahead and implemented them already:
The UI impact is minimal: I just added a "mapping file upload" button under the CSV upload button at the first step (it is optional, users can still upload a CSV alone)
Importing is working well for the different mappings I tried it with
There is just one small issue: when the option "Row number" is chosen for the Key Column, generating the YARRRML mappings fails, because we use blank nodes for the SubjectMap but YARRRML expects a URI.
This problem has been mentioned in this issue: https://github.com/pldn/LDWizard/issues/144 (where I answered with more details about the problem). I think a small discussion between the LDWizard stakeholders will be needed to decide which way to go for row numbers: do we continue to use blank nodes, or do we make the effort of adding row number support to RocketRML so that we can use URIs as SubjectMap when the key column is the row number?
I missed last week's session, but I can join the next one to present the implementation! We can discuss whether it works as expected for you, and what you would like me to change.
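The two options for row-number subjects can be sketched as follows. This is only an illustration of the choice being discussed: the helper names, the blank node label format and the base IRI are assumptions, not what LDWizard or RocketRML actually generate.

```typescript
// Hypothetical helpers illustrating the two ways to identify a row
// when no key column is chosen.

// Option 1: a blank node label derived from the row number.
function blankNodeSubject(rowIndex: number): string {
  return `_:row${rowIndex}`;
}

// Option 2: a URI built from an assumed base IRI plus the row number.
function iriSubject(baseIri: string, rowIndex: number): string {
  return `${baseIri}${rowIndex}`;
}

const csvRows = [{ name: "alice" }, { name: "bob" }];

// Row numbers here start at 1; that numbering is also an assumption.
const blankNodes = csvRows.map((_, i) => blankNodeSubject(i + 1));
const iris = csvRows.map((_, i) => iriSubject("https://example.org/row/", i + 1));

console.log(blankNodes);
console.log(iris);
```

The second option yields subjects that can be written as a URI template, but it requires the mapping engine to expose the row number, which is the missing piece mentioned above.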
You can find the code in my fork here: https://github.com/vemonet/LDWizard/tree/add-import-and-yarrrml
Try it with:
git clone https://github.com/vemonet/LDWizard -b add-import-and-yarrrml
cd LDWizard
yarn
yarn dev
Last week's session was canceled, since the release including the SHACL support code was not ready yet. We can organize an online session with the RML experts to discuss your implementation and the issue you describe above in more detail. Before that, we can ask @mightymax, @EnnoMeijers and others to have a look at your implementation.
A bit more information about the implementation, for those who might try it:
yarrrml-parser npm package, and we do the import from the RML mappings
Hi @vemonet
We recently updated to version 3 and added some new features to LDWizard.
Quite a lot of changes have been made to the RML script and to the navigation buttons during the SHACL integration #59. I believe these changes might have some impact on your forked repository; most of the other features should not. Please let me know if you run into any problems when rebasing.
Thanks for the notification @philipperenzen, I rebased in a new branch: https://github.com/vemonet/LDWizard/tree/add-import-and-yarrrml-rebased
That also reminded me that I took the opportunity to fix some small issues alongside adding import:
CONFIGURING.md (https://github.com/pldn/LDWizard/issues/148)
runtimeConfig.ts so that it presents most parameters available to the user (type annotation and autocomplete are nice, but for this kind of config the best from a dev point of view is to have a complete example to start from)
I can present it tomorrow during the LDWizard call if you want
Hi @vemonet, thanks for the update! Daniel is also available tomorrow to demo his SHACL-support functionality. I suggest that Daniel demos what he has built first, and after that you can present and demo your work.
Perfect for me, I was hoping to discover the new SHACL functionality!
User story "Pick up where somebody left off"
Background
Acceptance criteria
A bounty has been placed on this issue by PLDN for € 2000