sparna-git / structured-data-extractor

A framework for extracting RDFa, JSON-LD, Microdata and text content from webpages
GNU Lesser General Public License v3.0

Does sparna-git generate Sparnatural compatible RDF files? #2

Closed: marimeireles closed this issue 5 months ago

marimeireles commented 5 months ago

Hi sparna folks! Thank you for your work in the ontology ecosystem. I love Sparnatural and would like to use it with my project :) Currently my original data is sitting as JSON on a server. I've successfully used a library called omero-rdf to translate the data to RDF. Though the translation works and I can query the RDF files using ontologies, I find that these files are not fit for Sparnatural because they don't have well-defined relationships. I tried using sparna-git, but it seems to produce the same results as omero-rdf: a very large file that lists every single triple and how they interact with one another, rather than a higher-level file with cleaner interactions.

I'm wondering if:

a) the sparna-git library offers functionality that makes it more manageable for humans to classify this data in ways that Sparnatural would display correctly, or whether sparna-git does this by itself;

b) in case it doesn't, you have any advice on how to go about it. My data is very large, around millions of lines of RDF.

I wrote a little script to interact with the code via LLMs and got a bit out of it, but I don't think the information is reliable, so I decided to ask here. If you're interested, I could use the script to generate more documentation for the project.

Best and thank you,

tfrancart commented 5 months ago

Hello

sparna-git is our organisation identifier on Github. I am assuming that when you refer to sparna-git you actually mean the structured-data-extractor project.

This structured-data-extractor project is simply a component to extract RDF data inserted in web pages, in the form of RDFa, Microdata or JSON-LD snippets. Nothing more, nothing less. It is not a JSON-to-RDF converter. It has no relationship with Sparnatural.
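To make that scope concrete, here is a minimal Python sketch of the JSON-LD part of such a job: finding `<script type="application/ld+json">` blocks in an HTML page and parsing their contents. It uses only the standard library and is not the structured-data-extractor's own API; the names `JsonLdCollector` and `extract_jsonld` are hypothetical. RDFa and Microdata, which live in HTML attributes rather than script blocks, would need a proper parser and are not covered here.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the bodies of <script type="application/ld+json"> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.snippets = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive here unparsed; keep them only inside JSON-LD blocks
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            # Raises json.JSONDecodeError on a malformed snippet; a real tool would log and skip it
            self.snippets.append(json.loads("".join(self._buffer)))


def extract_jsonld(html_text):
    """Return the parsed JSON-LD objects embedded in an HTML page."""
    collector = JsonLdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.snippets
```

Turning each returned object into RDF triples would then require a JSON-LD processor, which this sketch leaves out. Note that nothing here converts arbitrary JSON into RDF: the input must already be JSON-LD embedded in a web page.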

I suspect that what you want is to analyze the structure of the RDF triples you got and automatically derive a structure that can serve as a config for Sparnatural. If this is the case, what you are looking for is the SHACL generator: https://github.com/sparna-git/shacl-play/wiki/Run-SHACL-Play-App-from-command-line#the-generate-command. The output of this SHACL generator can be used as a config file for Sparnatural, although the result is not very appealing.
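As a rough illustration of the kind of analysis such a generator performs, here is a minimal Python sketch, not SHACL Play's actual algorithm. It scans triples, records for each class which properties its instances use and what kind of value each property points to (another class, a literal, or an untyped resource), and prints one SHACL `sh:NodeShape` per class. The function names are hypothetical, terms are assumed to be N-Triples-style strings (`<...>` for IRIs, `"..."` for literals), and everything is held in memory. For millions of triples, the real generator, or the same aggregation written as a SPARQL query against a triple store, is the more practical route.

```python
from collections import defaultdict

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def summarize_structure(triples):
    """Map each class to {property: set of value kinds} seen on its instances.

    Terms are N-Triples-style strings: IRIs as "<...>", literals as '"..."'.
    A value kind is the object's class, or the markers "sh:Literal" for
    literals and "sh:IRI" for resources that have no rdf:type.
    Subjects without an rdf:type are not attributed to any class.
    """
    triples = list(triples)  # two passes: collect types first, then links
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    summary = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if o.startswith('"'):
            kinds = {"sh:Literal"}
        else:
            kinds = types.get(o) or {"sh:IRI"}
        for cls in types.get(s, ()):
            summary[cls][p].update(kinds)
    return summary


def _constraint(kind):
    # Class IRIs become sh:class; the literal/IRI markers become sh:nodeKind
    return f"sh:class {kind}" if kind.startswith("<") else f"sh:nodeKind {kind}"


def to_shacl(summary):
    """Render the summary as Turtle, one anonymous sh:NodeShape per class."""
    out = ["@prefix sh: <http://www.w3.org/ns/shacl#> .", ""]
    for cls in sorted(summary):
        parts = [f"    sh:targetClass {cls}"]
        for prop in sorted(summary[cls]):
            constraints = [_constraint(k) for k in sorted(summary[cls][prop])]
            if len(constraints) == 1:
                body = constraints[0]
            else:
                # Several kinds observed: each value must match at least one
                body = "sh:or ( " + " ".join(f"[ {c} ]" for c in constraints) + " )"
            parts.append(f"    sh:property [ sh:path {prop} ; {body} ]")
        out.append("[] a sh:NodeShape ;")
        out.append(" ;\n".join(parts) + " .")
        out.append("")
    return "\n".join(out)
```

The shapes only record what was observed in the data. Properties linking one class to another (`sh:class`) are the higher-level relationships the original question was after, while cardinalities, labels and the other details a real generator emits are left out.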

marimeireles commented 5 months ago

> sparna-git is our organisation identifier on Github. I am assuming that when you refer to sparna-git you actually mean the structured-data-extractor project.

Yes, I do, sorry about that!

Thank you for the explanation; that's actually super helpful. I thought RDFa and TTL were the same because the file types are interchangeable, but I see that they serve completely different purposes.
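To illustrate the distinction: Turtle (TTL) is a standalone text serialization of RDF, whereas RDFa embeds the same triples inside HTML attributes, which is why a tool like structured-data-extractor is needed to pull them out of web pages. The hypothetical Python helpers below write one triple with a plain literal object both ways; the IRIs in the usage line are example values only.

```python
from html import escape


def to_turtle(subject, predicate, literal):
    """Serialize one triple with a plain literal object as a Turtle statement."""
    # Escape backslashes first, then quotes and line breaks, as Turtle strings require
    text = (literal.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'<{subject}> <{predicate}> "{text}" .'


def to_rdfa(subject, predicate, literal):
    """Express the same triple as an RDFa 1.1 fragment embedded in HTML."""
    # about= sets the subject, property= the predicate, the element text the literal
    return (
        f'<div about="{escape(subject)}">'
        f'<span property="{escape(predicate)}">{escape(literal, quote=False)}</span>'
        f"</div>"
    )


s, p = "http://example.org/image/1", "http://purl.org/dc/terms/title"
print(to_turtle(s, p, "Cell sample"))
print(to_rdfa(s, p, "Cell sample"))
```

Both lines carry the same statement; the Turtle version is a data file on its own, while the RDFa version only makes sense inside an HTML page that a human reader also sees.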

Thanks for the pointer on SHACL, I'll take a look.

Best!