sparna-git / xls2rdf

Create RDF data from Excel spreadsheets - edit SKOS vocabularies, knowledge graph instances, SHACL constraints, OWL ontologies in Excel files. Available as HTTP service, upload form, command-line, or Java API.
https://xls2rdf.sparna.fr
GNU Lesser General Public License v3.0

How to run Xls2Rdf from the command line? #9

Closed aurelberra closed 4 years ago

aurelberra commented 4 years ago

Hello. For a project based on open source tools I have been using your xls2skos CLI tool (currently xls2skos-0.7.6-onejar.jar). I would like to switch to xls2rdf, but in the files downloaded from the releases as per the instructions on the wiki, I cannot find the expected xls2rdf-app-x.y.z-onejar.jar file. Have I misunderstood something? Thanks in advance for your help.

tfrancart commented 4 years ago

Hello

(At least one person is following along!) Yes, I just needed to package and deploy a new release. This is now done, with a new 2.0 release. You should check out the latest features, such as URI lookup or changing the subject column. Could you share a pointer to your project? I am always curious.

Don't forget to close the issue if appropriate.

Thanks

aurelberra commented 4 years ago

Great! Thank you very much.

But early adopters are also facing bugs… The SkosPostProcessor raises an exception:

Exception in thread "main" java.lang.ClassCastException: class org.eclipse.rdf4j.model.impl.SimpleLiteral cannot be cast to class org.eclipse.rdf4j.model.Resource (org.eclipse.rdf4j.model.impl.SimpleLiteral and org.eclipse.rdf4j.model.Resource are in unnamed module of loader 'app')
    at fr.sparna.rdf.xls2rdf.SkosPostProcessor.lambda$afterSheet$1(SkosPostProcessor.java:36)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at fr.sparna.rdf.xls2rdf.SkosPostProcessor.afterSheet(SkosPostProcessor.java:34)
    at fr.sparna.rdf.xls2rdf.Xls2RdfConverter.processSheet(Xls2RdfConverter.java:343)
    at fr.sparna.rdf.xls2rdf.Xls2RdfConverter.processWorkbook(Xls2RdfConverter.java:174)
    at fr.sparna.rdf.xls2rdf.Xls2RdfConverter.processInputStream(Xls2RdfConverter.java:145)
    at fr.sparna.rdf.xls2rdf.app.Convert.execute(Convert.java:108)
    at fr.sparna.rdf.xls2rdf.app.Main.run(Main.java:79)
    at fr.sparna.rdf.xls2rdf.app.Main.main(Main.java:86)

There was nothing out of the ordinary in the file or in the conversion command: sudo java -jar xls2rdf-app-2.0-onejar.jar convert -i in.xlsx -o out_2.rdf -l fr. Can you see what happened?

tfrancart commented 4 years ago

You have a skos:broader that has a literal as its value instead of a URI. Try adding the option --noPostProcessings to deactivate the SKOS post-processing, then inspect the output SKOS file to find the bad skos:broader property.
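
One way to look for the offending property in the output is sketched below. It assumes the output file is RDF/XML in which each skos:broader element is a direct child of a resource element; the function name and the sample data are illustrative and are not part of xls2rdf.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def find_literal_broader(rdf_xml):
    """Return (subject, text) pairs for skos:broader elements holding text instead of a reference."""
    root = ET.fromstring(rdf_xml)
    bad = []
    for subject in root:
        about = subject.get("{%s}about" % RDF_NS, "(no rdf:about)")
        for broader in subject.findall("{%s}broader" % SKOS_NS):
            # A resource-valued property carries rdf:resource or rdf:nodeID,
            # or nests a description; anything else is a literal.
            has_ref = (
                broader.get("{%s}resource" % RDF_NS) is not None
                or broader.get("{%s}nodeID" % RDF_NS) is not None
                or len(broader) > 0
            )
            if not has_ref:
                bad.append((about, (broader.text or "").strip()))
    return bad


# Hypothetical sample data, for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://example.org/thes/a">
    <skos:broader rdf:resource="http://example.org/thes/b"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/thes/c">
    <skos:broader>ex:b</skos:broader>
  </rdf:Description>
</rdf:RDF>"""

for subject, value in find_literal_broader(SAMPLE):
    print(subject, "has a literal skos:broader:", value)
```

To run it on a real file, read the file contents first, for example find_literal_broader(open("out_2.rdf", encoding="utf-8").read()).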

tfrancart commented 4 years ago

I will improve this behavior: see #10

aurelberra commented 4 years ago

I found a term (out of 1000+) in which the prefix was missing (though it was there a few hours ago, I will refrain from accusing colleagues who have access to the shared spreadsheet), but unfortunately I still get the same error with the corrected data. I checked all the terms several times. I also tried to remove non-ASCII characters from the URI, to no avail.

I see that xls2skos still happily converts the same spreadsheet. Did any rule change between xls2skos and xls2rdf that might explain why the newer one chokes on my data?

tfrancart commented 4 years ago

See https://github.com/sparna-git/xls2rdf/releases/tag/2.0.1

aurelberra commented 4 years ago

Many thanks for the checklist and the update. I have analysed the content of the broader/narrower columns again and can only find URIs. With version 2.0.1, I get "Found a skos:broadeer with Literal value" warnings (you may have already spotted the typo for "broader") for all my broader terms. My URIs look like "savoirs:histoire", "savoirs:pratiques_savantes", "savoirs:savoir-faire", "savoirs:Internet". The prefix is apparently not the problem, as I tried removing it.

tfrancart commented 4 years ago

Please give the complete warning message. "savoirs:histoire" is not an (HTTP) URI; it looks like a plain string whose prefix was not interpreted correctly. Have you checked your prefix declaration in the header? Share your spreadsheet here if it is not confidential.

aurelberra commented 4 years ago

The first lines and columns of the spreadsheet look as follows:

A                  B                                C
ConceptScheme URI  http://data.xxx.fr/thes/savoirs
PREFIX             savoirs                          http://data.xxx.fr/thes/savoirs/

Though the data are not highly confidential and will be open as soon as possible, I'd rather not share them online at such an early stage of the project. I'm happy to share them with you privately another way, of course.

aurelberra commented 4 years ago

I forgot to add the whole warning:

10:03:20.547 2090 WARN f.s.rdf.xls2rdf.SkosPostProcessor - Found a skos:broadeer with Literal value : savoirs:histoire

tfrancart commented 4 years ago

Indeed, the prefix is not interpreted correctly, so "savoirs:histoire" remains a plain literal. Please double-check your prefix declaration as well as the skos:broader column title row. Make sure you don't have extra whitespace before or after "savoirs:", in the prefix declaration as well as in your values. You can send me the file at "thomas dot francart [at] sparna dot fr".
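
To illustrate why stray whitespace can matter, here is a toy prefix resolver. It is a sketch under stated assumptions and does not reproduce xls2rdf's actual logic (whether xls2rdf trims values is not stated in this thread); the function name expand is hypothetical.

```python
def expand(value, prefixes):
    """Return ("uri", full_uri) when the value resolves to a URI, else ("literal", value)."""
    if value.startswith(("http://", "https://")):
        return ("uri", value)
    prefix, sep, local = value.partition(":")
    # Exact match only: " savoirs" or "savoirs " is not the declared prefix.
    if sep and prefix in prefixes:
        return ("uri", prefixes[prefix] + local)
    return ("literal", value)


# Prefix declaration taken from the spreadsheet header quoted in this thread.
prefixes = {"savoirs": "http://data.xxx.fr/thes/savoirs/"}

print(expand("savoirs:histoire", prefixes))
print(expand(" savoirs:histoire", prefixes))  # leading space: stays a literal
```

In this naive model a single leading space in a cell, or a trailing space in the declared prefix, is enough to leave the value as a plain string.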

aurelberra commented 4 years ago

Thank you for solving this problem so quickly!

Before closing the issue, I am leaving a comment here to note that, in the header, skos:broader should not be used with a language suffix: a tag like @en forces the cells to be parsed as literal values. In my case, skos:broader@fr(separator=",") had to be corrected to skos:broader(separator=",").
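
For anyone who wants to check their own column headers for this mistake, a small sketch follows. The header grammar in the regular expression is an assumption inferred from the two headers quoted above, not xls2rdf's real parser, and the names parse_header and suspicious_headers are hypothetical.

```python
import re

# Assumed header shape: property, optional @lang, optional (parameters).
HEADER_RE = re.compile(r'^\s*([^@(\s]+)(?:@([A-Za-z-]+))?(?:\((.*)\))?\s*$')

# SKOS properties whose values are concepts (URIs), never language-tagged text.
RESOURCE_PROPERTIES = {"skos:broader", "skos:narrower", "skos:related"}


def parse_header(header):
    """Split a column header into property, language tag and raw parameters."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("unrecognised header: %r" % header)
    prop, lang, params = match.groups()
    return {"property": prop, "lang": lang, "params": params}


def suspicious_headers(headers):
    """Return headers that put a language tag on a resource-valued property."""
    flagged = []
    for header in headers:
        parsed = parse_header(header)
        if parsed["lang"] and parsed["property"] in RESOURCE_PROPERTIES:
            flagged.append(header)
    return flagged


headers = ['skos:prefLabel@fr', 'skos:broader@fr(separator=",")']
print(suspicious_headers(headers))
```

A header such as skos:prefLabel@fr is left alone, since a language tag is appropriate for label text.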