rvesse / lubm-uba

Parallelized version of the Lehigh University Benchmark (LUBM) Data Generator

N-Triples output produces relative IRIs #2

Open LorenzBuehmann opened 7 years ago

LorenzBuehmann commented 7 years ago

The N-Triples writer produces two triples per university that contain relative IRIs. According to the W3C recommendation this is not allowed, at least not in RDF 1.1:

<> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Ontology> .
<> <http://www.w3.org/2002/07/owl#imports> <http://swat.cse.lehigh.edu/onto/univ-bench.owl> .

This also leads to errors when using the Jena parser, e.g. with

riot --check ...

it outputs

12:07:36 ERROR riot                 :: [line: 1, col: 1 ] Relative IRI: 
12:07:36 ERROR riot                 :: [line: 2, col: 1 ] Relative IRI:

rvesse commented 7 years ago

You can pass --base to RIOT to work around this.

These triples are produced by the original code. This version was rewritten specifically to produce output identical to the original code's, and there are scripts in the repo that test this.

This is not ideal, but these triples are essential for correctness with respect to the original code.

LorenzBuehmann commented 7 years ago

Ok, didn't know the reason, but indeed comparability with the original data makes sense.

Regarding RIOT, this doesn't change anything because of

--base=URI             Set the base URI (does not apply to N-triples and N-Quads)

which is to be expected, since N-Triples was never meant to contain relative IRIs. The only drawback is that conversion with RIOT also fails, e.g. if I want to transform the data to RDF/XML. But that's OK for me, I just transform the data with sed beforehand.
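
For anyone who prefers a script over sed, here is a minimal Python sketch of the same pre-processing step. It assumes the empty IRI only ever appears in subject position at the start of a line, as in the two triples above. BASE_IRI and absolutize_subjects are hypothetical names picked for illustration, not something the generator defines.

```python
import io

# Hypothetical base IRI (an assumption): use whatever identifies the dataset in your setup.
BASE_IRI = "http://example.org/lubm/University0.nt"


def absolutize_subjects(lines, base_iri=BASE_IRI):
    """Yield N-Triples lines, replacing an empty-IRI subject (<>) with <base_iri>."""
    for line in lines:
        if line.startswith("<> "):
            # Drop the leading "<>" and keep the rest of the triple untouched.
            yield f"<{base_iri}>" + line[2:]
        else:
            yield line


# The two offending triples from the report above.
sample = io.StringIO(
    "<> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2002/07/owl#Ontology> .\n"
    "<> <http://www.w3.org/2002/07/owl#imports> "
    "<http://swat.cse.lehigh.edu/onto/univ-bench.owl> .\n"
)

fixed = list(absolutize_subjects(sample))
print("".join(fixed), end="")
```

For a real file, pass the open file object instead of the in-memory sample and write the yielded lines to a new file. All other lines go through unchanged.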

Thanks for the fast support!

rvesse commented 5 years ago

Another possible workaround here is to parse the resulting data as Turtle with the base URI set, since N-Triples is a subset of Turtle.
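
The reason this works is that a Turtle parser resolves relative IRI references against the base, and the empty reference <> resolves to the base IRI itself. Below is a rough Python sketch of just that resolution step, using the standard library's urljoin as a stand-in for the parser's resolver. BASE_IRI and resolve are hypothetical names, and the base value is an assumption.

```python
from urllib.parse import urljoin

# Hypothetical base IRI (an assumption), i.e. whatever you would hand to the parser.
BASE_IRI = "http://example.org/lubm/University0.nt"


def resolve(reference, base_iri=BASE_IRI):
    """Resolve an IRI reference against the base, as a Turtle parser would for <...> terms."""
    return urljoin(base_iri, reference)


# The empty reference from `<>` resolves to the base IRI itself.
subject = resolve("")

# IRIs that are already absolute are left untouched.
already_absolute = resolve("http://swat.cse.lehigh.edu/onto/univ-bench.owl")

print(subject)
print(already_absolute)
```

So after parsing as Turtle, the two ontology triples end up with the base IRI as their subject, and everything else in the file is unaffected.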