lucmoreau / ProvToolbox

Java toolkit to create and convert W3C PROV data model representations, and build provenance-enabled applications in a variety of programming languages (java, python, typescript, javascript)

turtle parsing of input stream + relative uris #122

Closed dtm closed 9 years ago

dtm commented 9 years ago
49% curl http://eprints.soton.ac.uk/375233/7/provenance.ttl | provconvert -infile - -informat ttl -outformat provn -outfile -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48319  100 48319    0     0   691k      0 --:--:-- --:--:-- --:--:--  693k
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" org.openprovenance.prov.interop.InteropException: org.openrdf.rio.RDFParseException: Not a valid (absolute) URI: /starting-points.png [line 44]
    at org.openprovenance.prov.interop.InteropFramework.readDocument(InteropFramework.java:604)
    at org.openprovenance.prov.interop.InteropFramework.readDocument(InteropFramework.java:533)
    at org.openprovenance.prov.interop.InteropFramework.doReadDocument(InteropFramework.java:790)
    at org.openprovenance.prov.interop.InteropFramework.run(InteropFramework.java:841)
    at org.openprovenance.prov.interop.CommandLineArguments.main(CommandLineArguments.java:227)
Caused by: org.openrdf.rio.RDFParseException: Not a valid (absolute) URI: /starting-points.png [line 44]
    at org.openrdf.rio.helpers.RDFParserBase.reportFatalError(RDFParserBase.java:622)
    at org.openrdf.rio.turtle.TurtleParser.reportFatalError(TurtleParser.java:1114)
    at org.openrdf.rio.helpers.RDFParserBase.createURI(RDFParserBase.java:340)
    at org.openrdf.rio.helpers.RDFParserBase.resolveURI(RDFParserBase.java:327)
    at org.openrdf.rio.turtle.TurtleParser.parseURI(TurtleParser.java:855)
    at org.openrdf.rio.turtle.TurtleParser.parseValue(TurtleParser.java:525)
    at org.openrdf.rio.turtle.TurtleParser.parseObject(TurtleParser.java:413)
    at org.openrdf.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:339)
    at org.openrdf.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:315)
    at org.openrdf.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:301)
    at org.openrdf.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:208)
    at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:186)
    at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:131)
    at org.openprovenance.prov.rdf.Utility.parseRDF(Utility.java:67)
    at org.openprovenance.prov.rdf.Utility.parseRDF(Utility.java:58)
    at org.openprovenance.prov.interop.InteropFramework.readDocument(InteropFramework.java:584)
    ... 4 more
Caused by: java.lang.IllegalArgumentException: Not a valid (absolute) URI: /starting-points.png
    at org.openrdf.model.impl.URIImpl.setURIString(URIImpl.java:68)
    at org.openrdf.model.impl.URIImpl.<init>(URIImpl.java:57)
    at org.openrdf.model.impl.ValueFactoryImpl.createURI(ValueFactoryImpl.java:38)
    at org.openrdf.rio.helpers.RDFParserBase.createURI(RDFParserBase.java:337)
    ... 17 more

But this works fine:

50% curl http://eprints.soton.ac.uk/375233/7/provenance.ttl > provenance.ttl
51% provconvert -infile provenance.ttl -informat ttl -outformat provn -outfile -

Line 44:

<http://openprovenance.org/include#20892220-a071-4ef3-a799-3056447ec8a2-1>schema:contentLocation <starting-points.png> . 

I would guess it's because we don't have a default base uri defined when we're parsing from a stream: the file-based conversion output has pre_88:starting-points.png, where pre_88 starts with file:.
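
A minimal sketch of that guess, using only java.net.URI rather than the Sesame parser in the trace: a relative reference has no scheme, so it cannot stand as an absolute uri until it is resolved against a base. The base used here (the URL the file was fetched from) and the class and method names are assumptions for illustration, not what provconvert does.

```java
import java.net.URI;

// Hypothetical helper, not part of ProvToolbox.
class BaseUriSketch {

    // True only when the reference carries its own scheme (http:, file:, ...).
    static boolean isAbsolute(String ref) {
        return URI.create(ref).isAbsolute();
    }

    // Resolve a possibly relative reference against a base uri (RFC 3986 rules).
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String ref = "starting-points.png";
        // Assumed base: the location the Turtle file was retrieved from.
        String base = "http://eprints.soton.ac.uk/375233/7/provenance.ttl";

        System.out.println(isAbsolute(ref));    // false
        System.out.println(resolve(base, ref)); // http://eprints.soton.ac.uk/375233/7/starting-points.png
    }
}
```

With no base at all there is nothing to resolve against, which would be consistent with the "Not a valid (absolute) URI" error when reading from a stream.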

dtm commented 9 years ago

As a side point pre_88 in the output is:

prefix pre_88 <file:/home/...>

I think the file uri should be file:///home/....

lucmoreau commented 9 years ago

This is a perfectly valid rdf file, but does it allow for prov inter-operability? The other prov representations don't have this notion of uri relative to a base uri.

So, how do we handle this? Should we have a base uri parameter for provconvert?

lucmoreau commented 9 years ago

I have implemented a fix, setting a base uri to file://stdin/.

The example now parses in this specific case. However, as said above, we have not solved the interoperability issue here. Should we explicitly disallow relative uris?

As far as the file uri is concerned, it's generated by the java.io library. There is very little I can do here.
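
For reference, a small sketch of where the two spellings come from, assuming a Unix-like system and a path that does not exist as a directory (the path below is made up, as are the class and method names). As far as I know, java.io.File.toURI() gives the single-slash form, while java.nio.file.Path.toUri() gives the triple-slash form. Both leave the authority empty, so they should name the same file.

```java
import java.io.File;
import java.nio.file.Paths;

// Hypothetical demo class, not part of ProvToolbox.
class FileUriForms {

    // java.io route: single slash after the scheme on Unix-like systems.
    static String viaFile(String path) {
        return new File(path).toURI().toString();
    }

    // java.nio route: empty authority written out as "//" before the path.
    static String viaPath(String path) {
        return Paths.get(path).toUri().toString();
    }

    public static void main(String[] args) {
        // Made-up absolute path; it does not need to exist.
        String path = "/nonexistent-provtoolbox-demo/relative-uri.ttl";
        System.out.println(viaFile(path)); // file:/nonexistent-provtoolbox-demo/relative-uri.ttl
        System.out.println(viaPath(path)); // file:///nonexistent-provtoolbox-demo/relative-uri.ttl
    }
}
```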

lucmoreau commented 9 years ago

I added an example file in prov-rdf/src/test/resources/examples/relative-uri.ttl.

If we read the file with

cat prov-rdf/src/test/resources/examples/relative-uri.ttl | provconvert -infile - -informat ttl -outfile - -outformat provn

then we get file://stdin/ as the "base uri".

document
prefix bnode <http://openprovenance.org/provtoolbox/bnode/>
prefix pre_0 <file://stdin/>
prefix ex <http://example.com/>
prefix owl <http://www.w3.org/2002/07/owl#>
prefix rdf <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs <http://www.w3.org/2000/01/rdf-schema#>
activity(ex:experiment,-,-)
entity(ex:inconsistentResult)
wasEndedBy(ex:experiment,ex:inconsistentResult,-,-)
wasEndedBy(ex:experiment,ex:inconsistentResult,-,2011-07-16T01:52:02Z,[prov:location = 'pre_0:scienceLab_003'])
endDocument

If we read the file with

 provconvert -infile prov-rdf/src/test/resources/examples/relative-uri.ttl -outfile - -outformat provn 

then we get the base uri file:/home/me/workspace/ProvToolbox/prov-rdf/src/test/resources/examples/

document
prefix bnode <http://openprovenance.org/provtoolbox/bnode/>
prefix pre_0 <file:/home/me/workspace/ProvToolbox/prov-rdf/src/test/resources/examples/>
prefix ex <http://example.com/>
prefix owl <http://www.w3.org/2002/07/owl#>
prefix rdf <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs <http://www.w3.org/2000/01/rdf-schema#>
activity(ex:experiment,-,-)
entity(ex:inconsistentResult)
wasEndedBy(ex:experiment,ex:inconsistentResult,-,-)
wasEndedBy(ex:experiment,ex:inconsistentResult,-,2011-07-16T01:52:02Z,[prov:location = 'pre_0:scienceLab_003'])
endDocument
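
The two runs above differ only in the base uri, which a short java.net.URI sketch can reproduce. The relative reference scienceLab_003 is inferred from the output, and taking the file's own uri as the base is an assumption about how the file case behaves; the class name is made up.

```java
import java.net.URI;

// Hypothetical demo class, not part of ProvToolbox.
class SameFileTwoBases {

    // Resolve a relative reference against the given base uri.
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String ref = "scienceLab_003"; // inferred from the provn output

        // Base used when reading from standard input.
        String fromStdin = resolve("file://stdin/", ref);

        // Assumed base when reading from disk: the uri of the file itself.
        String fromFile = resolve(
            "file:/home/me/workspace/ProvToolbox/prov-rdf/src/test/resources/examples/relative-uri.ttl",
            ref);

        System.out.println(fromStdin); // file://stdin/scienceLab_003
        System.out.println(fromFile);  // file:/home/me/workspace/ProvToolbox/prov-rdf/src/test/resources/examples/scienceLab_003
        System.out.println(fromStdin.equals(fromFile)); // false
    }
}
```

The same Turtle text therefore yields two different absolute identifiers for the location, which is the interoperability concern raised earlier in the thread.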