ozekik / lightrdf

A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3
Apache License 2.0

lightrdf.Error: error while parsing IRI '': No scheme found in an absolute IRI #6

Open vadyushkins opened 3 years ago

vadyushkins commented 3 years ago

Hi @ozekik!

Thank you for the awesome library! :clap:

Unfortunately, while using your library, I got the error :bug: mentioned in the title. :disappointed: But using rdflib I was not getting a similar error. :thinking:

Environment

Steps to reproduce.

  1. Download the pathways archive.

    wget -q https://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/pathways.rdf.xz

  2. Decompress it using the xz-utils package.

    sudo apt install xz-utils
    unxz pathways.rdf.xz

  3. Run count_triples_lightrdf_parser.py.

    python3 count_triples_lightrdf_parser.py pathways.rdf

  4. Error log.

    Traceback (most recent call last):
      File "count_triples_lightrdf_parser.py", line 8, in <module>
        for triple in parser.parse(sys.argv[1]):
    lightrdf.Error: error while parsing IRI '': No scheme found in an absolute IRI

Please tell me where I am wrong. Thank you :pray:

ozekik commented 3 years ago

I'm sorry for the late reply. Thank you for the very clear report!

A quick fix is to pass an absolute IRI as the base_iri argument of parse() (for example, https://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/pathways.rdf.xz):

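    import lightrdf
    import sys

    parser = lightrdf.Parser()

    cnt = 0

    # Pass an absolute base IRI so that rdf:about="" can be resolved.
    for triple in parser.parse(sys.argv[1], base_iri="https://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/pathways.rdf.xz"):
        cnt += 1

    # Print the total number of triples parsed.
    print(cnt)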

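More specifically, the problem is the <owl:Ontology rdf:about=""> element in pathways.rdf. An empty rdf:about="" is a relative reference that resolves to the base IRI, so it means "the URI of the document containing the ontology" (as stated in the OWL 1/2 specs and in general). A downloaded local file has no definitive URI/IRI, so without an explicit base there is nothing to resolve against.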
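To make the resolution step concrete, here is a minimal sketch using Python's standard urllib.parse. It only approximates the behaviour and is not Rio's actual resolver; the helper name resolve_about and the example.org base are illustrative assumptions. With an absolute base, the empty reference resolves to the document itself; with no base, the result has no scheme, which is what the error message complains about.

```python
from urllib.parse import urljoin, urlparse


def resolve_about(base_iri: str, about: str = "") -> str:
    """Resolve an rdf:about value against a base IRI (stdlib approximation)."""
    return urljoin(base_iri, about)


# With an absolute base (hypothetical URL), the empty reference
# resolves to the base document itself.
with_base = resolve_about("https://example.org/pathways.rdf")

# With no base at all, the result stays empty and so has no scheme.
without_base = resolve_about("")

print(repr(with_base), repr(urlparse(with_base).scheme))
print(repr(without_base), repr(urlparse(without_base).scheme))
```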

RDFLib avoids this problem by using the local IRI of the file (file:///.../pathways.rdf) as the base IRI by default. We could make lightrdf do the same, but first I'd like to investigate whether that is reasonable.
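Until that is decided, the same effect can probably be had on the caller's side. The sketch below is an assumption-laden workaround, not lightrdf's own behaviour: file_base_iri and count_triples are hypothetical helper names, and the only lightrdf calls used are Parser() and parse(path, base_iri=...) as shown in the snippet above.

```python
from pathlib import Path


def file_base_iri(path: str) -> str:
    """Return an absolute file:// IRI for a local path (hypothetical helper)."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


def count_triples(path: str) -> int:
    """Count triples, using the file's own location as the base IRI."""
    # Imported here so that file_base_iri can be used without lightrdf installed.
    import lightrdf

    parser = lightrdf.Parser()
    return sum(1 for _ in parser.parse(path, base_iri=file_base_iri(path)))
```

Calling count_triples("pathways.rdf") would then resolve rdf:about="" to the file's own file:// IRI, roughly mirroring the RDFLib default described above.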

vadyushkins commented 3 years ago

Thanks for your reply @ozekik!

I think the best solution would be to put your example in the README, as the base IRI default setting might be more confusing in other situations.