ozekik / lightrdf

A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3
Apache License 2.0

Rio libraries need updating to fix a very weird bug #14

Status: Closed (gouttegd closed 11 months ago)

gouttegd commented 1 year ago

When using LightRDF in the Ontology Development Kit, we have come across a very strange bug where LightRDF would fail to parse RDF/XML files that seem completely valid.

Here is a file that LightRDF fails to parse: https://github.com/INCATools/ontology-development-kit/files/10042121/tdm-bad.txt

(Sorry for the size of the file, but I was unable to reduce the error case to a minimal demonstrating example.)

Trying to parse that file with LightRDF as follows:

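import sys

from lightrdf import Parser

parser = Parser()
try:
    # Iterate over every triple so that the whole file is actually read.
    for triple in parser.parse("tdm-bad.xml"):
        pass
except Exception as e:
    # Print the parser's error message and exit with a non-zero status.
    print(e)
    sys.exit(1)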

yields the following error: Unexpected EOF during reading Comment.

I have no idea where exactly the bug is. However, rebuilding LightRDF after updating the Rio dependencies (rio_api, rio_turtle, and rio_xml) in Cargo.toml to their latest version (0.8.3) seems to be enough to fix it.

ozekik commented 11 months ago

Thank you for reporting! We've released v0.4.0 and, based on my testing with your data, the issue appears to be resolved. (v0.3.2 should also work.) Sorry it took a while.
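
For anyone who wants to check whether their own installation is affected, here is a minimal sketch built on the reproduction script above. It is not part of LightRDF: `try_parse` and `check_with_lightrdf` are hypothetical helper names, and it assumes the package is installed under the distribution name `lightrdf`. Besides the error message, it reports how many triples were read before the failure, which may help narrow down where in a large file the parser stops.

```python
from typing import Callable, Iterable, Optional, Tuple


def try_parse(parse: Callable[[str], Iterable], path: str) -> Tuple[int, Optional[str]]:
    """Consume every triple yielded by parse(path).

    Returns (number of triples read, error message or None).
    `parse` is any callable that returns an iterable of triples,
    for example the `parse` method of a lightrdf Parser instance.
    """
    count = 0
    try:
        for _ in parse(path):
            count += 1
    except Exception as exc:
        # Keep the count reached so far: it hints at where parsing stopped.
        return count, str(exc)
    return count, None


def check_with_lightrdf(path: str) -> bool:
    """Hypothetical helper: print the installed version and the parse result."""
    # Imported lazily so try_parse stays usable without lightrdf installed.
    from importlib.metadata import version

    from lightrdf import Parser

    # Assumption: the installed distribution is named "lightrdf".
    print("lightrdf version:", version("lightrdf"))
    count, error = try_parse(Parser().parse, path)
    if error is None:
        print(f"OK: parsed {count} triples")
        return True
    print(f"Failed after {count} triples: {error}")
    return False
```

Calling `check_with_lightrdf("tdm-bad.xml")` on the file from the report should print the installed version followed by either the triple count or the error message.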