ozekik / lightrdf

A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3
Apache License 2.0

Serialize RDF #10

Closed (DylanVanAssche closed 11 months ago)

DylanVanAssche commented 2 years ago

I'm looking for a replacement for RDFLib that can parse RDF, do some BGP searching, and write the new triples back. It seems that LightRDF can handle the parsing and BGP searching, but not serializing the RDF triples back to a file.

Are there any plans for this?

dymil commented 1 year ago

It seems like it wouldn't be too hard to combine this with RDFLib, i.e., grab the triples from LightRDF and then load them into an RDFLib store (as N-Triples syntax) and serialize from there. I'll play with it and upload a doc PR maybe. I think that'd still be reasonably fast just because RDFLib is so slow at parsing – IIRC, several hours vs. several minutes once we're in the hundreds of millions of triples.

EDIT: Yeah, it seems like just parsing it into oxrdflib is pretty fast, given enough RAM.

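from rdflib import Graph  # the "Oxigraph" store plugin is provided by the oxrdflib package
# `triples` holds the (subject, predicate, object) term strings obtained from LightRDF
g = Graph(store="Oxigraph")
g.parse(data='\n'.join(f'{s} {p} {o} .' for s, p, o in triples), format='nt')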

With 164K triples, this took 9.5s to run. Using the default Memory store, it took 3m41s, which does present unacceptable scaling IMO.
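If N-Triples output is all that's needed, the same string formatting can write the file directly, without going through a store. Below is a minimal sketch, not part of LightRDF: `write_ntriples` is a hypothetical helper, and it assumes each triple is a 3-tuple of strings that are already valid N-Triples terms (IRIs in angle brackets, literals quoted). The snippet above relies on the same assumption. Writing line by line also avoids building one giant joined string in memory.

```python
import io
from typing import Iterable, TextIO, Tuple

# Assumption: each term is already an N-Triples-formatted string,
# e.g. '<http://example.org/a>' or '"Alice"'.
Triple = Tuple[str, str, str]


def write_ntriples(triples: Iterable[Triple], out: TextIO) -> int:
    """Hypothetical helper: stream triples to `out`, one N-Triples statement per line.

    Returns the number of triples written.
    """
    count = 0
    for s, p, o in triples:
        out.write(f"{s} {p} {o} .\n")
        count += 1
    return count


# Hand-written sample data standing in for triples obtained from a parser.
sample = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/name>", '"Alice"'),
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "<http://example.org/bob>"),
]

buf = io.StringIO()
n = write_ntriples(sample, buf)
```

To write to disk, pass a file opened with `open(path, "w", encoding="utf-8")` instead of the `StringIO` buffer. This only covers N-Triples; for Turtle or other formats a full serializer is still needed.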

ozekik commented 11 months ago

Thank you for your suggestions. I intend to keep lightrdf focused on RDF parsing, without added complexity. For serialization, I recommend oxrdflib, as @dymil mentioned, or pyoxigraph, which is virtually the same.