ozekik / lightrdf

A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3
Apache License 2.0

Parse from String #1

Closed Lars-H closed 3 years ago

Lars-H commented 3 years ago

Hello,

I am interested in using your library for fast conversion from Turtle to N-Triples. However, the current API only supports parsing from a file. Would it be possible to extend the library so that it can also parse string objects?

Thanks Lars

ozekik commented 3 years ago

Thank you for the comment!

I think it's possible, and the library definitely should support that feature. I'll release the next version with it soon. For the moment, a workaround such as writing the string to a temporary file (for example with tempfile) is needed.
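A minimal sketch of the temporary-file workaround mentioned above, using only the Python standard library. The helper name `write_temp_rdf` and the `.ttl` suffix are illustrative assumptions, not part of lightrdf. The sketch only produces a file path; handing that path to the library's file-based parser is left to the caller.

```python
import os
import tempfile


def write_temp_rdf(data: str, suffix: str = ".ttl") -> str:
    """Write RDF text to a named temporary file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so a path-based parser can open it afterwards.
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, encoding="utf-8", delete=False
    ) as f:
        f.write(data)
        return f.name


turtle_text = '<http://one.example/s> <http://one.example/p> "o" .\n'
path = write_temp_rdf(turtle_text)
try:
    # Pass `path` to whichever file-based parsing call you use here.
    # For illustration, this sketch only reads the file back.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
finally:
    # Clean up the temporary file whether or not parsing succeeded.
    os.remove(path)
```

The `try`/`finally` ensures the temporary file is removed even if parsing raises an exception.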

Lars-H commented 3 years ago

Thanks, that sounds good!

ozekik commented 3 years ago

As of v0.2.0, the library supports parsing file-like objects, so we can parse RDF from a string as follows:

import io
import lightrdf

data = """<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> . # comments here
# or on a line by themselves
_:subject1 <http://an.example/predicate1> "object1" .
_:subject2 <http://an.example/predicate2> "object2" ."""

doc = lightrdf.RDFDocument(io.BytesIO(data.encode()), parser=lightrdf.turtle.PatternParser)

for triple in doc.search_triples("http://one.example/subject1", None, None):
    print(triple)

https://github.com/ozekik/lightrdf#tip-parse-from-string

Note that it requires a byte stream (BytesIO) rather than a text stream (StringIO). I think this behavior is internally consistent, and the two kinds of stream can easily be converted to each other (though with some overhead).
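For reference, a small sketch of converting between the two kinds of stream with the standard `io` module. The helper names `text_to_bytes_stream` and `bytes_to_text_stream` are hypothetical, and UTF-8 is assumed as the encoding.

```python
import io


def text_to_bytes_stream(text_stream, encoding="utf-8"):
    """Read a text stream fully and return an equivalent BytesIO.

    This copies the whole content into memory, which is the overhead
    mentioned above.
    """
    return io.BytesIO(text_stream.read().encode(encoding))


def bytes_to_text_stream(byte_stream, encoding="utf-8"):
    """Wrap a byte stream so that it can be read as text, decoding lazily."""
    return io.TextIOWrapper(byte_stream, encoding=encoding)


source = '_:subject1 <http://an.example/predicate1> "object1" .\n'

# StringIO -> BytesIO: the form accepted by a parser that expects bytes.
as_bytes = text_to_bytes_stream(io.StringIO(source))

# BytesIO -> text stream: the reverse direction.
as_text = bytes_to_text_stream(io.BytesIO(source.encode("utf-8")))
decoded = as_text.read()
```

If the data is already a `str`, `io.BytesIO(data.encode())` as in the example above is the shortest route.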