pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

Add Turtle Support #17

Closed JordanShurmer closed 8 months ago

JordanShurmer commented 4 years ago

This is a feature request. I'll gladly contribute the work when I get the time, but I figured I should go ahead and open the issue in case others want to contribute it. Or maybe there are existing solutions.

Background

This crate seems to be the most mature and most recently maintained of the RDF-related crates, so I'm hoping to use it in an LDP server I'm creating. However, Turtle is a required format for LDP servers.

The Request

pchampin commented 4 years ago

This will soon be the case; I am in the process of integrating https://github.com/Tpt/rio, which will bring capabilities for parsing (and serializing) Turtle and TriG, and make N-Triples and N-Quads parsing much faster than they are now...

JordanShurmer commented 4 years ago

Awesome. I had already opened an issue for JSON-LD through rio as well :)

pchampin commented 4 years ago

Hi @JordanShurmer, commit f3ed2f8 introduced the Rio-based parsers, bringing Turtle support (among other formats). Let me know if this satisfies your need, and if so, feel free to close this issue (unless you want to wait for a release to do so...)

MattesWhite commented 4 years ago

What about a Turtle serializer? Are there any attempts? I see that rio has a TurtleFormatter, but it seems to me far from being usable.

pchampin commented 4 years ago

@MattesWhite you are right, we should not close this issue without a Turtle serializer as well.

My plan was to use rio's formatters for that, but I have not looked into it in detail... However, if they are not satisfactory, I would be in favor of improving them rather than starting a different code base.

MattesWhite commented 4 years ago

To be honest, there are about 50 lines of code dedicated to the Turtle formatter, with no sign of prefixes, abbreviated numerical literals, nested anonymous blank nodes and so on. Basically, the formatter is nonexistent. Furthermore, rio itself lacks a Graph, which is necessary at least to print nested blank nodes [Jena doc]. In addition, rio itself is labeled as

> RDF parsers library

So I think there won't be any progress in the near future.
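To make one of the missing features concrete, here is a minimal sketch of the "abbreviated literal" check a Turtle serializer needs. It relies only on the Turtle grammar (INTEGER is `[+-]?[0-9]+`, DECIMAL is `[+-]?[0-9]*.[0-9]+`, and booleans are `true` or `false`). The function names are made up for illustration and belong to neither rio nor sophia, and `xsd:double` is left out for brevity.

```rust
const XSD: &str = "http://www.w3.org/2001/XMLSchema#";

/// Drops one leading '+' or '-' if present.
fn strip_sign(s: &str) -> &str {
    s.strip_prefix('+')
        .or_else(|| s.strip_prefix('-'))
        .unwrap_or(s)
}

fn all_digits(s: &str) -> bool {
    s.bytes().all(|b| b.is_ascii_digit())
}

/// Turtle INTEGER: optional sign followed by at least one digit.
fn is_integer(s: &str) -> bool {
    let digits = strip_sign(s);
    !digits.is_empty() && all_digits(digits)
}

/// Turtle DECIMAL: optional sign, optional digits, a dot, at least one digit.
fn is_decimal(s: &str) -> bool {
    match strip_sign(s).split_once('.') {
        Some((int, frac)) => all_digits(int) && !frac.is_empty() && all_digits(frac),
        None => false,
    }
}

/// Hypothetical helper: returns the bare token to write when the typed
/// literal can be abbreviated, or None when the full "lexical"^^<datatype>
/// form is required.
fn abbreviate<'a>(lexical: &'a str, datatype: &str) -> Option<&'a str> {
    let local = datatype.strip_prefix(XSD)?;
    let ok = match local {
        "integer" => is_integer(lexical),
        "decimal" => is_decimal(lexical),
        "boolean" => lexical == "true" || lexical == "false",
        _ => false,
    };
    ok.then_some(lexical)
}
```

Note that `"1"^^xsd:boolean` is a valid literal but cannot be abbreviated, because the bare token `1` would be read back as an `xsd:integer`.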


In my opinion, the best solution is to write a serializer specifically for the sophia-core API. At least to provide a basic serializer that is not NTriples.

For me the lack of a practical serializer is a blocking issue for the adoption of Rust in the Semantic Web. Therefore, the Turtle serializer has the highest priority for me and I wouldn't mind contributing the first external crate to sophia's ecosystem (after we're done refactoring the error-handling).


@Tpt What do you have to say about this topic and my estimations?

Tpt commented 4 years ago

Indeed, the current Rio formatters are very basic. I wrote them in a few hours to have something working for Oxigraph. I did not put much effort into them.

One of my dreams for Rio would be to add a second parser/formatter API, in addition to the current triples/quads iteration, that would encode syntax-level structures such as the prefixes used, nested blank nodes, RDF lists... This would make it possible to convert between RDF representations while keeping as much structure as possible, and would let Rio users control how the content is rendered. This API would probably look like a stream of events such as "open nested blank node", "add predicate", "add subject", "close predicate", "open list"... But I do not think I'll work on it anytime soon. I am trying to push a first working version of Oxigraph at the moment.
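As an illustration only, such an event stream could be sketched as below. The `Event` enum and `flatten` function are hypothetical and are not part of Rio's API. The sketch goes in the easy direction, deriving plain triples from events, which shows that the events carry strictly more structure (here, blank-node nesting) than a triple iterator does. Lists and prefixes are left out to keep it short.

```rust
/// Hypothetical syntax-level events (not Rio's API).
#[derive(Debug, Clone)]
enum Event {
    Subject(String),
    Predicate(String),
    Object(String),
    OpenBlankNode,  // "[" in object position
    CloseBlankNode, // the matching "]"
}

/// Flattens an event stream into (subject, predicate, object) triples.
/// Returns None if the stream is malformed, e.g. an object with no predicate.
fn flatten(events: &[Event]) -> Option<Vec<(String, String, String)>> {
    // Each frame is the current subject plus the predicate in effect, if any.
    let mut stack: Vec<(String, Option<String>)> = Vec::new();
    let mut triples = Vec::new();
    let mut fresh = 0;
    for ev in events {
        match ev {
            Event::Subject(s) => {
                stack.clear();
                stack.push((s.clone(), None));
            }
            Event::Predicate(p) => stack.last_mut()?.1 = Some(p.clone()),
            Event::Object(o) => {
                let (s, p) = stack.last()?;
                triples.push((s.clone(), p.clone()?, o.clone()));
            }
            Event::OpenBlankNode => {
                let (s, p) = stack.last()?;
                // Generate a label for the anonymous node and link it to its parent.
                let b = format!("_:b{fresh}");
                fresh += 1;
                triples.push((s.clone(), p.clone()?, b.clone()));
                stack.push((b, None));
            }
            Event::CloseBlankNode => {
                // Only a nested frame may be closed, never the top-level subject.
                if stack.len() < 2 {
                    return None;
                }
                stack.pop();
            }
        }
    }
    Some(triples)
}
```

A formatter would consume the same events in the other direction, writing `[` and `]` instead of inventing blank node labels.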

So, if you want a Turtle formatter now that is tightly integrated with what Sophia currently offers, I believe that writing a Turtle formatter inside of Sophia makes a lot of sense.

pchampin commented 4 years ago

Thanks @Tpt for this answer. I guess we can give it a try inside Sophia first, and possibly port it to Rio if you like it.

I'm not sure we need something as sophisticated as the events you suggest. The way I see it, a Turtle serializer would be initialized with a base and a set of prefixes, and would then receive a stream of triples. It would serialize the triples in order, doing its best to keep the output compact (using base, prefixes, and subject/predicate repetitions).

The trick would then be to generate a serializer-friendly triple stream. This part could be implemented by extension traits around Graph/Dataset (and as such, could not easily be integrated in Rio).
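A minimal sketch of the streaming approach described above, under simplifying assumptions: every term is an absolute IRI given as a string (no literals, no blank nodes), base IRI handling is omitted, and a prefixed name is only emitted when the local part is ASCII alphanumeric or `_`. `StreamingTurtle` and its methods are hypothetical names, not sophia's API.

```rust
/// A triple of absolute IRIs (a simplification for this sketch).
type Triple<'a> = (&'a str, &'a str, &'a str);

/// Hypothetical streaming serializer: it only remembers the prefix map and
/// the last subject and predicate written, never the whole graph.
struct StreamingTurtle {
    prefixes: Vec<(String, String)>, // (prefix name, namespace IRI)
    out: String,
    last: Option<(String, String)>, // (subject, predicate) of the previous triple
}

impl StreamingTurtle {
    fn new(prefixes: &[(&str, &str)]) -> Self {
        let mut out = String::new();
        for (name, ns) in prefixes {
            out.push_str(&format!("@prefix {name}: <{ns}> .\n"));
        }
        let prefixes = prefixes
            .iter()
            .map(|(name, ns)| (name.to_string(), ns.to_string()))
            .collect();
        StreamingTurtle { prefixes, out, last: None }
    }

    /// Writes an IRI as a prefixed name when a prefix applies, else as <iri>.
    fn term(&self, iri: &str) -> String {
        for (name, ns) in &self.prefixes {
            if let Some(local) = iri.strip_prefix(ns.as_str()) {
                if local.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
                    return format!("{name}:{local}");
                }
            }
        }
        format!("<{iri}>")
    }

    /// Appends one triple, reusing the previous subject/predicate when they repeat.
    fn push(&mut self, (s, p, o): Triple<'_>) {
        let (st, pt, ot) = (self.term(s), self.term(p), self.term(o));
        let chunk = match &self.last {
            Some((ls, lp)) if ls.as_str() == s && lp.as_str() == p => {
                format!(" ,\n        {ot}")
            }
            Some((ls, _)) if ls.as_str() == s => format!(" ;\n    {pt} {ot}"),
            Some(_) => format!(" .\n{st} {pt} {ot}"),
            None => format!("{st} {pt} {ot}"),
        };
        self.out.push_str(&chunk);
        self.last = Some((s.to_owned(), p.to_owned()));
    }

    /// Closes the last statement and returns the document.
    fn finish(mut self) -> String {
        if self.last.is_some() {
            self.out.push_str(" .\n");
        }
        self.out
    }
}
```

The compactness of the output depends entirely on the order of the incoming triples: the serializer only abbreviates when consecutive triples share a subject (or a subject and predicate), which is why a "serializer-friendly" stream, grouped by subject and then predicate, matters.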

MattesWhite commented 4 years ago

I agree with the streaming Turtle serializer. Indeed, for me this would be the required MVP.

For generating better Turtle I would suggest providing two serializers. The first serializes streams; the second borrows a whole Graph (or Dataset, for TriG), so it can analyze the triples thoroughly. For example, it could nest a blank node only if that node appears as the object of exactly one triple, or identify where a list can be used.
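The blank-node analysis mentioned above needs the whole graph, which is why it does not fit a purely streaming serializer. A rough sketch, assuming terms are plain strings and blank node labels start with `_:` (the function names are hypothetical, not sophia's API):

```rust
use std::collections::{HashMap, HashSet};

/// Terms as plain strings; blank nodes are written "_:label".
type Triple<'a> = (&'a str, &'a str, &'a str);

fn is_bnode(term: &str) -> bool {
    term.starts_with("_:")
}

/// Returns the blank nodes that can be written inline as `[ ... ]`:
/// those used as the object of exactly one triple.
/// Limitation of this sketch: only direct self-references are excluded;
/// longer blank-node cycles would need a separate check.
fn nestable_bnodes<'a>(graph: &[Triple<'a>]) -> HashSet<&'a str> {
    let mut as_object: HashMap<&'a str, usize> = HashMap::new();
    for &(s, _, o) in graph {
        if is_bnode(o) {
            let n = as_object.entry(o).or_insert(0);
            // A node that is its own object can never be nested,
            // so push its count past the threshold.
            *n += if s == o { 2 } else { 1 };
        }
    }
    as_object
        .into_iter()
        .filter(|&(_, n)| n == 1)
        .map(|(b, _)| b)
        .collect()
}
```

A blank node that only ever appears as a subject is not returned here; it could still be written as `[] ...`, but that is a separate case from nesting.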

Regarding putting the serializer into sophia: the current plan is to split sophia into a core API and crates implementing the provided interface (see #23, #26). In addition, the implementation of a Turtle serializer (and related formats) will probably require several days (weeks, months...) of investment. Therefore, I'm not keen on doing all the work as one big PR (or several minor ones). Accordingly, I'll set up my own project following the philosophy of #23. Don't worry, I will continue to contribute to sophia, especially to developing the serializer API.

pchampin commented 4 years ago

> For generating better Turtle I would suggest providing two serializers. [...]

Sounds like a good idea.

> In addition, the implementation of a Turtle serializer (and related formats) will probably require several days (weeks, months...) of investment.

Probably. So indeed, my priority is now on #26.

> Don't worry, I will continue to contribute to sophia

thanks :-)

KonradHoeffner commented 8 months ago

I think this issue can be closed, there is now sophia::turtle::parser::turtle and sophia::turtle::serializer::turtle.

pchampin commented 8 months ago

> I think this issue can be closed, there is now sophia::turtle::parser::turtle and sophia::turtle::serializer::turtle.

Indeed :)