pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

Serialize isomorphic graphs to identical Turtle #165

GordianDziwis closed this issue 2 months ago

GordianDziwis commented 3 months ago

I have a project where I export diagrams to RDF. How can I influence the order of statements when I serialize a Sophia graph to Turtle?

pchampin commented 3 months ago

@GordianDziwis thanks for your interest in Sophia.

Short answer: you cannot guarantee that the Turtle you produce is 100% identical to the source Turtle.

Longer answer

What you can do to preserve order

Limitations

The contract of Serializer::triple_source does not, in general, guarantee that the order of the triples in the source is preserved in the serialization.
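To make this concrete, here is a hypothetical, simplified model of a "pretty" Turtle writer in plain Rust; it is not Sophia's implementation. It groups triples by subject so that it can use `;`, which means a triple may be written far from where it appeared in the source. Terms are plain strings already in Turtle syntax, an assumption made to keep the sketch self-contained.

```rust
/// Hypothetical model of a "pretty" Turtle writer (not Sophia's code).
/// Terms are plain strings already in Turtle syntax.
/// Triples sharing a subject are written together, joined with ';',
/// in order of each subject's first appearance in the input.
fn group_by_subject(triples: &[(&str, &str, &str)]) -> String {
    // Collect distinct subjects, keeping first-appearance order.
    let mut subjects: Vec<&str> = Vec::new();
    for (s, _, _) in triples {
        if !subjects.contains(s) {
            subjects.push(*s);
        }
    }
    let mut out = String::new();
    for subj in subjects {
        // All predicate-object pairs of this subject, wherever they occurred.
        let pairs: Vec<String> = triples
            .iter()
            .filter(|(s, _, _)| *s == subj)
            .map(|(_, p, o)| format!("{p} {o}"))
            .collect();
        out.push_str(&format!("{subj} {} .\n", pairs.join(" ; ")));
    }
    out
}
```

With the input `<s1> <p> <o>`, `<s2> <p> <o>`, `<s1> <q> <o>`, this writer emits the third triple right after the first one, before the second.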

More generally, there are many issues beyond triple order that make it practically impossible to preserve the Turtle representation between parsing and serializing. Take, in particular, prefix declarations:
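A hypothetical illustration of why prefixes cannot be recovered: a Turtle parser expands every prefixed name into a full IRI, so documents using different prefix names yield the same graph. The helper below is a plain-Rust sketch of that expansion, not Sophia's parser; the name `expand` is made up for this example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: expand a prefixed name such as "ex:s" into a full
/// IRI using the prefix declarations of one document. After parsing, only
/// the expanded IRI is stored in the graph; the prefix name is gone.
fn expand(pname: &str, prefixes: &HashMap<&str, &str>) -> Option<String> {
    // Split at the first ':' ("ex:s" -> "ex", "s"; ":s" -> "", "s").
    let (prefix, local) = pname.split_once(':')?;
    // Unknown prefix: not a valid prefixed name in this document.
    let namespace = prefixes.get(prefix)?;
    Some(format!("{namespace}{local}"))
}
```

For instance, `ex:s` under `@prefix ex: <https://example.org/> .` and `:s` under `@prefix : <https://example.org/> .` both become `https://example.org/s`; a serializer only sees that IRI and has to choose a prefix (if any) on its own.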

Another issue would be heterogeneity in "prettiness". Consider the following:

@prefix : <https://example.org/> .
:s :p1 [ :a :b ].
:s :p2 _:b.
_:b :c :d.

This Turtle mixes pretty syntax (the nested [ :a :b ] blank node) and non-pretty syntax (the labelled blank node _:b). There is no way to serialize it back as is with Sophia.

GordianDziwis commented 2 months ago

Thank you for your detailed answer!

I do not care so much about the order of the triples, or about TurtleIn == TurtleOut for TurtleIn => Graph => TurtleOut; what I care about is that the same graph always results in the same Turtle document.

The first use case is version control: I build a graph programmatically and keep the serialized graph in git.

And, as you said, a graph is a set, but the triple source for a serializer is ordered, and that order can influence the serialization.

For me, it would be enough if the Turtle serializer, with pretty printing enabled, produced the same Turtle for the same ordered triples from a triple source (is this already the case?). Producing the same Turtle for any order would be even nicer.
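One way to get output that depends only on the set of triples, and not on the order of the triple source, is to sort before writing. The sketch below is plain Rust and does not use Sophia's serializers. It assumes that terms are already strings in N-Triples syntax (a hypothetical representation) and that the graph contains no blank nodes, since blank node labels can differ between runs.

```rust
use std::collections::BTreeSet;

/// Write triples as N-Triples lines, sorted and deduplicated.
/// Assumptions (hypothetical representation, not Sophia's types): every term
/// is already a string in N-Triples syntax, and there are no blank nodes.
fn sorted_ntriples<'a, I>(triples: I) -> String
where
    I: IntoIterator<Item = (&'a str, &'a str, &'a str)>,
{
    // A BTreeSet keeps its elements in sorted order, so the output
    // depends only on which triples are present, not on input order.
    let lines: BTreeSet<String> = triples
        .into_iter()
        .map(|(s, p, o)| format!("{s} {p} {o} .\n"))
        .collect();
    lines.into_iter().collect()
}
```

Because a `BTreeSet` both sorts and deduplicates, feeding the same triples in any order yields byte-identical text, which diffs cleanly in git.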

pchampin commented 2 months ago

> Thank you for your detailed answer!
>
> The first use case is version control, I build a graph programmatically and have the serialized graph in git.

got it

> And as you said a graph is a set, but the triple source for a serializer is ordered and the order can influence the serialization.

indeed

> For me, it would be enough if the Turtle serializer with pretty would produce the same Turtle for the same ordered triples from a triple source (is this already the case?).

I don't know, off the top of my head, whether that's already the case, but even if it currently is, I would not rely on it, because I do not consider it a design goal of the Turtle serializer.

What you want is a canonical representation of the RDF graph. The good news is that there is a brand new standard for that (https://www.w3.org/TR/rdf-canon/), and that it is implemented in Sophia (https://docs.rs/sophia_c14n/latest/sophia_c14n/rdfc10/index.html). The not-so-good news is that this canonical representation is based on N-Quads (N-Triples if you have a single graph), so it is much more verbose than "pretty" Turtle. But note that N-Triples is a subset of Turtle, so you can still feed it to your Turtle parser and it will work.
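Sorting alone is not enough once blank nodes are involved: two isomorphic graphs may use different labels (_:x versus _:b) and therefore sort to different text. Canonicalization fixes this by choosing the labels deterministically. The sketch below is not RDFC-1.0; it is a naive brute-force stand-in that tries every relabeling of the blank nodes to _:c0, _:c1, … and keeps the lexicographically smallest sorted N-Triples document. Its cost grows factorially with the number of blank nodes, so it only shows the idea; for real use, rely on the sophia_c14n RDFC-1.0 implementation linked above. The triple representation and all function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// A triple whose terms are strings in N-Triples syntax (hypothetical
/// representation); blank nodes are the terms starting with "_:".
type Triple = (String, String, String);

/// Distinct blank node labels occurring in the triples, sorted.
fn blank_nodes(triples: &[Triple]) -> Vec<String> {
    let mut set = BTreeSet::new();
    for (s, p, o) in triples {
        for term in [s, p, o] {
            if term.starts_with("_:") {
                set.insert(term.clone());
            }
        }
    }
    set.into_iter().collect()
}

/// All permutations of 0..n, built by inserting n-1 into every position
/// of each permutation of 0..n-1.
fn permutations(n: usize) -> Vec<Vec<usize>> {
    if n == 0 {
        return vec![Vec::new()];
    }
    let mut out = Vec::new();
    for p in permutations(n - 1) {
        for pos in 0..=p.len() {
            let mut q = p.clone();
            q.insert(pos, n - 1);
            out.push(q);
        }
    }
    out
}

/// Sorted N-Triples text after renaming bnodes[i] to "_:c{perm[i]}".
fn render(triples: &[Triple], bnodes: &[String], perm: &[usize]) -> String {
    let rename = |t: &String| match bnodes.iter().position(|b| b == t) {
        Some(i) => format!("_:c{}", perm[i]),
        None => t.clone(),
    };
    let lines: BTreeSet<String> = triples
        .iter()
        .map(|(s, p, o)| format!("{} {} {} .\n", rename(s), rename(p), rename(o)))
        .collect();
    lines.into_iter().collect()
}

/// Naive canonical form (NOT RDFC-1.0): try every bijection from blank
/// nodes to the labels _:c0.._:c{n-1} and keep the smallest result,
/// so isomorphic graphs yield the same string.
fn naive_canonical(triples: &[Triple]) -> String {
    let bnodes = blank_nodes(triples);
    permutations(bnodes.len())
        .iter()
        .map(|perm| render(triples, &bnodes, perm))
        .min()
        .expect("permutations() always returns at least one permutation")
}
```

For example, the graphs { <s> <p> _:x . _:x <c> _:y . } and { _:b <c> _:a . <s> <p> _:b . } are isomorphic, and both canonicalize to the same two lines, with the blank node reached from <s> labelled _:c0.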

GordianDziwis commented 2 months ago

Yeah this is what I want, thanks!