Closed GordianDziwis closed 2 months ago
@GordianDziwis thanks for your interest in Sophia.
Short answer: you cannot guarantee that the Turtle you produce is 100% identical to the source Turtle.
- `Graph` implementations generally do not preserve triple order (because RDF graphs are inherently sets, hence unordered);
- one `Graph` implementation preserving order is `Vec<T> where T: Triple`;
- serializers have a `serialize_triples` method which expects a triple source, and is more likely (see below) to preserve that order in the serialization.

The contract of `Serializer::triple_source` does not guarantee, in general, that the order of the triples in the source will be preserved in the serialization:

- the Turtle serializer does not preserve order when the `pretty` option is on, because it prioritizes conciseness and, well, prettiness;
- without the `pretty` option it will, currently, preserve order, but I can't guarantee that future implementations will (see rationale below).

More generally, there are many issues, beyond triple order, that make it practically impossible to preserve the Turtle representation between parsing and serializing. Take, in particular, prefix declarations:
@prefix : <https://example.org/1/>.
:s :p :o.
@prefix : <https://example.org/2/>.
:s :p :o.
This document redeclares the `:` prefix midway, and the Sophia Turtle serializer will never generate something like that.
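To make the prefix point concrete, here is a minimal sketch of what any parser conceptually does with the four lines above. It is a toy written only for this example (the `expand` function is hypothetical and is not Sophia's parser or API): it handles nothing beyond `@prefix : <iri>.` lines and `:s :p :o.` statements. The point is that the resulting triples carry only full IRIs, so the information about which prefix was declared where is gone by the time a serializer sees the graph.

```rust
use std::collections::HashMap;

/// Toy prefix expander for the example above. NOT Sophia's parser:
/// it only understands `@prefix name: <iri>.` and `a:x b:y c:z.` lines.
fn expand(doc: &str) -> Vec<[String; 3]> {
    let mut prefixes: HashMap<String, String> = HashMap::new();
    let mut triples = Vec::new();
    for line in doc.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // Drop the statement terminator.
        let line = line.trim_end_matches('.').trim();
        if let Some(rest) = line.strip_prefix("@prefix") {
            // `rest` looks like ": <https://example.org/1/>".
            let (name, iri) = rest.trim().split_once(char::is_whitespace).unwrap();
            let name = name.trim_end_matches(':').to_string();
            let iri = iri.trim().trim_start_matches('<').trim_end_matches('>');
            // A later declaration silently replaces an earlier one.
            prefixes.insert(name, iri.to_string());
        } else {
            // Expand each prefixed name against the prefixes known *so far*.
            let terms: Vec<String> = line
                .split_whitespace()
                .map(|t| {
                    let (prefix, local) = t.split_once(':').unwrap();
                    format!("<{}{}>", prefixes[prefix], local)
                })
                .collect();
            triples.push([terms[0].clone(), terms[1].clone(), terms[2].clone()]);
        }
    }
    triples
}
```

Calling `expand` on the four-line document yields two distinct triples, one under `https://example.org/1/` and one under `https://example.org/2/`, with no trace of the two `@prefix` lines.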
Another issue would be heterogeneity in "prettiness". Consider the following:
@prefix : <https://example.org/>.
:s :p1 [ :a :b ].
:s :p2 _:b.
_:b :c :d.
This turtle is a mix of pretty and non-pretty. There is no way to serialize it back as is with Sophia.
Thank you for your detailed answer!
I do not care so much about the order of the triples or that `TurtleIn == TurtleOut` for `TurtleIn => Graph => TurtleOut`, but that the same graph always results in the same Turtle document.
The first use case is version control, I build a graph programmatically and have the serialized graph in git.
And as you said a graph is a set, but the triple source for a serializer is ordered and the order can influence the serialization.
For me, it would be enough if the Turtle serializer with `pretty` would produce the same Turtle for the same ordered triples from a triple source (is this already the case?). Producing the same Turtle for any order would be even nicer.
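The tension between "a graph is a set" and "a triple source is ordered" can be shown with a small standard-library sketch. Nothing here is Sophia API: `Triple`, `write_lines` and `same_graph` are names made up for this illustration, and the writer is a deliberately naive one that emits triples in the order it receives them.

```rust
use std::collections::BTreeSet;

/// A triple as three already-serialized terms (illustrative type, not Sophia's).
type Triple = [&'static str; 3];

/// Naive order-preserving writer: one N-Triples-style line per triple,
/// in exactly the order the source yields them.
fn write_lines(triples: &[Triple]) -> String {
    triples
        .iter()
        .map(|[s, p, o]| format!("{} {} {} .\n", s, p, o))
        .collect()
}

/// Two triple sequences denote the same RDF graph when they are equal as sets.
fn same_graph(a: &[Triple], b: &[Triple]) -> bool {
    a.iter().collect::<BTreeSet<_>>() == b.iter().collect::<BTreeSet<_>>()
}
```

Feeding the same two triples in opposite orders gives `same_graph(..) == true` but two different strings from `write_lines`, which is exactly what produces noisy diffs under version control.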
> Thank you for your detailed answer!
> The first use case is version control, I build a graph programmatically and have the serialized graph in git.

got it

> And as you said a graph is a set, but the triple source for a serializer is ordered and the order can influence the serialization.

indeed

> For me, it would be enough if the Turtle serializer with `pretty` would produce the same Turtle for the same ordered triples from a triple source (is this already the case?).

I don't know, off the top of my head, if that's already the case, but even if it is currently, I would not rely on it, because I would not consider it to be a design goal of the Turtle serializer.
What you want is a canonical representation of the RDF graph. The good news is that there is a brand new standard for that (https://www.w3.org/TR/rdf-canon/) and that it is implemented in Sophia (https://docs.rs/sophia_c14n/latest/sophia_c14n/rdfc10/index.html). The not-so-good news is that this canonical representation is based on N-Quads (N-Triples if you have a single graph), so it is much more verbose than "pretty" Turtle. But note that N-Triples is a subset of Turtle, so you can still feed it to your Turtle parser and it will work.
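For intuition about why a line-based format helps here, the following is a rough sketch of the simplest possible case, not of RDFC-1.0 itself and not of the `sophia_c14n` API. It assumes the graph contains no blank nodes and that every input string is already one complete, consistently escaped N-Triples statement; `stable_ntriples` is a hypothetical helper name. Under those assumptions, sorting and de-duplicating the lines is enough to get the same document for any input order. The hard part that the standard solves, assigning stable labels to blank nodes, is deliberately left out.

```rust
/// Sort and de-duplicate N-Triples lines so that any input order yields
/// the same document.
/// Assumptions: no blank nodes; each line is one complete statement.
fn stable_ntriples(lines: &[&str]) -> String {
    let mut sorted: Vec<&str> = lines
        .iter()
        .map(|l| l.trim())
        .filter(|l| !l.is_empty())
        .collect();
    // `str` ordering compares UTF-8 bytes, which matches code point order.
    sorted.sort_unstable();
    // A graph is a set, so repeated statements collapse to one line.
    sorted.dedup();
    let mut out = String::new();
    for line in sorted {
        out.push_str(line);
        out.push('\n');
    }
    out
}
```

With this, two exports of the same blank-node-free graph produce byte-identical files regardless of the order in which the triples were generated. For graphs with blank nodes, the canonicalization crate linked above is the tool to reach for.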
Yeah this is what I want, thanks!
I have a project where I export diagrams to RDF. How can I influence the order of statements when I serialize a sophia graph to turtle?