Closed MattesWhite closed 7 months ago
My 2 cents:
Graphs
There is one tricky thing about using prefixes in `Graph`s: an IRI has no unique prefix/suffix split. For example, given the IRI `http://example.com/foo#barbaz`, it is perfectly valid in Turtle to define `@prefix bar: <http://example.com/foo#bar>` and `@prefix foo: <http://example.com/foo#>`, and then use either `bar:baz` or `foo:barbaz` for the same IRI. Because of that, `Graph`s have to normalize prefixes, a process that has a cost. So I am not sure that handling prefixes in `Graph`s actually brings anything compared to storing only absolute IRIs and using delta compression or tries to save space.
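To make the ambiguity concrete, here is a minimal sketch in plain Rust (no sophia types). The function names `curie_candidates` and `normalize` are hypothetical, and the "longest namespace wins" rule is only one possible normalization, assumed here for illustration.

```rust
/// Returns every `(prefix name, suffix)` pair that expands to `iri`
/// under the given `(prefix name, namespace IRI)` declarations.
pub fn curie_candidates<'a>(
    iri: &'a str,
    prefixes: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    prefixes
        .iter()
        .filter_map(|&(name, ns)| iri.strip_prefix(ns).map(|suffix| (name, suffix)))
        .collect()
}

/// One possible normalization rule (an assumption, not sophia's behaviour):
/// always split on the longest matching namespace.
/// Returns `(namespace IRI, suffix)`, or `None` if no namespace matches.
pub fn normalize<'a>(
    iri: &'a str,
    prefixes: &[(&'a str, &'a str)],
) -> Option<(&'a str, &'a str)> {
    prefixes
        .iter()
        .filter_map(|&(_, ns)| iri.strip_prefix(ns).map(|suffix| (ns, suffix)))
        .max_by_key(|&(ns, _)| ns.len())
}
```

With the two Turtle prefixes from the example, `curie_candidates` yields both `bar:baz` and `foo:barbaz`, which is why some rule like `normalize` has to run on every lookup or insertion.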
@Tpt
> There is no unique prefix/suffix from a given IRI

That's right. If one wants to use the ns/suffix split to optimize memory consumption, then a form of normalization is required. But that's not a requirement in sophia; this split was introduced as an opportunistic optimization: whenever we construct IRIs having the same prefix (either by parsing them from CURIEs, or by building them from a `Namespace` object), we do not have to duplicate the ns data.

> only storing absolute IRIs and use delta compression

But then you have to decompress the data whenever you want to return a reference to it... That's OK for storage or transfer, but not for direct use.
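The decompression point can be illustrated with a small front-coding sketch, one simple form of delta compression. The type `FrontCoded` and its methods are hypothetical and not part of sophia; the point is only that `get` has to rebuild an owned `String` rather than hand out a `&str`.

```rust
/// Sorted IRIs stored as (bytes shared with the previous entry, remainder).
pub struct FrontCoded {
    entries: Vec<(usize, String)>,
}

/// Length in bytes of the longest common prefix, cut on a char boundary.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.char_indices()
        .zip(b.chars())
        .take_while(|&((_, ca), cb)| ca == cb)
        .last()
        .map_or(0, |((i, ca), _)| i + ca.len_utf8())
}

impl FrontCoded {
    /// Builds the compressed list; `iris` is assumed to be sorted.
    pub fn from_sorted(iris: &[&str]) -> Self {
        let mut entries = Vec::with_capacity(iris.len());
        let mut previous = "";
        for &iri in iris {
            let shared = common_prefix_len(previous, iri);
            entries.push((shared, iri[shared..].to_string()));
            previous = iri;
        }
        FrontCoded { entries }
    }

    /// Total number of bytes actually stored for the remainders.
    pub fn stored_bytes(&self) -> usize {
        self.entries.iter().map(|(_, rest)| rest.len()).sum()
    }

    /// Rebuilds the IRI at `index`. No full string exists in memory,
    /// so this cannot return a `&str`; it must allocate a new `String`.
    pub fn get(&self, index: usize) -> Option<String> {
        let mut current = String::new();
        for (shared, rest) in self.entries.get(..=index)? {
            current.truncate(*shared);
            current.push_str(rest);
        }
        Some(current)
    }
}
```

Real implementations usually restart the chain every few entries so that `get` does not walk the whole list, but the allocation on access remains.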
Closing this issue, as the big refactoring of 0.8 makes it moot. In the new versions, most IRIs are stored in one string (except for the special IRIs produced by namespaces).
Topic
This issue's purpose is to discuss if it is beneficial to keep the current implementation of IRI. This discussion arose from #55.
Explanation
`sophia`'s implementation of IRIs contains two elements: a namespace `ns` and an optional `suffix`. This means that an IRI is represented either as a whole in the namespace field, or as a namespace and a suffix, like a CURIE. This differs from other RDF libraries such as `rio`, where IRIs are always represented by a single string. The question is: Is it beneficial to keep the current implementation of IRIs with separate namespace and suffix?
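As a rough illustration of the two-part representation described above, here is a simplified sketch. `SplitIri` and this `Namespace` are hypothetical stand-ins written for this discussion, not sophia's actual types; `Rc<str>` is assumed as the reference-counted string storage.

```rust
use std::fmt;
use std::rc::Rc;

/// Hypothetical IRI made of a namespace and an optional suffix.
#[derive(Clone, Debug)]
pub struct SplitIri {
    ns: Rc<str>,
    suffix: Option<Rc<str>>,
}

impl SplitIri {
    /// Stores the whole IRI in the namespace field, with no suffix.
    pub fn whole(iri: &str) -> Self {
        SplitIri { ns: Rc::from(iri), suffix: None }
    }

    /// True if both IRIs point at the same namespace allocation.
    pub fn shares_namespace(&self, other: &SplitIri) -> bool {
        Rc::ptr_eq(&self.ns, &other.ns)
    }

    fn bytes(&self) -> impl Iterator<Item = u8> + '_ {
        self.ns
            .bytes()
            .chain(self.suffix.as_deref().unwrap_or("").bytes())
    }
}

/// Equality looks at the full IRI, so the position of the split is irrelevant.
impl PartialEq for SplitIri {
    fn eq(&self, other: &Self) -> bool {
        self.bytes().eq(other.bytes())
    }
}

impl fmt::Display for SplitIri {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.ns)?;
        f.write_str(self.suffix.as_deref().unwrap_or(""))
    }
}

/// Hypothetical namespace that hands out suffixed IRIs.
pub struct Namespace(Rc<str>);

impl Namespace {
    pub fn new(ns: &str) -> Self {
        Namespace(Rc::from(ns))
    }

    /// Bumps the reference count of the namespace string instead of
    /// copying it; only the suffix is newly allocated.
    pub fn get(&self, suffix: &str) -> SplitIri {
        SplitIri { ns: Rc::clone(&self.0), suffix: Some(Rc::from(suffix)) }
    }
}
```

In this sketch, an IRI obtained from `Namespace::get("barbaz")` compares equal to one built with `SplitIri::whole` from the full string, while two IRIs obtained from the same `Namespace` share one copy of the namespace data.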
Discussion
Pro - less memory consumption in `Graph`s
The current implementation is beneficial when storing terms in, for example, a `Graph` with reference(-counted) `TermData`, where each namespace needs to be kept in memory only once, whereas the whole-string solution would require copying namespaces over and over again. This reduces the overall memory consumption of `sophia`.
Pro - cheap `Namespace::get()`
With the current implementation it is easy and cheap to create a `Namespace` and `get()` suffixed IRIs from it.
Con - consumes more stack memory
On the other hand, keeping space for an optional `suffix` means that `sophia`'s IRIs take more stack memory (nearly twice as much) than if the IRI were represented in a single place. This makes it costly to create short-lived references to terms like `RefTerm`.
Con - prefixes are part of syntax
Prefixes/CURIEs, while used in nearly all RDF serializations, are part of those formats and not of RDF itself.
Con - resolving kills suffixes
When a relative IRI is resolved, its suffix gets lost in the process anyway.
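The stack-memory point can be checked with `std::mem::size_of`. The two structs below are hypothetical borrowed representations, assumed for illustration only; sophia's real term types carry additional fields, so the exact numbers differ.

```rust
use std::mem::size_of;

/// Hypothetical borrowed IRI with a namespace and an optional suffix.
pub struct SplitRef<'a> {
    pub ns: &'a str,
    pub suffix: Option<&'a str>,
}

/// Hypothetical borrowed IRI stored as a single string.
pub struct SingleRef<'a> {
    pub iri: &'a str,
}

/// Returns the sizes in bytes of (two-part reference, single-string reference).
pub fn ref_sizes() -> (usize, usize) {
    (
        size_of::<SplitRef<'static>>(),
        size_of::<SingleRef<'static>>(),
    )
}
```

A `&str` is a pointer plus a length, and `Option<&str>` needs no extra space for its discriminant in practice, so the two-part reference comes out at twice the size of the single-string one on common targets.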
Contribution
I invite you to join the discussion. Ideally, tag your answers with `### Pro - ...`, `### Con - ...` or `### Conclusion - ...`, and reply to particular points with `### Regarding - Pro/Con - ...`, so it is easier to keep track of pros, cons and opinions.