pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

[Discussion] Keep IRI as ns+suffix? #67

Closed MattesWhite closed 7 months ago

MattesWhite commented 4 years ago

Topic

This issue's purpose is to discuss whether it is beneficial to keep the current implementation of IRIs. This discussion arose from #55.

Explanation

sophia's implementation of IRIs contains two elements: a namespace ns and an optional suffix. An IRI is therefore represented either as a whole in the namespace field, or as a namespace plus a suffix, like a CURIE. This differs from other RDF libraries such as rio, where an IRI is always represented by a single string. The question is:

Is it beneficial to keep the current implementation of IRIs with separated namespace and suffix?
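To make the two layouts concrete, here is a minimal sketch. The type names SplitIri and WholeIri and the method to_full are made up for illustration and are not sophia's or rio's actual API.

```rust
/// Hypothetical illustration of the ns + optional suffix layout
/// (not sophia's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitIri<'a> {
    /// Either the whole IRI, or only its namespace part.
    pub ns: &'a str,
    /// Optional remainder, comparable to the local part of a CURIE.
    pub suffix: Option<&'a str>,
}

impl<'a> SplitIri<'a> {
    /// Rebuild the full IRI text by concatenating both parts.
    pub fn to_full(&self) -> String {
        let mut full = String::from(self.ns);
        if let Some(suffix) = self.suffix {
            full.push_str(suffix);
        }
        full
    }
}

/// Hypothetical single-string layout, for comparison.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WholeIri<'a>(pub &'a str);
```

With the split layout, getting the full IRI text requires a concatenation; with the single-string layout it is already available as one slice.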

Discussion

Pro - less memory consumption in Graphs

The current implementation is beneficial when storing terms in, for example, a Graph with reference(-counted) TermData: each namespace only has to be kept in memory once, whereas the whole-string solution would copy the namespace into every IRI that uses it. This reduces the overall memory consumption of sophia.

Pro - cheap Namespace::get()

With the current implementation it is easy and cheap to create a Namespace and get() suffixed IRIs from it.
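Both pros can be sketched with reference-counted strings. This is only an illustration under assumed names: RcIri and this Namespace are hypothetical, and sophia's real Namespace::get() has a different signature.

```rust
use std::rc::Rc;

/// Hypothetical IRI whose namespace is shared through reference counting.
#[derive(Debug, Clone)]
pub struct RcIri {
    pub ns: Rc<str>,
    pub suffix: Option<Rc<str>>,
}

/// Hypothetical namespace (not sophia's actual `Namespace` type).
pub struct Namespace(Rc<str>);

impl Namespace {
    pub fn new(iri: &str) -> Self {
        Namespace(Rc::from(iri))
    }

    /// Cheap: bumps the namespace's reference count and
    /// allocates only the (usually short) suffix.
    pub fn get(&self, suffix: &str) -> RcIri {
        RcIri {
            ns: Rc::clone(&self.0),
            suffix: Some(Rc::from(suffix)),
        }
    }
}
```

Every IRI obtained from the same Namespace points at the same namespace allocation, so the namespace text exists once no matter how many terms use it.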

Con - consume more stack-memory

On the other hand, reserving space for an optional suffix means that sophia's IRIs take more stack memory (nearly twice as much) than if the IRI were represented in a single place. This makes it costly to create short-lived references to terms, like RefTerm.
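The size difference can be checked with std::mem::size_of. The two structs below are hypothetical stand-ins for borrowed terms, not sophia's actual RefTerm, so the exact ratio for sophia's types may differ.

```rust
use std::mem::size_of;

/// Hypothetical borrowed IRI stored as a single slice.
pub struct WholeRef<'a>(pub &'a str);

/// Hypothetical borrowed IRI stored as namespace + optional suffix.
pub struct SplitRef<'a> {
    pub ns: &'a str,
    pub suffix: Option<&'a str>,
}

/// Returns (size of the single-string layout, size of the split layout) in bytes.
pub fn sizes() -> (usize, usize) {
    (
        size_of::<WholeRef<'static>>(),
        size_of::<SplitRef<'static>>(),
    )
}
```

A string slice is a pointer plus a length, so the single-slice layout takes two machine words. The split layout has to carry a second slice on top of that, whether or not a suffix is present.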

Con - prefixes are part of syntax

Prefixes/CURIEs are used in nearly all RDF serializations, but they are part of those formats and not of RDF itself.

Con - resolving kills suffixes

When a relative IRI is resolved, its suffix gets lost in the process anyway.

Contribution

I invite you to join the discussion. Please tag your answers with either ### Pro - ..., ### Con - ... or ### Conclusion - ..., and answer particular points with ### Regarding - Pro/Con - ..., so it's easier to keep track of pros, cons and opinions.

Tpt commented 4 years ago

My 2 cents:

Regarding - Pro - less memory consumption in Graphs

There is one tricky thing about using prefixes in Graphs: a given IRI has no unique prefix/suffix split. For example, for the IRI http://example.com/foo#barbaz it is perfectly valid in Turtle to define @prefix bar: <http://example.com/foo#bar> and @prefix foo: <http://example.com/foo#> and then use either bar:baz or foo:barbaz for the same IRI. Because of that, Graphs have to apply prefix normalization, a process that has a cost. So I am not sure that handling prefixes in Graphs actually brings anything compared to storing only absolute IRIs and using delta compression or tries to save space.
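The ambiguity can be reproduced in a few lines. PrefixedIri and same_iri are hypothetical names for this sketch, which compares the concatenated text without allocating; it is not how sophia compares terms.

```rust
/// Hypothetical split IRI with a derived, field-by-field equality.
#[derive(Debug, PartialEq, Eq)]
pub struct PrefixedIri<'a> {
    pub ns: &'a str,
    pub suffix: &'a str,
}

impl PrefixedIri<'_> {
    /// Compares the IRIs that both values denote, wherever they were split.
    pub fn same_iri(&self, other: &PrefixedIri<'_>) -> bool {
        self.ns.len() + self.suffix.len() == other.ns.len() + other.suffix.len()
            && self
                .ns
                .bytes()
                .chain(self.suffix.bytes())
                .eq(other.ns.bytes().chain(other.suffix.bytes()))
    }
}
```

With bar:baz and foo:barbaz from the example, the derived equality says the two values differ, while same_iri says they denote the same IRI. Anything that hashes or deduplicates terms has to use the second notion, which is where the normalization cost comes from.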

pchampin commented 4 years ago

@Tpt

There is no unique prefix/suffix from a given IRI

That's right. If one wants to use the ns/suffix split to optimize memory consumption, then a form of normalization is required. But that's not a requirement in sophia; this split was introduced as an opportunistic optimization: whenever we construct IRIs having the same prefix (either by parsing them from CURIEs, or by building them from a Namespace object), we do not have to duplicate the ns data.

only storing absolute IRIs and use delta compression

But then you have to decompress the data whenever you want to return a reference to it... That's ok for storage or transfer, but not for direct use.
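To illustrate the point, here is a minimal sketch of one simple delta scheme (front coding, where each entry stores only what differs from the previous IRI). The names FrontCoded, compress and decompress are made up for this example; it is an assumption about what such a store could look like, not code from sophia or any other library.

```rust
/// Hypothetical front-coded entry: the IRI shares its first `shared`
/// bytes with the previous IRI, and `rest` holds the remainder.
pub struct FrontCoded {
    pub shared: usize,
    pub rest: String,
}

pub fn compress(iris: &[&str]) -> Vec<FrontCoded> {
    let mut out = Vec::with_capacity(iris.len());
    let mut prev = "";
    for &iri in iris {
        // Length of the common byte prefix with the previous IRI...
        let mut shared = prev
            .bytes()
            .zip(iri.bytes())
            .take_while(|(a, b)| a == b)
            .count();
        // ...moved back to a char boundary so slicing stays valid UTF-8.
        while !iri.is_char_boundary(shared) {
            shared -= 1;
        }
        out.push(FrontCoded { shared, rest: iri[shared..].to_string() });
        prev = iri;
    }
    out
}

/// Rebuilds entry `index` by replaying all entries up to it.
/// It has to return an owned String: no stored buffer holds the full IRI.
pub fn decompress(entries: &[FrontCoded], index: usize) -> String {
    let mut current = String::new();
    for entry in &entries[..=index] {
        current.truncate(entry.shared);
        current.push_str(&entry.rest);
    }
    current
}
```

Since the full text of an IRI is not stored anywhere, there is nothing to hand out a &str to without first rebuilding it, whereas the ns/suffix split can lend out both parts directly.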

pchampin commented 7 months ago

Closing this issue, as the big refactoring of 0.8 makes it moot. In the new versions, most IRIs are stored in one string (except for the special IRIs produced by namespaces).