pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

[discussion] Help make sophia a common RDF API for Rust #23

Open pchampin opened 4 years ago

pchampin commented 4 years ago

The design of Sophia emphasizes genericity. The goal was, from the start, to allow multiple implementations of the provided traits to coexist, not even necessarily inside the sophia crate itself.

The goal of this issue is to foster discussion on what is required to achieve this. Are there any design choices in Sophia's traits or underlying types which you find too opinionated or constraining? Are they too complex to be widely adopted?

Tpt commented 4 years ago

Are there any design choices in Sophia's traits or underlying types which you find too opinionated or constraining?

I would actually disagree with the goal of this question. I find many of Sophia's traits not opinionated enough: they have a lot of generic arguments, which makes them very cumbersome to use. IMHO, end users of RDF libraries should not have to write any for<'a> or similar syntax, as the Sophia documentation suggests. Performance is already good; making the library more complex to use in order to gain 1 extra percent is probably not worth it.

I do not argue that we should end up with something like the small model library of Oxigraph, which is too opinionated and requires too many memory copies. What we need is a middle ground that is easy to use and to integrate.

We should maybe have a look at successful libraries like http or url that are used in a lot of very different places while keeping what I think is a fairly simple API.

Tpt commented 4 years ago

Another possibly interesting example is cssparser, which is used by Servo, some SVG libraries and probably soon GNOME Shell.

pchampin commented 4 years ago

Granted, some trade-offs are required between genericity and usability. I added a question in the initial description of the issue to acknowledge that.

Re. the for<'a> trick with the Graph trait: it sucks, I totally agree with that. In my view, this was a temporary workaround until generic associated types landed. But admittedly, this might take some time, and we might be better off without this extra layer of genericity. So... #24
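To make the ergonomic cost discussed here concrete, below is a minimal sketch. It is not Sophia's actual API: `LtGraph`, `GatGraph`, `VecGraph`, `count_lt` and `count_gat` are hypothetical names invented for illustration. The first trait carries a lifetime parameter, so generic callers need a `for<'a>` bound; the second uses a generic associated type, so they do not.

```rust
/// Hypothetical trait parameterised by a lifetime (the workaround style).
trait LtGraph<'a> {
    type Triples: Iterator<Item = [&'a str; 3]>;
    fn triples(&'a self) -> Self::Triples;
}

/// Hypothetical trait using a generic associated type instead.
trait GatGraph {
    type Triples<'a>: Iterator<Item = [&'a str; 3]>
    where
        Self: 'a;
    fn triples(&self) -> Self::Triples<'_>;
}

/// Toy graph storing triples as owned strings (an assumption of this sketch).
struct VecGraph(Vec<[String; 3]>);

impl<'a> LtGraph<'a> for VecGraph {
    type Triples = Box<dyn Iterator<Item = [&'a str; 3]> + 'a>;
    fn triples(&'a self) -> Self::Triples {
        Box::new(self.0.iter().map(|t| [t[0].as_str(), t[1].as_str(), t[2].as_str()]))
    }
}

impl GatGraph for VecGraph {
    type Triples<'a> = Box<dyn Iterator<Item = [&'a str; 3]> + 'a> where Self: 'a;
    fn triples(&self) -> Self::Triples<'_> {
        Box::new(self.0.iter().map(|t| [t[0].as_str(), t[1].as_str(), t[2].as_str()]))
    }
}

/// With the lifetime-parameterised trait, callers must write `for<'a>`.
fn count_lt<G>(g: &G) -> usize
where
    G: for<'a> LtGraph<'a>,
{
    g.triples().count()
}

/// With the GAT-based trait, a plain bound is enough.
fn count_gat<G: GatGraph>(g: &G) -> usize {
    g.triples().count()
}
```

Both functions do the same thing; only the bound that end users have to write differs.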

MattesWhite commented 4 years ago

Separation of sophia

In order to provide a clear API, I agree with @Tpt (in this comment). We need to separate sophia into sub-crates, like other Rust projects, e.g. actix, tokio or crossbeam. My suggestion:

Tpt commented 4 years ago

+1 for the separation of Sophia.

But, before going in that direction, we probably need to make an ecosystem choice. Do we want big integrated libraries, like Jena and RDF4J in Java, which each do most things but are hardly interoperable with each other? Or do we want API crates providing common interfaces, and then an ecosystem of libraries implementing them, like what exists now in JavaScript with the work of the RDF/JS community group?

I would be more inclined to prefer the second option, which would lead to different implementations targeting different use cases and able to work with each other. We could have an integrated toolkit (sophia), a quad store targeting performance (Oxigraph), a parser suite (Rio), a JSON-LD implementation... all able to be used with each other.

If we go in the "Sophia is the foundation" direction, I am a bit concerned about the choice of the CECILL-C license. The Rust compiler currently links crates together statically, making the CECILL-C license viral if I understand it correctly. It might prevent reuse by for-profit organizations, significantly reducing the reach of the library.

pchampin commented 4 years ago

I also agree that

Again, I think that the core of Sophia has the potential to play the role of a unified interface. Separating it from the start from the rest of the implementation would send a clearer message to the community, so that should probably go up my list of priorities. I'll do that before the next release.

pchampin commented 4 years ago

Regarding the licensing issue, thanks @Tpt for spotting that. What I'll probably do is change at least the license of sophia-core to a more permissive one, so that closed source implementations of the traits remain possible. For the other crates, I'll stick to CECILL-C by default, possibly opening some of them later.

pchampin commented 3 weeks ago

@damooo @labra @konradhoeffner @MarcAntoine-Arnaud @timothee-haudebourg @Tpt @vemonet @yamdan

Resurrecting this thread, with a different spin. I'm pinging the people whom I suspect would be interested, but feel free to ping others as you see fit.

I still believe that a common crate (or set of crates) for RDF development in Rust would be beneficial. I appreciate that Sophia, as a personal project, is not the best place to do that.

I propose we create a W3C Community Group for that purpose, and that we work collectively on a new repo hosted on https://github.com/w3c-cg/ . I suggest we start with a common crate for dealing with IRIs (to avoid the duplication of sophia_iri, oxiri, iref, iris).

Then we could define a crate with a bunch of common traits for RDF (term, triple, graph, quad, dataset...), similar to sophia_api. Those traits could then be implemented by types from oxrdf and rdf-types to improve interoperability.

Of course, my personal opinion is that sophia_iri and sophia_api are almost-perfect candidates for these crates :wink:. But I'm happy to go for something different that gathers more consensus. In the end, the community wins.

WDYT? If you think this is a good idea, please react with :+1: on this comment.

Tpt commented 3 weeks ago

@pchampin This is a great idea! Thank you! However, I fear this is going to be much harder than the RDF/JS APIs. JavaScript makes abstract interfaces much easier thanks to duck typing and the absence of low-level considerations (no String vs str vs Arc<str> vs Cow<'a, str> vs smallstr vs ...). I am afraid that we will need to pick a point on the spectrum between an easy-to-use API and a fully featured API covering as many use cases as possible. To take an exaggerated example, there is already this tension between oxrdf, which is more in the "easy to use" realm, and sophia-api, which is more in the "fully featured" one. Starting small with something like IRIs is a great idea!
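As a rough illustration of the String vs str vs Arc<str> vs Cow<'a, str> question, here is a minimal sketch of one way to stay generic over the backing storage, using the standard `AsRef<str>` trait. The `Iri` wrapper, `same_iri` and `demo` are hypothetical names, not taken from oxrdf, sophia or any other existing crate.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Hypothetical IRI wrapper, generic over how the text is stored.
#[derive(Debug, Clone)]
struct Iri<T: AsRef<str>>(T);

impl<T: AsRef<str>> Iri<T> {
    /// View the IRI as a plain string slice, whatever the storage.
    fn as_str(&self) -> &str {
        self.0.as_ref()
    }
}

/// Compare two IRIs regardless of their backing storage.
fn same_iri<A: AsRef<str>, B: AsRef<str>>(a: &Iri<A>, b: &Iri<B>) -> bool {
    a.as_str() == b.as_str()
}

/// Build the same IRI with four different storage types and compare them.
fn demo() -> bool {
    let borrowed = Iri("http://ex.com/l");
    let owned = Iri(String::from("http://ex.com/l"));
    let shared: Iri<Arc<str>> = Iri(Arc::from("http://ex.com/l"));
    let maybe_owned = Iri(Cow::Borrowed("http://ex.com/l"));
    same_iri(&borrowed, &owned) && same_iri(&shared, &maybe_owned)
}
```

The price of this flexibility is the extra type parameter on every signature, which is exactly the kind of trade-off between ease of use and coverage described above.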

KonradHoeffner commented 3 weeks ago

I have exactly the same opinion as @Tpt :-) but want to elaborate that further, because this may be trivial for you two but maybe not for other potential readers in this issue, as this only occurred to me when working on the HDT Rust library where you cannot return references to resources because they are only available in compressed form. I once talked about this in the Linked Data party of Triply, so I hope it is OK here to share some of the notes. Their response was that this wasn't an issue for them because they usually operate on very large graphs but comparatively small query results, so a minor overhead was not a problem for them (and I think they use C++ where you don't have some of the issues). If you think this is too much noise in this thread I can also move the post somewhere else, and it's been a while since I worked with it the last time so please correct me if I get something wrong.

Main Challenge: How to return triples?

Which collection for triples?

How to represent a single triple?

String in Java

String in Rust

<http://looongsubject.com> <http://ex.com/l> "label".
<http://looongsubject.com> <http://ex.com/c> "comment".

Pointer to String?

Reference to String

let string = String::from("Hello, world!");
// Convert the String to a string slice
let s: &str = string.as_str();
// Compile error!
// Cannot return a reference to a value created in a function.
return s;

Maybe Owned String
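A minimal sketch of the maybe-owned approach, using `std::borrow::Cow`. The `Store` type and its "compressed" representation (simply reversed text, as a stand-in) are assumptions made for this example and have nothing to do with the actual HDT format.

```rust
use std::borrow::Cow;

/// Hypothetical store: some labels are kept as plain text, others in a
/// "compressed" form (here simply reversed, as a stand-in).
struct Store {
    plain: Vec<String>,
    packed: Vec<String>,
}

impl Store {
    /// Borrow when the text is available as-is; allocate only when it
    /// has to be reconstructed. Panics if `i` is out of range.
    fn label(&self, i: usize) -> Cow<'_, str> {
        if i < self.plain.len() {
            Cow::Borrowed(self.plain[i].as_str())
        } else {
            let p = &self.packed[i - self.plain.len()];
            Cow::Owned(p.chars().rev().collect())
        }
    }
}
```

Callers can treat the result as a `&str` in both cases and only pay for an allocation when the text had to be rebuilt.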

Arc
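A minimal sketch of the reference-counted approach, reusing the two example triples above. The `Triple` struct and `two_triples` function are hypothetical; the point is only that the long subject is allocated once and shared.

```rust
use std::sync::Arc;

/// Hypothetical owned triple whose terms are reference-counted strings.
#[derive(Clone, Debug)]
struct Triple {
    s: Arc<str>,
    p: Arc<str>,
    o: Arc<str>,
}

/// Build two triples that share a single allocation for their subject.
fn two_triples() -> Vec<Triple> {
    let subject: Arc<str> = Arc::from("http://looongsubject.com");
    vec![
        Triple {
            // Cloning an Arc only bumps a counter; the text is not copied.
            s: Arc::clone(&subject),
            p: Arc::from("http://ex.com/l"),
            o: Arc::from("label"),
        },
        Triple {
            s: subject,
            p: Arc::from("http://ex.com/c"),
            o: Arc::from("comment"),
        },
    ]
}
```

The triples own their data, so they can be returned from a function without lifetime issues, at the cost of atomic reference counting.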

One pattern query method vs many?

pub fn all_triples ...
pub fn triples_with_s ...
pub fn triples_with_p ...
pub fn triples_with_o ...
pub fn triples_with_sp ...
pub fn triples_with_so ...
pub fn triples_with_po ...
pub fn triples_with_spo ...
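For comparison, here is a minimal sketch of the "one pattern query method" alternative, where `None` acts as a wildcard. The `Graph` type and `triples_matching` are hypothetical names; an in-memory `Vec` is assumed purely for illustration.

```rust
/// Hypothetical in-memory graph of string triples.
struct Graph(Vec<[String; 3]>);

impl Graph {
    /// One pattern method: `None` is a wildcard for that position.
    fn triples_matching<'a>(
        &'a self,
        s: Option<&'a str>,
        p: Option<&'a str>,
        o: Option<&'a str>,
    ) -> impl Iterator<Item = &'a [String; 3]> + 'a {
        self.0.iter().filter(move |t| {
            s.map_or(true, |s| t[0] == s)
                && p.map_or(true, |p| t[1] == p)
                && o.map_or(true, |o| t[2] == o)
        })
    }

    /// A specialised method then becomes a thin wrapper.
    fn triples_with_s<'a>(
        &'a self,
        s: &'a str,
    ) -> impl Iterator<Item = &'a [String; 3]> + 'a {
        self.triples_matching(Some(s), None, None)
    }
}
```

A single method keeps the API surface small, while separate methods let an implementation pick a dedicated index for each pattern; the sketch above always does a linear scan.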

Conclusion and personal pain points

MarcAntoine-Arnaud commented 3 weeks ago

I would like to raise the question of defining some traits around the major semantic types, and then providing a default implementation for the general use case. A specific use case could then provide its own implementation, for example for performance reasons.
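A minimal sketch of that idea, assuming a hypothetical `Graph` trait with one required method and one provided method that a specialised type may override. None of these names come from an existing crate.

```rust
/// Hypothetical graph trait: one required method, one provided method.
trait Graph {
    /// Required: return all triples (cloned, to keep the sketch simple).
    fn triples(&self) -> Vec<[String; 3]>;

    /// Default implementation: correct for any graph, but materialises
    /// every triple just to count them.
    fn len(&self) -> usize {
        self.triples().len()
    }
}

/// General-purpose type relying on the default `len`.
struct SimpleGraph(Vec<[String; 3]>);

impl Graph for SimpleGraph {
    fn triples(&self) -> Vec<[String; 3]> {
        self.0.clone()
    }
}

/// Specialised type that keeps a cached count and overrides `len`.
struct CountedGraph {
    triples: Vec<[String; 3]>,
    count: usize,
}

impl Graph for CountedGraph {
    fn triples(&self) -> Vec<[String; 3]> {
        self.triples.clone()
    }

    /// Override: answer from the cached count without touching the triples.
    fn len(&self) -> usize {
        self.count
    }
}
```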

Would it also be relevant to open a messaging/working group channel (Slack, Discord, ...), which could be called "Semantic in Rust"?

pchampin commented 3 weeks ago

Thanks all for your support :)

The proposed CG is listed here: https://www.w3.org/community/groups/proposed/?search=rust&groups=Expand+all+groups

It needs 5 expressions of support to be officially created; we will then have a mailing list and a GitHub repo. @MarcAntoine-Arnaud we can discuss then if we want another channel.

@TpT @KonradHoeffner Yes, the main challenge will be to find consensus on the genericity/usability trade-off, and that's why we should start small. And also, documenting the crates to ease their adoption will be key, IMO.

pchampin commented 3 weeks ago

The Community Group is now officially created: https://www.w3.org/groups/cg/r2c2/ :tada:

damooo commented 2 weeks ago

Great @pchampin . I was inactive for months, hence late.

This was really a tedious problem. A common lingua franca felt really essential during my work.

And solving the IRI zoo will be a good start. Instead of yet another custom implementation, can we use the excellent iri_string crate? It offers standard-compliant types for RIs, along with their builders, validators, normalization, reference resolvers, templates, etc., as per the IETF standards. It also offers unsized URI reference types. It worked great in Manas for many operations on IRIs. The code quality is very good.

pchampin commented 2 weeks ago

And solving the IRI zoo will be a good start. Instead of yet another custom implementation, can we use the excellent iri_string crate?

I suggest that we have this discussion on the dedicated repo of the CG, once we create it :)