pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

Add SPARQL support #19

Open pchampin opened 4 years ago

pchampin commented 4 years ago

NB: since sophia uses a generalized RDF model (including variables), a Graph can also be used as a basic graph pattern. The query module contains a preliminary implementation of this idea.
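To illustrate the idea, here is a minimal, self-contained sketch (not sophia's actual `query` module or API). Because terms may be variables, a pattern and the data it is matched against share the same `Triple` type. Solutions are found by matching each pattern triple against the data and joining the resulting bindings. The `Term` enum and the `match_bgp` function are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

/// A generalized RDF term: an IRI, a literal, or a variable.
/// (Hypothetical type for illustration; not sophia's own term type.)
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Term {
    Iri(String),
    Literal(String),
    Variable(String),
}

/// The same triple type is used for data and for patterns.
pub type Triple = [Term; 3];
/// A solution maps variable names to the terms they are bound to.
pub type Bindings = HashMap<String, Term>;

/// Try to extend `bindings` so that `pattern` matches the data triple `data`.
fn match_triple(pattern: &Triple, data: &Triple, bindings: &Bindings) -> Option<Bindings> {
    let mut result = bindings.clone();
    for (p, d) in pattern.iter().zip(data.iter()) {
        match p {
            Term::Variable(name) => {
                // Bind the variable if it is free; otherwise the bound value must agree.
                let bound = result.entry(name.clone()).or_insert_with(|| d.clone());
                if *bound != *d {
                    return None;
                }
            }
            // A constant term must be equal to the data term.
            constant => {
                if constant != d {
                    return None;
                }
            }
        }
    }
    Some(result)
}

/// Evaluate a basic graph pattern (itself just a list of triples) against a graph.
pub fn match_bgp(pattern: &[Triple], graph: &[Triple]) -> Vec<Bindings> {
    let mut solutions = vec![Bindings::new()];
    for tp in pattern {
        let mut next = Vec::new();
        for sol in &solutions {
            for t in graph {
                if let Some(extended) = match_triple(tp, t, sol) {
                    next.push(extended);
                }
            }
        }
        solutions = next;
    }
    solutions
}
```

This naive nested-loop join is only meant to show that "graph as pattern" needs no separate pattern type; a real engine would use indexes and reorder triple patterns.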

Tpt commented 4 years ago

For Oxigraph, I have built a SPARQL parser and a SPARQL algebra representation.

Algebra: https://github.com/Tpt/oxigraph/blob/master/lib/src/sparql/algebra.rs
Parser: https://github.com/Tpt/oxigraph/blob/master/lib/src/sparql/sparql_grammar.rustpeg
Parser invocation: https://github.com/Tpt/oxigraph/blob/master/lib/src/sparql/parser.rs

It might be interesting to build it as a separate crate and make both Sophia and Oxigraph depend on it, just like Rio.

The parser is a bit slow at the moment; I am planning to rewrite it using a more efficient parsing library, probably nom. But I plan to make a working 0.1 release of Oxigraph first.

pchampin commented 4 years ago

Thanks @Tpt for chiming in.

It might be interesting to build it as a separate crate and make both Sophia and Oxigraph depend on it, just like Rio.

I have considered this. But, as I mentioned above, in sophia it would be more natural to reuse graph::Graph to represent basic graph patterns, so this might not be the smoothest way to go... I'm still open to ideas, though. The more work we can mutualize, the better.

Tpt commented 4 years ago

The more work we can mutualize, the better.

Huge +1. We could maybe have the parser in a separate crate with a fairly cheap algebra representation. Then Sophia could expose an easy-to-use algebra tree on top of it, and Oxigraph could build its query plans from it.
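As a rough illustration of what a "fairly cheap" algebra representation could look like, here is a hypothetical sketch (not Oxigraph's actual algebra.rs types) covering only a few operators. It is generic over the term type `T`, so a library such as Sophia could plug in its own terms. The names `TermOrVar`, `TriplePattern`, `GraphPattern` and `variables` are assumptions made for this example.

```rust
/// A triple-pattern position: either a concrete term or a named variable.
#[derive(Clone, Debug, PartialEq)]
pub enum TermOrVar<T> {
    Term(T),
    Var(String),
}

#[derive(Clone, Debug, PartialEq)]
pub struct TriplePattern<T> {
    pub subject: TermOrVar<T>,
    pub predicate: TermOrVar<T>,
    pub object: TermOrVar<T>,
}

/// A small subset of the SPARQL algebra, generic over the term type `T`.
#[derive(Clone, Debug, PartialEq)]
pub enum GraphPattern<T> {
    Bgp(Vec<TriplePattern<T>>),
    Join(Box<GraphPattern<T>>, Box<GraphPattern<T>>),
    LeftJoin(Box<GraphPattern<T>>, Box<GraphPattern<T>>),
    Union(Box<GraphPattern<T>>, Box<GraphPattern<T>>),
    Project(Box<GraphPattern<T>>, Vec<String>),
}

fn push_unique(out: &mut Vec<String>, v: &str) {
    if !out.iter().any(|x| x == v) {
        out.push(v.to_string());
    }
}

impl<T> GraphPattern<T> {
    /// Collect the variables visible from this pattern, in order of first appearance
    /// (a simplified notion of SPARQL variable scope).
    pub fn variables(&self, out: &mut Vec<String>) {
        match self {
            GraphPattern::Bgp(tps) => {
                for tp in tps {
                    for pos in [&tp.subject, &tp.predicate, &tp.object] {
                        if let TermOrVar::Var(v) = pos {
                            push_unique(out, v);
                        }
                    }
                }
            }
            GraphPattern::Join(l, r) | GraphPattern::LeftJoin(l, r) | GraphPattern::Union(l, r) => {
                l.variables(out);
                r.variables(out);
            }
            // A projection only exposes the projected variables.
            GraphPattern::Project(_, vars) => {
                for v in vars {
                    push_unique(out, v);
                }
            }
        }
    }
}
```

Keeping the tree this plain (boxed enums, no query-planning data) is what would make it cheap to share: each consumer can translate it into its own richer structures.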

Another way to go would be to have an "rdf-api" crate, similar to what RDF/JS is doing for the RDF model and its common extensions, and have Oxigraph, Sophia, and hopefully the other RDF-related libraries in Rust use it. But it might be hard to build a nice and efficient API without GATs (generic associated types).
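To illustrate why GATs matter here (they were stabilized later, in Rust 1.65): they let a graph trait declare a triple type that borrows from the graph itself, so an implementation can yield references into its own storage instead of cloning terms. The `Graph` trait, `VecGraph` and `count_triples` below are a hypothetical minimal sketch, not the API that sophia_api eventually adopted.

```rust
/// A minimal graph trait using a generic associated type.
pub trait Graph {
    /// The type of triples yielded; it may borrow from `self` for lifetime `'a`.
    type Triple<'a>
    where
        Self: 'a;

    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = Self::Triple<'a>> + 'a>;
}

/// A simple in-memory graph storing owned strings.
pub struct VecGraph {
    pub data: Vec<[String; 3]>,
}

impl Graph for VecGraph {
    // Triples are borrowed views into the stored strings: no cloning needed.
    type Triple<'a> = [&'a str; 3];

    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = Self::Triple<'a>> + 'a> {
        Box::new(
            self.data
                .iter()
                .map(|[s, p, o]| [s.as_str(), p.as_str(), o.as_str()]),
        )
    }
}

/// Generic code can work with any implementation, whatever its triple type.
pub fn count_triples<G: Graph>(g: &G) -> usize {
    g.triples().count()
}
```

Without GATs, `Triple` could not mention the lifetime of the `&self` borrow, so implementations would have to return owned (cloned) triples or resort to less ergonomic workarounds.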

pchampin commented 4 years ago

Another way to go would be to have an "rdf-api" crate similar to what RDF/JS

This should probably be discussed in a separate thread. I created #23 for this. And yes, GATs would be a huge help in this direction.

dwhitney commented 4 years ago

Isaac Newton invented calculus while quarantined, so I guess I can write a SPARQL parser? I've used the oxigraph library quite a bit, and I like it, but the slowest part of it is its parser (as acknowledged by @Tpt). I guess I can just start writing one and then ask for feedback? I'd like to parse into a common AST. I guess using the oxigraph algebra is sufficient?

pchampin commented 4 years ago

@dwhitney that would be awesome... :-)

Sophia has evolved quite a bit in the meantime, in order to be more usable as a common API for RDF in Rust. @Tpt and I have agreed (in a discussion offline) that a good way forward would be to extract oxigraph's SPARQL parser and AST into a separate crate, using sophia's Term type as a building block.

FYI, the Term type is currently being refactored (#47, #48, #49). Once this is finished (Literals still need to be done), I plan to extract it into a separate crate sophia_term, so that crates using it, such as this new SPARQL crate, would not end up importing the whole of sophia.

Tpt commented 4 years ago

@dwhitney Great! Thank you! I have made some changes to the parser that significantly improve its speed (migration to peg 0.6 and avoiding duplicate parsing).

dwhitney commented 4 years ago

Haven't had as much time as I'd like to look at this (still working from home). I found this parser. Have either of you taken a look at it? https://github.com/mattsse/nom-sparql

Tpt commented 4 years ago

@dwhitney I was not aware of this parser, thanks! There seems to be no code related to testing the parser against the W3C testsuite, so I don't know how correct it is.

dwhitney commented 4 years ago

Yeah, if your parser's performance has increased enough, perhaps there is no need to invest time in a new one. But before your improvements, parsing was often the slowest part of the query, by several orders of magnitude.


lulu-berlin commented 3 years ago

I was also thinking of implementing a SPARQL parser with nom before I found https://github.com/mattsse/nom-sparql which was mentioned by @dwhitney.

It says it's a WIP in the README, and there has been no development for more than a year. I wonder whether @mattsse would be willing to adapt it to fit into sophia, allowing it, via abstraction, to reuse some types and code. If he only meant it as a WIP that he doesn't want to maintain, maybe he'd be willing to have it adapted and included in sophia.

A question for @pchampin is whether having nom as a dependency is acceptable.

mattsse commented 3 years ago

Hi there 👋

It's been a while since I worked on it. It was a small side project I only hacked on for a few weeks. Unfortunately, I never got it to a point where I was pleased with it and felt good about publishing it, so I moved on... 🙈

Currently I've got some time on my hands, and if that crate could be useful for sophia, I'd be willing to adapt/donate it, as long as nom as a dependency is acceptable.

There seems to be no code related to testing the parser against the W3C testsuite

I wasn't aware that there is a test suite. If you can point me to where I can find it, I'd be happy to test against it, @Tpt.

FWIW, the parser should already be feature-complete(ish), so most of the work would probably be

MattesWhite commented 3 years ago

I'm not the owner of sophia so take this with a grain of salt.

The aim of sophia is to provide a common API for RDF in Rust (#23); therefore, it is not intended to include a parser in sophia (the current implementations are more or less artefacts from before the split into several crates). A more fitting approach would be to develop a SPARQL API for sophia, i.e. a bunch of traits, base types and core functionality, so that third-party crates, like nom-sparql, can implement a parser against this API. In the end, this should allow users to pick the parser that best fits their needs. In addition, this means that an implementation of a SPARQL engine is not required to include a parser.
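A minimal sketch of that separation, assuming a hypothetical `QueryParser` trait (the names here are illustrative, not part of sophia_api): the API side defines the trait and the query type, a third-party crate provides an implementation, and engine code stays generic over whichever parser the user picks. The toy parser only extracts projected variables from `SELECT ?a ?b WHERE {...}`, so it is far from a real SPARQL parser.

```rust
/// A parsed query; a real API would expose an algebra tree here.
#[derive(Debug, PartialEq)]
pub struct Query {
    pub projected: Vec<String>,
}

/// The trait an API crate could define; parser crates implement it.
pub trait QueryParser {
    type Error: std::fmt::Debug;
    fn parse_query(&self, text: &str) -> Result<Query, Self::Error>;
}

/// A toy "third-party" parser that only understands `SELECT ?a ?b WHERE {...}`
/// well enough to extract the projected variables.
pub struct ToySelectParser;

impl QueryParser for ToySelectParser {
    type Error = String;

    fn parse_query(&self, text: &str) -> Result<Query, String> {
        let rest = text
            .trim_start()
            .strip_prefix("SELECT")
            .ok_or_else(|| "expected SELECT".to_string())?;
        // Everything before WHERE is the projection clause.
        let head = rest.split("WHERE").next().unwrap_or("");
        let projected = head
            .split_whitespace()
            .filter_map(|tok| tok.strip_prefix('?'))
            .map(String::from)
            .collect();
        Ok(Query { projected })
    }
}

/// Engine code that works with any parser implementing the trait.
pub fn projected_vars<P: QueryParser>(parser: &P, text: &str) -> Result<Vec<String>, P::Error> {
    parser.parse_query(text).map(|q| q.projected)
}
```

Using an associated `Error` type lets each parser crate report errors in its own way without the API crate having to anticipate them.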


@yever, @mattsse Nice to see new people working on sophia and its ecosystem 👍

@mattsse You can find out about the SPARQL test suite here: https://www.w3.org/2009/sparql/docs/tests/README.html

Tpt commented 3 years ago

@mattsse The recent versions of the test suite are here: https://github.com/w3c/rdf-tests/tree/gh-pages/sparql11 I use this repository as a git submodule in Oxigraph in order to get quick feedback (<1s for the full SPARQL test suite). Here is my test suite evaluation code: https://github.com/oxigraph/oxigraph/blob/master/testsuite/src/sparql_evaluator.rs It also supports query and update evaluation tests.

Oxigraph already has Display implementations (https://github.com/oxigraph/oxigraph/blob/master/lib/src/sparql/algebra.rs) via the Sparql* structs (the default Display prints the algebra notation). During testing, I check that serializing a tree and parsing the result back returns the same tree.
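The serialize-then-parse round-trip check can be sketched on a toy tree: implement `Display` to print it, write a parser for the same syntax, and assert that parsing the printed form yields an equal tree. The `Algebra` type, its S-expression syntax, and the `parse`/`round_trips` functions are assumptions made for this example; they stand in for Oxigraph's much larger algebra.

```rust
use std::fmt;

/// A toy algebra tree printed in an S-expression style, e.g. `(join (bgp a) (bgp b))`.
/// BGP names are assumed not to contain ')' characters.
#[derive(Debug, PartialEq)]
pub enum Algebra {
    Bgp(String),
    Join(Box<Algebra>, Box<Algebra>),
}

impl fmt::Display for Algebra {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Algebra::Bgp(name) => write!(f, "(bgp {})", name),
            Algebra::Join(l, r) => write!(f, "(join {} {})", l, r),
        }
    }
}

/// Parse the syntax produced by `Display`; returns the tree and the unconsumed input.
fn parse(input: &str) -> Option<(Algebra, &str)> {
    let input = input.trim_start().strip_prefix('(')?;
    if let Some(rest) = input.strip_prefix("bgp ") {
        let end = rest.find(')')?;
        Some((Algebra::Bgp(rest[..end].to_string()), &rest[end + 1..]))
    } else if let Some(rest) = input.strip_prefix("join ") {
        let (l, rest) = parse(rest)?;
        let (r, rest) = parse(rest)?;
        let rest = rest.trim_start().strip_prefix(')')?;
        Some((Algebra::Join(Box::new(l), Box::new(r)), rest))
    } else {
        None
    }
}

/// Round-trip check: print the tree, parse it back, and compare with the original.
pub fn round_trips(tree: &Algebra) -> bool {
    let printed = tree.to_string();
    match parse(&printed) {
        Some((parsed, rest)) => rest.trim().is_empty() && parsed == *tree,
        None => false,
    }
}
```

Running such a check over every query in a test suite catches serializer and parser bugs at once, since any mismatch between the two shows up as an unequal tree.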

pchampin commented 3 years ago

@yever asked

A question for @pchampin is whether having nom as a dependency is acceptable.

and @MattesWhite replied

The aim of sophia is to provide a common API for RDF in Rust (#23), therefore, it is not intended to include a parser in sophia.

To be more precise: the sophia_api crate aims to provide a common API. Other crates in the sophia repo are intended to provide some implementation of said API (e.g. sophia_term provides an implementation of the trait TTerm), but of course the goal is to keep the ecosystem open (e.g. Oxigraph is now implementing that API). Finally, the sophia crate is gradually becoming a "compilation" of other crates, including sophia_api and sophia_term. Eventually, the code it contains will move into more specialized crates (sophia_X), and the sophia crate itself will only consist of pub use re-exports from those specialized crates.
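A facade crate of this kind is essentially a set of `pub use` re-exports. The sketch below simulates the separate crates with inline modules so that it stays self-contained; the module names (`api_crate`, `term_crate`, `facade`), the `Iri` type and the `value` method are hypothetical stand-ins, and this simplified `TTerm` is not sophia_api's real trait.

```rust
// Stand-in for an API crate: defines traits only.
pub mod api_crate {
    /// Hypothetical, heavily simplified term trait.
    pub trait TTerm {
        fn value(&self) -> String;
    }
}

// Stand-in for an implementation crate: implements the API traits.
pub mod term_crate {
    use super::api_crate::TTerm;

    pub struct Iri(pub String);

    impl TTerm for Iri {
        fn value(&self) -> String {
            self.0.clone()
        }
    }
}

// The facade: no code of its own, only re-exports of the specialized crates.
pub mod facade {
    pub use super::api_crate as api;
    pub use super::term_crate as term;
}
```

Users depending on the facade keep a single import root, while crates that only need the API can depend on the small API crate directly.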

Now, regarding SPARQL support, the first step would be to add new traits in sophia_api related to SPARQL management. Off the top of my head, I imagine

Then one or several implementations of these traits could be provided. For Oxigraph, this would simply amount to adapting the existing types to the traits above. But a generic implementation of SparqlDataset, able to resolve queries against any type implementing Dataset, would be nice too... This one could benefit from the nom-based parser by @mattsse.
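One way to get "a generic implementation ... against any type implementing Dataset" is a blanket impl. The sketch below uses deliberately distinct, hypothetical traits (`SimpleDataset`, `SimpleSparqlDataset`) rather than the real sophia_api `Dataset` or the envisioned `SparqlDataset`, whose design is not shown here, and its "query" is a single triple pattern so the example stays short.

```rust
/// Hypothetical, simplified dataset trait: exposes quads as (s, p, o, graph).
pub trait SimpleDataset {
    fn quads(&self) -> Vec<[&str; 4]>;
}

/// A one-triple-pattern "query": `None` means a variable in that position.
pub struct Pattern<'q> {
    pub s: Option<&'q str>,
    pub p: Option<&'q str>,
    pub o: Option<&'q str>,
}

/// Hypothetical query trait that engines would implement.
pub trait SimpleSparqlDataset {
    fn query(&self, pattern: &Pattern<'_>) -> Vec<[String; 3]>;
}

/// A position matches if it is a variable or equals the given term.
fn matches(want: Option<&str>, got: &str) -> bool {
    want.map_or(true, |w| w == got)
}

/// Blanket impl: every `SimpleDataset` gets a naive, scan-based query engine for free.
impl<D: SimpleDataset> SimpleSparqlDataset for D {
    fn query(&self, pattern: &Pattern<'_>) -> Vec<[String; 3]> {
        self.quads()
            .into_iter()
            .filter(|[s, p, o, _g]| matches(pattern.s, s) && matches(pattern.p, p) && matches(pattern.o, o))
            .map(|[s, p, o, _g]| [s.to_string(), p.to_string(), o.to_string()])
            .collect()
    }
}

/// A trivial in-memory store with quads as owned strings.
pub struct QuadVec(pub Vec<[String; 4]>);

impl SimpleDataset for QuadVec {
    fn quads(&self) -> Vec<[&str; 4]> {
        self.0
            .iter()
            .map(|[s, p, o, g]| [s.as_str(), p.as_str(), o.as_str(), g.as_str()])
            .collect()
    }
}
```

A store with native query support (as Oxigraph has) would not implement the generic trait's source trait alone; note that with a blanket impl like this one, the same type could not also provide its own specialized `SimpleSparqlDataset` impl, which is a real design trade-off.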

I hope this clarifies things.

lulu-berlin commented 3 years ago

Thanks @pchampin. This fits my expectations. I noticed that sophia is gradually being modularized and I like this development.

I thought that the nom-based parser could maybe be included in the workspace (making it an optional dependency for users of sophia), since it was created as a side project and has not yet been published on crates.io.

I agree that creating the relevant traits for SPARQL would be a good first step that makes a lot of sense. In fact, Oxigraph and nom-sparql could serve as two integration test scenarios for these traits.

I'll try to see how far I can go with implementing these traits and raise a pull request if I have something presentable.

GordianDziwis commented 2 years ago

I have written a SPARQL parser with tree-sitter. Tree-sitter is very fast and has bindings for Rust. Maybe this is of use.

pchampin commented 7 months ago

I have a very early implementation of a SPARQL engine for Sophia: https://github.com/pchampin/sophia_sparql

It should be integrated into v0.9 (but it might not be fully compliant by then).