quadstorejs / quadstore

A LevelDB-backed graph database for JS runtimes (Node.js, Deno, browsers, ...) supporting SPARQL queries and the RDF/JS interface.
https://github.com/quadstorejs/quadstore
MIT License

Range queries #39

Closed jmatsushita closed 7 years ago

jmatsushita commented 7 years ago

Hi there,

I need to do range queries such as { subject: 'foo*' } which would get all the quads with a subject starting with foo. I can already do them today with a query and filter, but of course it would be much faster if using the underlying leveldb index. I guess something like the 10 lines of code of level-range would be great to have in the quadstore API!

Longer term, I guess it might be possible to do a pluggable solution with the existing indexing plugins for leveldb and surfacing some of this in the query API of quadstore which would be useful for quads with large text objects for instance.

Cheers,

Jun

jacoscaz commented 7 years ago

Hello Jun.

I'm thinking about the changes needed to support this. I think the easiest thing to do is to alter the query logic so that, in the presence of a single matching term with a * suffix like { subject: 'subje*' }, the params passed to leveldb become

{ start: 'SPOG<separator>subje', end: 'SPOG<separator>subje<boundary>' }

instead of the current

{ start: 'SPOG<separator>subje*', end: 'SPOG<separator>subje*<separator><boundary>' }

That said, how to handle multiple terms like that? AFAIK, leveldb doesn't support regex-like features such as part1*part3* matching part1part2part3part4. Perhaps quadstore should return an error?
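
The difference between the two ranges can be illustrated with a minimal in-memory sketch. The `'::'` separator and `'\uffff'` boundary below are assumptions chosen for illustration (the thread only names them `<separator>` and `<boundary>`), and `exactRange`, `prefixRange` and `scan` are hypothetical helpers, not quadstore or leveldb functions. A sorted array stands in for a leveldb range read.

```javascript
// Assumed values: the thread only refers to <separator> and <boundary>.
const SEPARATOR = '::';
const BOUNDARY = '\uffff';

// Range for an exact term match: the term is closed off by the separator.
function exactRange(indexName, term) {
  const base = indexName + SEPARATOR + term + SEPARATOR;
  return { start: base, end: base + BOUNDARY };
}

// Range for a prefix match: the boundary directly follows the prefix.
function prefixRange(indexName, prefix) {
  const base = indexName + SEPARATOR + prefix;
  return { start: base, end: base + BOUNDARY };
}

// In-memory stand-in for a range read over lexicographically sorted keys.
function scan(sortedKeys, { start, end }) {
  return sortedKeys.filter((key) => key >= start && key <= end);
}

const keys = [
  'SPOG::subject1::p::o::g',
  'SPOG::subject2::p::o::g',
  'SPOG::subje::p::o::g',
  'SPOG::other::p::o::g',
].sort();

// The prefix range covers subje, subject1 and subject2.
console.log(scan(keys, prefixRange('SPOG', 'subje')));
// The exact range covers only the key whose subject is exactly "subje".
console.log(scan(keys, exactRange('SPOG', 'subje')));
```

Dropping the trailing separator from the start key is what turns an exact match into a prefix match.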

jmatsushita commented 7 years ago

Maybe expose the start/end API for each index? It would be quick, and more thought can be put into a higher-level API later, as that will likely happen when thinking about the SPARQL side of RDF/JS.

Otherwise there are other levelup search plugins that have some cool apis too...

jmatsushita commented 7 years ago

Just to be clearer, I meant allowing query params that are either strings

{ subject: "subject" }

or objects

{ subject: { start: "subje", end: "subje\u9999" } }

It's not very pretty, but it's flexible and lets LevelUp do the job :)
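
A minimal sketch of what accepting both shapes could mean, assuming plain lexicographic string comparison as in leveldb's default ordering. `matchesTerm` is a hypothetical helper written for illustration; it is not part of quadstore or LevelUp.

```javascript
// Hypothetical helper: a term is either an exact string or a { start, end } range.
function matchesTerm(value, term) {
  if (typeof term === 'string') {
    // Exact match, as in { subject: "subject" }.
    return value === term;
  }
  if (term && typeof term.start === 'string' && typeof term.end === 'string') {
    // Inclusive range match, as in { subject: { start: "subje", end: "subje\u9999" } }.
    return value >= term.start && value <= term.end;
  }
  throw new TypeError('term must be a string or a { start, end } object');
}

const range = { start: 'subje', end: 'subje\u9999' };

console.log(matchesTerm('subject', 'subject'));
console.log(matchesTerm('subject1', range));
console.log(matchesTerm('other', range));
```

In a real store the range form would be translated into the start/end options of an index read rather than evaluated per value; the function above only shows the intended semantics.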

jmatsushita commented 7 years ago

And with regard to cool API plugins, I mean this one that caught my eye https://github.com/eugeneware/jsonquery but it feels quite foreign to the linked data world :)

jacoscaz commented 7 years ago

That's a good idea, I like it. How would you handle a query with additional terms such as

{ subject: { start: "subje", end: "subje\u9999" }, predicate: "predicate" }

? As leveldb only supports prefix matching rather than full pattern matching, this would require joining two different queries, one for the subject and one for the predicate. I'm hesitant to include higher-level querying logic into quadstore's API, though.

jacoscaz commented 7 years ago

Perhaps limiting the query to a single index would do? Something like queryTerm('subject', {start: 'subje', end: 'subje\u9999'})

jmatsushita commented 7 years ago

Oh, I hadn't noticed this yet. Does this mean that currently you can't have multiple terms? Or does this do a join under the hood?

In any case, if using prefix matching somewhat restricts what you can do, then it makes sense to only allow a single range query, and use the join API for further elaborating queries.

jmatsushita commented 7 years ago

Yes to limiting, but I would prefer named parameters in a plain object (as I don't use the RDF interface currently).

jacoscaz commented 7 years ago

Literally just updated my comment to use named params... :).

You can do queries on multiple terms, but you cannot do queries on multiple terms with custom start/end params, because those would require full pattern matching (if using a single index). To translate this into regex-y terms...

{subject: 'foo'} ==> SPOG::foo::.*
{subject: 'foo', predicate: 'bar'} ==> SPOG::foo::bar::.*
{subject: { start: 'foo', end: 'foo\u9999'}} ==> SPOG::foo.*
{subject: { start: 'foo', end: 'foo\u9999'}, predicate: 'bar'} ==> SPOG::foo.*::bar.*

The last one requires matching not only by prefix but also by pattern, a feature not currently supported by leveldb (AFAIK). English is not my primary language, I hope I have managed to clarify the problem.
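
The problem with the last pattern can be sketched with an in-memory sorted key list, assuming `'::'` as the separator (as in the regex-y notation above) and hypothetical sample keys. The keys matching `SPOG::foo.*::bar.*` are not adjacent in the index, so no single start/end pair can select exactly them; a range read on the subject prefix has to be followed by a filter on the predicate.

```javascript
const SEP = '::';

// Hypothetical keys in an SPOG index, kept in lexicographic order.
const sortedKeys = [
  'SPOG::foo1::bar::o::g',
  'SPOG::foo1::baz::o::g',
  'SPOG::foo2::bar::o::g',
  'SPOG::goo::bar::o::g',
].sort();

// Step 1: a single range read on the subject prefix.
const start = 'SPOG::foo';
const end = 'SPOG::foo\u9999';
const inRange = sortedKeys.filter((key) => key >= start && key <= end);

// Step 2: the predicate cannot be part of the range, so it is checked per key.
const matches = inRange.filter((key) => key.split(SEP)[2] === 'bar');

console.log(inRange); // every key whose subject starts with "foo"
console.log(matches); // only those that also have predicate "bar"
```

Here the `baz` key sits between the two `bar` keys inside the scanned range, which is why the predicate condition cannot be folded into the range bounds.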

jmatsushita commented 7 years ago

> Literally just updated my comment to use named params... :).

:)

Ah yes, I understand now: the structure of the quad index gets in the way.

> English is not my primary language, I hope I have managed to clarify the problem.

Very much so, English is not my primary language either but I understand you very well! :)

> a feature not currently supported by leveldb (AFAIK).

I think that the ease of creating a new index is the feature :)

Actually, allowing access to the quadstore indices via the underlying leveldb instance might be a good way to extend quadstore (for instance with a text search index). That would mean that you could do a join with a non-quadstore index... Maybe an idea for another issue :)

jacoscaz commented 7 years ago

Definitely a very nice idea for a new issue. Will open it later.

As for this issue, I can't decide whether it'd be worth introducing new methods such as getByTerm(termName, { start, end }) and getByTermStream(termName, { start, end }) or whether we'd be better off postponing this until support for extending quadstore is ready.

What do you think?
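
A rough sketch of how a `getByTerm(termName, { start, end })` method could behave, using an in-memory sorted list in place of leveldb. Everything here is an assumption made for illustration: the `TinyStore` class, the `'::'` separator, the `'\uffff'` boundary, and the choice of one index per term (`SPOG`, `POGS`, `OGSP`, `GSPO`). None of it reflects quadstore's actual implementation.

```javascript
const SEP = '::';
const BOUNDARY = '\uffff';
const TERM_BY_LETTER = { S: 'subject', P: 'predicate', O: 'object', G: 'graph' };
// Assumed layout: one index per term, each starting with that term.
const INDEX_FOR_TERM = { subject: 'SPOG', predicate: 'POGS', object: 'OGSP', graph: 'GSPO' };

// Builds a key such as "SPOG::s::p::o::g" for the given index.
function buildKey(indexName, quad) {
  const terms = [...indexName].map((letter) => quad[TERM_BY_LETTER[letter]]);
  return [indexName, ...terms].join(SEP);
}

class TinyStore {
  constructor(quads) {
    this.entries = [];
    for (const quad of quads) {
      for (const indexName of Object.values(INDEX_FOR_TERM)) {
        this.entries.push({ key: buildKey(indexName, quad), quad });
      }
    }
    // Keep entries in lexicographic key order, as a sorted key-value store would.
    this.entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }

  // Range query on a single term; omitting the range returns the whole index.
  getByTerm(termName, range = {}) {
    const indexName = INDEX_FOR_TERM[termName];
    if (!indexName) throw new Error(`Unknown term: ${termName}`);
    const base = indexName + SEP;
    const start = base + (range.start === undefined ? '' : range.start);
    const end = base + (range.end === undefined ? BOUNDARY : range.end);
    return this.entries
      .filter((entry) => entry.key >= start && entry.key <= end)
      .map((entry) => entry.quad);
  }
}

const store = new TinyStore([
  { subject: 'foo1', predicate: 'p1', object: 'o1', graph: 'g' },
  { subject: 'foo2', predicate: 'p2', object: 'o2', graph: 'g' },
  { subject: 'zed', predicate: 'p1', object: 'o3', graph: 'g' },
]);

console.log(store.getByTerm('subject', { start: 'foo', end: 'foo\u9999' }));
console.log(store.getByTerm('predicate', { start: 'p1', end: 'p1\u9999' }));
```

Because each call touches exactly one index, the range always maps to one contiguous run of keys; combining terms would be left to a separate join or filter step.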

jacoscaz commented 7 years ago

Also see #40

jmatsushita commented 7 years ago

I think it makes sense to do it, as long as the second param is optional, as it will help give visibility to the new feature. You can mention that it's experimental and that the API might change in the future.

On 7 Jun 2017 13:50, "Jacopo Scazzosi" notifications@github.com wrote:

Definitely a very nice idea for a new issue. Will open it later.

As per this issue, I can't decide whether it'd be worth introducing new methods such as getByTerm(termName, { start, end }) and getByTermStream(termName, { start, end }) or postpone this until support for extending quadstore is ready.

What do you think?

jacoscaz commented 7 years ago

Closing as lack of time led to #40 taking over this one.