
ipfs / interface-datastore

datastore interface

MIT License


query interface #9

Open pgte opened 7 years ago

pgte commented 7 years ago

While trying to adapt a datastore into a Leveldown interface, I came across some impedance mismatch. Mind you, I'm new to the datastore ecosystem, so I may be very wrong.

The first part of it is that a query returns a pull stream. While I love pull-streams, transforming them into a Leveldown iterator interface is not trivial as far as I know. You may argue that the pull-stream interface is superior, but my guess is that very few developers are familiar with it. There are also more standard alternatives, ranging from Node streams to ES6 iterators.
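To make the two call shapes concrete, here is a minimal sketch of wrapping a pull-stream source in an object that looks like a Leveldown iterator. It relies only on the pull-stream convention that a source is a function `read(abort, cb)` and on the Leveldown convention of `next(cb)` / `end(cb)`. The names `valuesSource` and `toLevelIterator` are made up for illustration, and the sketch leaves out the parts that are actually hard (seeking, range options, snapshot semantics).

```javascript
// A pull-stream source is just a function: read(abort, cb).
// cb(true) signals normal end, cb(err) an error, cb(null, data) one item.
function valuesSource (items) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

// Hypothetical adapter: wraps a source of { key, value } entries in an
// object with the Leveldown iterator call shape: next(cb) and end(cb).
function toLevelIterator (source) {
  let ended = false
  return {
    next (cb) {
      if (ended) return cb(new Error('next() called after end()'))
      source(null, (end, entry) => {
        if (end === true) return cb() // exhausted: no key, no value
        if (end) return cb(end) // upstream error
        cb(null, entry.key, entry.value)
      })
    },
    end (cb) {
      if (ended) return cb(new Error('end() already called'))
      ended = true
      source(true, () => cb()) // abort the source, then report completion
    }
  }
}

// Usage: read one entry through the Leveldown-style interface.
const iterator = toLevelIterator(valuesSource([
  { key: '/a', value: 1 },
  { key: '/b', value: 2 }
]))
iterator.next((err, key, value) => console.log(err, key, value))
```

The adapter only forwards calls one at a time; it does not give the caller any of the range or seek options a real Leveldown iterator accepts, which is where the query options become the larger problem.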

The second part (and to me, the one representing more impedance) is the query options. The query options, with the exception of prefix, imply providing a function, which is not easily (or not at all) translatable to a database query. This, I guess, forces implementations to do a full scan and filter data in memory, which may be terrible performance-wise.

One option I like would be to provide a declarative querying interface similar to the Leveldown one, which would let us translate queries into back-end options in 99% of cases.
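As a rough illustration of what "declarative" could mean here, the sketch below maps a data-only query onto the range options a Leveldown iterator accepts (`gte`, `lt`, `reverse`, `limit`). The functions `prefixToRange`, `toLevelOptions` and `inRange` are hypothetical, not part of interface-datastore or Leveldown, and the prefix trick assumes string keys compared by UTF-16 code unit that never contain U+FFFF.

```javascript
// Hypothetical: turn a key prefix into a half-open key range.
// Every key starting with `prefix` sorts at or after `prefix` and before
// `prefix` followed by the highest UTF-16 code unit.
function prefixToRange (prefix) {
  return { gte: prefix, lt: prefix + '\uffff' }
}

// Hypothetical: translate a data-only query into Leveldown-style options.
function toLevelOptions (query) {
  const opts = {}
  if (query.prefix !== undefined) Object.assign(opts, prefixToRange(query.prefix))
  if (query.limit !== undefined) opts.limit = query.limit
  if (query.reverse !== undefined) opts.reverse = query.reverse
  return opts
}

// Reference semantics of the range, useful for checking a back end.
function inRange (key, { gte, lt }) {
  return key >= gte && key < lt
}

// Usage
const levelOptions = toLevelOptions({ prefix: '/blocks/', limit: 10 })
console.log(levelOptions, inRange('/blocks/abc', levelOptions))
```

Because the query is plain data, a back end that supports ordered keys can hand the range straight to its own iterator instead of scanning.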

dignifiedquire commented 7 years ago

@pgte I am confused. I already did the work of writing a generic level interface for datastore that does all this work here: http://github.com/ipfs/js-datastore-level. It accepts any leveldown-compatible implementation.

dignifiedquire commented 7 years ago

The conversion from iterator to pull-stream is done here: https://github.com/ipfs/js-datastore-level/blob/master/src/index.js#L90. It's a bit tricky, but works quite well as far as I understand.

dignifiedquire commented 7 years ago

In terms of the options that we support, this is a 1:1 port of the interfaces Go provides, so if we want to change anything there we should consider those settings first.

dignifiedquire commented 7 years ago

> While trying to adapt a datastore into a Leveldown interface,

Oh, I am sorry, I misunderstood. You are trying to go the other way around; I haven't looked into that yet.

dignifiedquire commented 7 years ago

I ended up not using the leveldown interface for two main reasons:

- It is missing some options that Go implements, that I wanted to support, and that we are using in the DHT, especially prefix.
- We already have one lazy iterative interface in the code base, which is pull-streams, and the datastores should fit into it as well as possible. Using pull-streams for this seemed the natural way to go, as I would otherwise have to adapt the iterator to a pull stream anyway in modules like the DHT.

dignifiedquire commented 7 years ago

some background for datastore:

pgte commented 7 years ago

@dignifiedquire that's a great example. Here you mostly have to create a full iterator that iterates over the entire DB snapshot, while filtering it in memory: https://github.com/ipfs/js-datastore-level/blob/master/src/index.js#L96-L100 It's not efficient, wouldn't you say?

dignifiedquire commented 7 years ago

It's not great, but leveldown doesn't expose filtering in the database in the way that I need anyway, so I don't see how this could be improved.

dignifiedquire commented 7 years ago

Namely, it does not allow any sort of key-based filtering directly, without pulling all entries out.

pgte commented 7 years ago

@dignifiedquire yeah, it allows for key partitioning and range queries. I understand that's very limited, but it caters to most use cases I've seen using a kv-store. You just have to decide wisely about the key partitioning / subleveling and perhaps implement materialised views. I thought the datastore interface was meant for those cases. What use cases is interface-datastore trying to solve?

dignifiedquire commented 7 years ago

Abstract storage layers, including but not limited to file systems, key-value stores and SQL databases, with a way to combine all of those into path-like namespaces. Similar to the goals described here

In addition, one important goal is to support all operations that ipfs needs to achieve feature parity with go-ipfs, and to be able to read and write repos the same way go-ipfs does.

pgte commented 7 years ago

My opinion is that the query interface is perhaps too generic to enable any efficient implementation. I propose that we enable some form of query options that allow range queries on keys.

Without this, for instance, I'm not able to translate a LevelDB query into a datastore query in a way that is efficient at runtime.
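The efficiency gap can be sketched with an in-memory stand-in for an ordered key-value store. This is only an illustration under simplifying assumptions (a sorted array of string keys, hypothetical helper names `scanWithFilter`, `lowerBound` and `scanRange`), not how js-datastore-level or LevelDB is implemented.

```javascript
// In-memory stand-in for an ordered key-value store: keys are kept sorted.
const sortedKeys = ['/a/1', '/a/2', '/b/1', '/b/2', '/b/3', '/c/1']

// Function filter: the store cannot look inside `fn`, so it visits every key.
function scanWithFilter (keys, fn) {
  let visited = 0
  const out = []
  for (const key of keys) {
    visited++
    if (fn(key)) out.push(key)
  }
  return { out, visited }
}

// Binary search: first index whose key is >= target.
function lowerBound (keys, target) {
  let lo = 0
  let hi = keys.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (keys[mid] < target) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Declarative range { gte, lt }: seek to the start, stop at the first key
// at or past the upper bound.
function scanRange (keys, { gte, lt }) {
  let visited = 0
  const out = []
  for (let i = lowerBound(keys, gte); i < keys.length; i++) {
    visited++
    if (keys[i] >= lt) break
    out.push(keys[i])
  }
  return { out, visited }
}

// Both calls select the keys under '/b/'. '0' is the character right after
// '/', so '/b0' is an upper bound for every key that starts with '/b/'.
const full = scanWithFilter(sortedKeys, (key) => key.startsWith('/b/'))
const ranged = scanRange(sortedKeys, { gte: '/b/', lt: '/b0' })
console.log(full.visited, ranged.visited)
```

Both scans return the same keys, but the range scan only touches the matching keys plus the one that ends the range, while the function filter touches the whole store regardless of how selective it is.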

Gozala commented 4 years ago

> The second part (and to me, the one representing more impedance) is the query options. The query options, with the exception of prefix, imply providing a function, which is not easily (or not at all) translatable to a database query. This, I guess, forces implementations to do a full scan and filter data in memory, which may be terrible performance-wise.

This is also something I'm running into in an attempt to move js-ipfs into a shared worker (https://github.com/ipfs/js-ipfs/issues/3022). The problem is that you cannot pass functions across threads, so you'd basically have to send all the data from the worker to the main thread and then filter it there. I think it would be better to represent the query as data and leave more complicated filtering as an exercise for the user. That way:

- The query could be optimized for the cases that @pgte mentioned and for multithreaded use cases.
- This would work better with ipfs-http-client, so that the host can filter data without passing it on to the client.
- It generally fits better with systems that cross language boundaries.
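A small sketch of the "query as data" idea: the query is a plain object that survives serialization, and the side that holds the data evaluates it. The query shape (`prefix`, `filters`, `limit`), the operator table and `runQuery` are all invented for this example and are not an existing interface-datastore or js-ipfs API. JSON stands in for whatever transport crosses the thread or process boundary.

```javascript
// A function-valued filter does not survive serialization: JSON turns a
// function inside an array into null.
const lossy = JSON.parse(JSON.stringify({ filters: [(key) => key.endsWith('.data')] }))

// Hypothetical data-only query: it round-trips unchanged.
const query = {
  prefix: '/blocks/',
  filters: [{ field: 'key', op: 'endsWith', value: '.data' }],
  limit: 2
}
const received = JSON.parse(JSON.stringify(query))

// Hypothetical operator table the host agrees to support.
const ops = {
  startsWith: (s, v) => s.startsWith(v),
  endsWith: (s, v) => s.endsWith(v),
  equals: (s, v) => s === v
}

// Host-side evaluation: only matching entries would be sent back.
function runQuery (entries, q) {
  const out = []
  for (const entry of entries) {
    if (q.limit !== undefined && out.length >= q.limit) break
    if (q.prefix !== undefined && !entry.key.startsWith(q.prefix)) continue
    const ok = (q.filters || []).every((f) => {
      const op = ops[f.op]
      if (!op) throw new Error(`unknown op: ${f.op}`)
      return op(String(entry[f.field]), f.value)
    })
    if (ok) out.push(entry)
  }
  return out
}

// Usage
const entries = [
  { key: '/blocks/a.data', value: 1 },
  { key: '/blocks/b.meta', value: 2 },
  { key: '/pins/c.data', value: 3 },
  { key: '/blocks/d.data', value: 4 },
  { key: '/blocks/e.data', value: 5 }
]
console.log(lossy.filters[0], runQuery(entries, received).map((e) => e.key))
```

Anything the operator table cannot express would still need to be filtered by the caller after the results come back, which matches the "exercise for the user" trade-off above.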