Open pgte opened 7 years ago
@pgte I am confused. I already did the work of writing a generic level interface for datastore that does all of this: http://github.com/ipfs/js-datastore-level. It accepts any leveldown-compatible implementation.
The conversion from iterator to pull-stream is done here: https://github.com/ipfs/js-datastore-level/blob/master/src/index.js#L90. It's a bit tricky, but it works quite well as far as I understand.
In terms of the options that we support, this is a 1:1 port of the interfaces Go provides, so if we want to change anything there we should consider those settings first.
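For readers following along, here is a minimal sketch of what such a conversion can look like. It is not the code at the link above; `LevelIterator`, `Source` and `iteratorToSource` are names made up for this example, and the iterator shape is a simplified assumption of the leveldown one (`next(cb)` calling back with no key and no value once exhausted, plus `end(cb)`).

```typescript
// Simplified, assumed shape of a leveldown-style iterator.
interface LevelIterator<K, V> {
  next(cb: (err: Error | null, key?: K, value?: V) => void): void;
  end(cb: (err: Error | null) => void): void;
}

// pull-stream source: called with (abort, cb); cb receives (end, data).
type End = null | true | Error;
type SourceCb<T> = (end: End, data?: T) => void;
type Source<T> = (abort: End, cb: SourceCb<T>) => void;

// Hypothetical helper: wraps the iterator in a pull-stream source.
function iteratorToSource<K, V>(it: LevelIterator<K, V>): Source<{ key: K; value: V }> {
  let ended = false;
  return (abort, cb) => {
    if (ended) return cb(true);
    if (abort) {
      // The sink asked us to stop: release the iterator, then confirm.
      ended = true;
      return it.end((err) => cb(err || abort));
    }
    it.next((err, key, value) => {
      if (err) {
        ended = true;
        return it.end(() => cb(err));
      }
      if (key === undefined && value === undefined) {
        // Iterator exhausted: close it and signal normal end with `true`.
        ended = true;
        return it.end((endErr) => cb(endErr || true));
      }
      cb(null, { key: key as K, value: value as V });
    });
  };
}
```

The tricky part is mostly bookkeeping: the iterator has to be closed exactly once, whether the stream ends normally, fails, or is aborted by the sink.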
While trying to adapt a datastore into a Leveldown interface,
Oh, I am sorry, I misunderstood. You are trying to go the other way around; I haven't looked into that yet.
The main reason I ended up not using the leveldown interface is twofold.
prefix
some background for datastore:
@dignifiedquire that's a great example. Here you mostly have to create a full iterator that iterates over the entire DB snapshot, while filtering it in memory: https://github.com/ipfs/js-datastore-level/blob/master/src/index.js#L96-L100 It's not efficient, wouldn't you say?
It's not great, but leveldown doesn't expose filtering in the database in the way that I need anyway, so I'm not seeing how this could be improved.
Namely, it does not allow any sort of key-based filtering directly, without pulling all entries out.
@dignifiedquire yeah, it allows for key partitioning and range queries. I understand that's very limited, but it caters to most use cases I've seen using a kv-store. You just have to decide wisely about the key partitioning / subleveling, and perhaps implement materialised views. I thought the datastore interface was meant for those cases. What use cases is interface-datastore trying to solve?
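To illustrate what key partitioning plus range queries buys you, here is a small sketch over a sorted in-memory key list standing in for the store. `KeyRange`, `lowerBound` and `rangeScan` are hypothetical names for this example; the `gte`/`lt` bounds mirror the leveldown range option names, and keys are assumed to be ASCII strings so that JavaScript string order matches byte order.

```typescript
// Half-open key range [gte, lt), named after the leveldown range options.
interface KeyRange {
  gte: string;
  lt: string;
}

// Index of the first key that is >= target (binary search on sorted keys).
function lowerBound(sortedKeys: string[], target: string): number {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Returns only the keys inside the range; keys outside it are never visited.
function rangeScan(sortedKeys: string[], range: KeyRange): string[] {
  const start = lowerBound(sortedKeys, range.gte);
  const end = lowerBound(sortedKeys, range.lt);
  return sortedKeys.slice(start, end);
}

// Keys partitioned by a path-like prefix ("sublevels").
const exampleKeys = ["/a/1", "/blocks/x", "/blocks/y", "/blocksz", "/c"];
const blocksOnly = rangeScan(exampleKeys, { gte: "/blocks/", lt: "/blocks0" });
```

Here `"/blocks0"` works as the exclusive upper bound because `0` is the character right after `/`, so every key starting with `/blocks/` sorts below it.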
Abstract storage layers, including but not limited to file systems, key-value stores and SQL databases, with a way to combine all of those into path-like namespaces. Similar to the goals described here
In addition, one important goal is to support all operations that ipfs needs to achieve feature parity with go-ipfs, and to be able to read and write repos the same way go-ipfs does.
My opinion is that the query interface is perhaps too generic to enable any efficient implementation. I propose that we enable some form of query options that allows range queries upon keys.
Without this, for instance, I'm not able to translate a levelDB query into a datastore query in a way that is efficient at runtime.
The second part (and to me, the one representing more impedance) is the query options. The query options, with the exception of prefix, imply providing a function, which is not easily (or not at all) translatable to a database query. This, I guess, forces implementations to do a full scan and filter data in memory, which may be terrible performance-wise.
This is also something I'm running into in an attempt to move js-ipfs into a shared worker (https://github.com/ipfs/js-ipfs/issues/3022). The problem is that you cannot pass functions across threads, so basically you'd have to send all the data from the worker to the main thread and then filter it there. I think it would be better to represent the query as data and leave more complicated filtering as an exercise for the user. That way
ipfs-http-client
so that host can filter data without passing it onto client.
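A rough sketch of what "query as data" could look like. Everything here is an assumption for illustration: `DataQuery`, `KeyFilter`, `runQuery` and `roundTrip` are not part of interface-datastore, and the JSON round trip only stands in for sending the query across a thread or HTTP boundary.

```typescript
// Hypothetical filters: plain data, no functions, so they survive serialization.
type KeyFilter =
  | { op: "gte"; value: string }
  | { op: "lt"; value: string };

// Hypothetical query description.
interface DataQuery {
  prefix?: string;
  filters?: KeyFilter[];
  offset?: number;
  limit?: number;
}

interface Entry {
  key: string;
  value: string;
}

function matches(key: string, q: DataQuery): boolean {
  if (q.prefix !== undefined && key.slice(0, q.prefix.length) !== q.prefix) return false;
  for (const f of q.filters || []) {
    if (f.op === "gte" && !(key >= f.value)) return false;
    if (f.op === "lt" && !(key < f.value)) return false;
  }
  return true;
}

// Runs on the host side, next to the data, so only matches cross the boundary.
function runQuery(entries: Entry[], q: DataQuery): Entry[] {
  const hits = entries.filter((e) => matches(e.key, q));
  const start = q.offset === undefined ? 0 : q.offset;
  const end = q.limit === undefined ? undefined : start + q.limit;
  return hits.slice(start, end);
}

// Stand-in for crossing a worker or network boundary.
function roundTrip(q: DataQuery): DataQuery {
  return JSON.parse(JSON.stringify(q)) as DataQuery;
}
```

A query built on the client goes through `roundTrip` unchanged, which is exactly what a filter function cannot do.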
While trying to adapt a datastore into a Leveldown interface, I came across some impedance. Mind you that I'm new to the datastore eco-system, so I may be very wrong.
The first part of it is the fact that a query returns a pull-stream. While I love pull-streams, transforming them into a Leveldown iterator interface is not trivial as far as I know. Here, you may argue that the pull-stream interface is superior, but my guess is that very few developers are familiar with it. Also, there are other alternatives that are more standard, ranging from Node streams to ES6 iterators.
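For context, a simplified sketch of the direction described above, going from a pull-stream source to a Leveldown-style iterator. The iterator shape (`next(cb)` calling back with no key once exhausted, `end(cb)`) is a reduced assumption of the real interface, and `sourceToIterator` is a name invented for this example. It ignores seeking, snapshots and concurrent `next` calls, which is where the real difficulty lies.

```typescript
type End = null | true | Error;
type Source<T> = (abort: End, cb: (end: End, data?: T) => void) => void;

interface Entry {
  key: string;
  value: string;
}

// Simplified, assumed shape of a leveldown-style iterator.
interface LevelStyleIterator {
  next(cb: (err: Error | null, key?: string, value?: string) => void): void;
  end(cb: (err: Error | null) => void): void;
}

// Hypothetical adapter: each next() pulls exactly one item from the source.
function sourceToIterator(source: Source<Entry>): LevelStyleIterator {
  let finished = false;
  return {
    next(cb) {
      if (finished) return cb(null);
      source(null, (end, data) => {
        if (end === true) {
          // Normal end of stream maps to "no key, no value".
          finished = true;
          return cb(null);
        }
        if (end) {
          finished = true;
          return cb(end);
        }
        if (data === undefined) return cb(new Error("source produced no data"));
        cb(null, data.key, data.value);
      });
    },
    end(cb) {
      if (finished) return cb(null);
      finished = true;
      // Abort the source so it can release its resources.
      source(true, () => cb(null));
    },
  };
}
```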
The second part (and to me, the one representing more impedance) is the query options. The query options, with the exception of prefix, imply providing a function, which is not easily (or not at all) translatable to a database query. This, I guess, forces implementations to do a full scan and filter data in memory, which may be terrible performance-wise.
One option which I like would be to provide a declarative querying interface similar to the Leveldown one, which would then allow us to translate queries into back-end options in 99% of cases.
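As a sketch of that idea, a declarative query could be mapped onto Leveldown-style iterator options roughly like this. `DeclarativeQuery` and `toIteratorOptions` are hypothetical names; the option names (`gte`, `lt`, `limit`, `reverse`, `keys`, `values`) follow the Leveldown iterator options as I understand them, with `limit: -1` meaning no limit. Keys are assumed to be ASCII strings.

```typescript
// Hypothetical declarative query; not the current interface-datastore API.
interface DeclarativeQuery {
  prefix?: string;
  limit?: number;
  reverse?: boolean;
  keysOnly?: boolean;
}

// Subset of leveldown-style iterator options used by this sketch.
interface IteratorOptions {
  gte?: string;
  lt?: string;
  limit: number;
  reverse: boolean;
  keys: boolean;
  values: boolean;
}

// Smallest string that sorts after every key starting with `prefix`
// (assumes the last character is not the maximum code unit).
function prefixUpperBound(prefix: string): string {
  const last = prefix.charCodeAt(prefix.length - 1);
  return prefix.slice(0, -1) + String.fromCharCode(last + 1);
}

function toIteratorOptions(q: DeclarativeQuery): IteratorOptions {
  const opts: IteratorOptions = {
    limit: q.limit === undefined ? -1 : q.limit,
    reverse: q.reverse === true,
    keys: true,
    values: q.keysOnly !== true,
  };
  if (q.prefix !== undefined && q.prefix.length > 0) {
    // A prefix becomes a half-open key range the back end can seek to.
    opts.gte = q.prefix;
    opts.lt = prefixUpperBound(q.prefix);
  }
  return opts;
}

const exampleOptions = toIteratorOptions({ prefix: "/blocks/", limit: 10, keysOnly: true });
```

Anything that cannot be expressed this way (arbitrary filter functions, custom ordering) would still need an in-memory pass, but the common prefix and limit cases would reach the back end directly.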