ipfs / interface-datastore

datastore interface
MIT License
23 stars 20 forks source link

disinguish between key and value queries #38

Open Gozala opened 4 years ago

Gozala commented 4 years ago

In the process of adding some types into js-ipfs (https://github.com/ipfs/js-ipfs/issues/2945) I have discovered what appeared to be an error either in declared interface (in readme here or more specifically https://github.com/ipfs/js-ipfs-repo/blob/77a0cdc944ce7fc47f520ab38705259a28cea1dc/src/blockstore.js#L25-L34) Docs say query returns docs say query returns AsyncIterator<Block> while in practice it appeared to return AsyncIterator<{key:Key, ...?}>

As that is how code in js-ipfs used it: https://github.com/ipfs/js-ipfs/blob/80c6fdf1c3ad6aaae14eb4297b7b183fd4fa1311/packages/ipfs/src/core/components/refs/local.js#L9

After digging around this repo it appears that:

  1. If you do not pass {keysOnly:true} it returns AsyncIterable<{key:Key, value:Buffer}>, meaning all of the documentation is wrong.
  2. If you do pass {keysOnly:true} it returns AsyncIterable<{key:Key}> that non of the documentation seems to cover.

I would like to propose:

  1. Make a DataStore interface generic as in DataStore<Key, Value, Entry> this would make
    • Raw data store DataStore<Key, Buffer, [Key, Buffer]> (entry follows Object.entries() idiom of representing key values as tuples)
    • Block data store DataStore<CID, Block, Block> as blocks represent both values and entries.
  2. Update documentation so that query reutrns AsyncIterable<Entry> instead.
    • In raw data store that is AsyncIterable<[Key, Buffer]>
    • In block data store that is AsyncIterable<Block>
  3. Consider alternative to keysOnly options. Because not only it complicates API and makes it hard to document but it's also easy to miss if query options are passed into making it unclear what result would be.
    • Option 1: Just create another queryKeys(...):AsyncIterable<Key> method. This is my preference because in some cases it can be better optimized (e.g. when queries occur across process or thread boundaries).
    • Option 2: If adding a method seems like no-go, how about just remove keysOnly in favor of destructuring in JS for await ([key] of store.query({...}) {...}.
welcome[bot] commented 4 years ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

Stebalien commented 4 years ago

Blocks live at a higher layer of abstraction than the datastore, at least in go-ipfs. The datastore is a simple key/value store.

Gozala commented 4 years ago

Blocks live at a higher layer of abstraction than the datastore, at least in go-ipfs. The datastore is a simple key/value store.

I believe that is also the case in js-ipfs, however interface is similar in spirit & documentation, but different in practice & implementation.

I think having a high level generic interface seems like reasonable way to simplify mental overhead and also avoid some of the pitfalls I've mentioned in regards to {keysOnly: true}, however it is hard for me to asses how would that align with go-ipfs.

@Stebalien do you have opinions or feedback on the proposed changes ?

Stebalien commented 4 years ago

I'm having trouble understanding it, could you take a quick pass over it?

Is this two proposals (a separate blockstore/datastore?)? Or are you proposing a typed datastore?

Gozala commented 4 years ago

@Stebalien reason I opened this ticket is:

  1. Inline documentation of the blockstore API is incorrect. As in it does not match actual behavior.
  2. As I was looking into No 1, I've discovered that documentation of query API in this repo is also incorrect (it omits keysOnly option and implication it has on return type)
  3. Above discovery lead to a specific proposal which:
    • Attempts to disambiguate between key and value queries.
    • Generalized DataStore interface over Key, Value, Entry parameters making most (if not all) data-stores in JS-IPFS being specialization over that generic type.

You have made a point that

Blocks live at a higher layer of abstraction than the datastore, at least in go-ipfs. The datastore is a simple key/value store.

Which I interpreted (I suspect incorrectly) as an argument against No 3, specifically that blockstore and datastore have different interface because they are different layers of abstraction. My response to that was mostly that I think generic data-store interface would still make sense.

Stebalien commented 4 years ago

Which I interpreted (I suspect incorrectly) as an argument against No 3, specifically that blockstore and datastore have different interface because they are different layers of abstraction. My response to that was mostly that I think generic data-store interface would still make sense.

You correctly interpreted my comment. Maybe it's fine in JavaScript? I know that the type system in go is weak enough that abstracting over arbitrary types became a problem for us. We:

  1. Were constantly casting between types (not an issue in JS).
  2. Only really supported "bytes" because we couldn't serialize anything else. This was the main problem. In theory, we supported arbitrary types. However, in practice, all of our concrete persistent datastore implementations only supported bytes.

I expect you'll run into issue 2 in JS as well, unless I'm missing something. I guess with typescript you may be able to enforce reasonable constraints such that datastore adapters operate over arbitrary types while concrete backends operate over specific datatypes?

achingbrain commented 4 years ago

The blockstore doesn't implement the interface-datastore contract, though it does look similar.

If you like you can think of the blockstore as CIDs in, Blocks out and the datastore it wraps is Keys in, Buffers out.

I'm not sure the .query method on the blockstore is commonly used TBH, we might even be able to remove it.

Gozala commented 4 years ago

The blockstore doesn't implement the interface-datastore contract, though it does look similar.

If you like you can think of the blockstore as CIDs in, Blocks out and the datastore it wraps is Keys in, Buffers out.

I understand that. It is somewhat separate discussion, which unfortunately I end up mixing up here. I have created https://github.com/ipfs/interface-datastore/issues/39 to discuss datastore interface and it's relation to blockstore there.

I'm not sure the .query method on the blockstore is commonly used TBH, we might even be able to remove it.

query API was intended subject of this thread. Specifically:

  1. Documentation omits keysOnly option which changes behavior in profound ways.
  2. Which lead me to propose alternative to keysOnly options in original comment.

If removing query all together is better option, that sounds good to. Either way it would be great to do something about query API because in my experience reading code using query can be very confusing and surprising. I encounter handful of times where options were passed form caller and while I was assuming results would be values they were keys.