josephg / node-foundationdb

Modern Node.js FoundationDB bindings
Other
116 stars 17 forks source link

Versionstamps in tuple layer #19

Closed josephg closed 5 years ago

josephg commented 5 years ago

Tracker issue to get versionstamps working in the tuple layer.

Unfortunately versionstamps corrupt the abstraction layer, since writing a versionstamp requires us to use txn.setVersionStamp instead of txn.set when we write. Also, we can only handle having one versionstamp in the (nested) tuple encoding.

Its a right mess. I can implement the mess, mirroring the implementation in python and java. But its a really ugly design. Please ping this issue if this is important to you.

ryanworl commented 5 years ago

👋 very important to me

RE the nested tuple encoding, what is a use case for a nested tuple that uses more than one versionstamp?

I can see use cases for nested tuples which already have versionstamps applied from a previous write, but I'm struggling to think of one where you'd want to write two versionstamps at the same time. That isn't possible regardless given the FDB API, by you get my point I think.

Also, I think we could restore the abstraction layer in the set method by checking if the tuple contains whatever "tag" we use to represent a versionstamp before it is serialized in the tuple. If the tag is in the tuple, we use the atomic op API instead of regular set.

This could also apply for versionstamped values where the value is also encoded with the tuple layer (more of a niche use case I would guess)

josephg commented 5 years ago

Yeah that could work - probably via an optional method in the transaction’s key encoder / value encoder.

josephg commented 5 years ago

I've got this working, though as expected it involved moving a lot of things around.

There's currently no way to pass in a tuple and then get the tuple back again, with the versionstamps all baked in, which is a bit awkward. But you can make it work. Examples pulled from the test suite:

Using the helper on the database object:

      const db_ = db.withKeyEncoding(tuple).withValueEncoding(encoders.string)
      const vs = await db_.setAndGetVersionStamp([1,[2, {type: 'unbound versionstamp'}]], 'hi there')

      const results = await db_.getRangeAllStartsWith([])
      assert.deepStrictEqual(results, [[[1,[2, {type: 'versionstamp', value:Buffer[...]}]], 'hi there']])

Or done manually in a transaction:

      const db_ = db.withKeyEncoding(tuple).withValueEncoding(encoders.string)
      const vs = await (await db_.doTn(async tn => {
        tn.set([{type: 'unbound versionstamp'}], 'a')
        tn.set([{type: 'unbound versionstamp'}], 'b')
        return tn.getVersionStamp()
      })).promise

      const results = await db_.getRangeAllStartsWith([])
      assert.deepStrictEqual(results, [
        [{type:'versionstamp', value:Buffer[...]}], 'a'],
        [{type:'versionstamp', value:Buffer[...]}], 'b']
     ])

That {type: 'unbound versionstamp'} thing is a token in the tuple which will get replaced with a current versionstamp when you commit the transaction. You can also manually specify the 2 byte code if you need to - {type: 'unbound versionstamp', code: 123}. Its all super verbose - unbound versionstamp is huge to type in a dynamic environment like JS where most people don't have autocompletion. But I'm not sure what to call it - thoughts?

I'd also like to:

ryanworl commented 5 years ago

I think it would be best to keep it the same as the tuple layer in the Python API

i.e.

import { tuple } from 'foundationdb';

let db = ...

let vs = tuple.versionstamp(1);

let key = ["users", vs];

// imagine top level await works...
await db.doTn(async tn => {
    tn.set(key, ...);
});

console.log(key);
// prints the encoded equivalent of ["users", 12 byte versionstamp]

If we detect an unbound versionstamp, we use the set_versionstamped_key method and then add it to a queue to fill in ourselves with the commit versions after the transaction commits.

I'm probably missing something but that seems like the ideal API, as you can essentially ignore the fact that you're using versionstamps at all other than using the special token when creating the key.

josephg commented 5 years ago

Thats basically what I have now, except minus the tuple.versionstamp() helper method and it also doesn't fill in the versionstamp once the transaction has committed.

It seems weird / wrong to be mutating the key when you call set, but at the same time its totally what you want when you're actually using the API. 👍

josephg commented 5 years ago

Ok I've made a lot of changes and I'm pretty happy with the API now. It looks like this:

const db_ = db.withKeyEncoding(encoders.tuple)
const key: TupleItem[] = [1,2,3, {type: 'unbound versionstamp'}]
await db_.setVersionstampedKey(key, 'hi')

// key is now [1, 2, 3, {type: 'versionstamp', value: Buffer<12 bytes>}]

You can also call setVersionstampedKey(key, 'hi', false) and it won't bake the versionstamp back into your key.

I've also implemented client-side codes (0, 1, 2, ... autoincrementing, encoded in big endian). Or you can manually set the code:

await db_.setVersionstampedKey([{type: 'unbound versionstamp', code: 123}], 'hi')

The one part of the API I'm not happy with is naming. @ryanworl any thoughts about what this horrible thing should be called? {type: 'unbound versionstamp'} I could shorten versionstamp to 'vs' but its not descriptive. It also doesn't help that I haven't been consistent about capitalization - is it Versionstamp or VersionStamp?

ryanworl commented 5 years ago

Versionstamp is the Java name and I think being consistent is better regardless of what I would prefer.

I think unbound versionstamp and that object structure is fine, but it should be exposed through a helper method so you never have to actually type it. See my example above.

josephg commented 5 years ago

Versionstamp is also the capitalization in the autogenerated options & other bindings. I'll change that.

I can make a helper method - I guess I just feel like if you're going to type the name of a method, you may as well just construct the object. Better debugging in JS I guess though..

ryanworl commented 5 years ago

The helper method gets you autocomplete too

josephg commented 5 years ago

You get autocomplete either way in typescript, but typos are more dangerous without the helper method.

josephg commented 5 years ago

This whole API is super wordy, but what can you do?

API docs

📦 foundationdb 0.8.0

ryanworl commented 5 years ago

Do you find it ill-advised to add some special behavior where if a tuple contains an unbound Versionstamp and you use the set method instead of setVersionstampedKey, it detects the unbound Versionstamp and uses setVersionstampedKey for you instead?

None of the other bindings have that feature, but I think it is nice for consistency myself, especially if you have a higher level thing that just feeds your transaction things to do (think an interpreter for a language), it doesn't need to differentiate between the two methods and can just construct the tuples correctly.

This is probably a bit of a layering violation too.

josephg commented 5 years ago

My first implementation worked that way, but it was a bit weird / fragile because either the key or the value could have an unbound versionstamp, but not both. And you could have 0 or 1 unbound versionstamps, but not more than 1. Also as you say, none of the other bindings do that - they all take the stance that explicit > implicit.

So yeah, in the end I made the call to follow their lead and force a different function call if you were baking versionstamps. I also think there's already more than enough magic with the key / value transformers stuff as it is.