mozilla / rkv

A simple, humane, typed key-value storage solution.
https://crates.io/crates/rkv
Apache License 2.0
310 stars 53 forks source link

transaction access unification proposal #97

Closed rrichardson closed 5 years ago

rrichardson commented 5 years ago

Here are two API options for ensuring that Integers, DUP_SORT, and regular databases can be accessed within the same transaction.

Store-Centric

The Store centric API is similar to how the API looked in the past, except that the Store is not subordinate to the transaction. The advantage of this approach is the method names are simpler and more consistent (although some could argue ambiguous) and there is a clear delineation in functionality. In this API, either the read or write transaction can be passed to any of the db methods (where applicable) https://github.com/mozilla/rkv/blob/eeeea6f99b0d04f947f4fa210a114fe8cc5f5ff0/proposal/store.rs

Transaction-Centric

The Reader / Writer centric model is similar to the current API design, except instead of having a Reader and Writer for every type of Store (regular, Int and Multi), there is only Reader and Writer, and they contain all methods necessary for accessing every type of store. This has the advantage of having fewer structs that contain core functionality. But the methods are a bit messier. https://github.com/mozilla/rkv/blob/eeeea6f99b0d04f947f4fa210a114fe8cc5f5ff0/proposal/writer.rs

These two APIs are not mutually exclusive.. we could put the functionality in both places. That might end up being burdensome to maintain, but it's possible.

In either case, I think we should let the Rkv::get/put/insert/del functions remain as the mechanisms for 1-shot transaction functionality.

Also, this proposal doesn't address the problem that DUP_SORT dbs have with cursors potentially unsafely moving data around.

It also doesn't address the fact that currently cursors can go out of scope while references that created them can live on.

mykmelez commented 5 years ago

These two APIs are not mutually exclusive.. we could put the functionality in both places. That might end up being burdensome to maintain, but it's possible.

Indeed, but I would avoid doing this unless we get good evidence that developers expect both styles in equal-enough proportions to make a single one burdensome on them. I suspect we'll be fine with only one of these.

In either case, I think we should let the Rkv::get/put/insert/del functions remain as the mechanisms for 1-shot transaction functionality.

That seems reasonable.

Also, this proposal doesn't address the problem that DUP_SORT dbs have with cursors potentially unsafely moving data around.

It also doesn't address the fact that currently cursors can go out of scope while references that created them can live on.

Yep, these are important to resolve but orthogonal to either of these proposals.

A couple of benefits of the Store-centric model come to mind as I compare these proposals:

  1. The method names are the same for equivalent functions: the method that retrieves a key-value pair is always called get(); ditto for put() and del(). Thus method names vary only by changes in the behavior of the method (put() vs. put_flags()), not changes to the type of the store on which the method is called (Store vs. IntegerStore vs. MultiStore).
  2. The transaction-centric model invites you to create a transaction first, then create a store to use with it. But that doesn't work for write transactions, as store creation requires a write transaction, and lmdb-rs creates one expressly for the purpose of creating the store (after which it aborts the transaction), which causes store creation to hang if a write transaction is extant at the time, which is a footgun.

I also imagine that the Writer struct's method list would grow quite unwieldy if we ever add another Store type (like a fixed-value-size store) or another operation (such as delete_all()) although I don't have any plans to do so at the moment.

One last consideration is which one is more idiomatic from a Rust API design perspective. At the moment, it isn't clear to me that either of them are more or less idiomatic than the other. I'll think a bit more on this to see if there's something I'm missing.

rrichardson commented 5 years ago

I think it would be good to get more eyeballs on this. That said, I am confident in the Store approach enough that I will probably start to implement it at the risk of having to redo the work.

mykmelez commented 5 years ago

@ncloudioj You and I previously discussed the API design before the previous round of changes. What do you think about the two options presented here?

ncloudioj commented 5 years ago

I think the Store-centric makes more sense to me.

First of all, since "store is not subordinate to the transaction" in this proposal, which means it still allows us to access multiple stores within the same transaction, as we previously discussed in https://github.com/mozilla/rkv/issues/46.

Secondly, different stores expose different APIs, that looks much clearer and more intuitive than cramming all sorts of APIs into a single "Reader" / "Writer".

As @rrichardson mentioned earlier, the downside is obviously that we might end up with lots of "Stores" in rkv. In my previous project, since LMDB supports integer type for both key (MDB_INTEGERKEY) and value (MDB_INTEGERDUP), we had four different types of Stores with that combination. Then, we made it even worse by introducing signed (or unsigned) int8, int16, int32, and int64, since LMDB allowed us to use the user-defined integer comparison functions. The saddest part is that we had to define both get and get_dup (for MDB_DUPSORT) for each "Store", and made get_dup act as "no-op" for those non-dupsort stores, otherwise, the total number of "Store"s would be doubled again. We definitely want to avoid that kind of mistake here.

mykmelez commented 5 years ago

As @rrichardson mentioned earlier, the downside is obviously that we might end up with lots of "Stores" in rkv. In my previous project, since LMDB supports integer type for both key (MDB_INTEGERKEY) and value (MDB_INTEGERDUP), we had four different types of Stores with that combination.

Agreed, this is a risk, although it feels like an easier problem to have than the proliferation of methods on a single Writer.

Then, we made it even worse by introducing signed (or unsigned) int8, int16, int32, and int64, since LMDB allowed us to use the user-defined integer comparison functions.

The current IntegerStore implementation generalizes across integer type, and we should be able to do the same if we implement @rrichardson's proposal. I've implemented support for u16 and u64 types in #99 (although it isn't clear how flexible LMDB is regarding integer type).

The saddest part is that we had to define both get and get_dup (for MDB_DUPSORT) for each "Store", and made get_dup act as "no-op" for those non-dupsort stores, otherwise, the total number of "Store"s would be doubled again. We definitely want to avoid that kind of mistake here.

Indeed, we should avoid exposing a no-op method at runtime.

mykmelez commented 5 years ago

Ok, after thinking about it some more, I agree that the Store-centric approach will be more ergonomic in the long run. Let's switch to it!

@rrichardson I'll be incommunicado for a couple of weeks, but I'm happy to review anything you've developed in the meantime once I return, and @ncloudioj and @ncalexan can also review in my absence (and my presence, for that matter!).

rrichardson commented 5 years ago

Great! I just made it through a monumental deadline so I should have lots (ha) of free time to slam this out.