mozilla / rkv

A simple, humane, typed key-value storage solution.
https://crates.io/crates/rkv
Apache License 2.0
310 stars 53 forks source link

Unify `*Reader` and `*Writer` #95

Closed rrichardson closed 5 years ago

rrichardson commented 5 years ago

Presently there are 2 transaction types (plus the proposed Multi*) for both Read and Write transactions.

There doesn't have to be, though. In fact, it's causing problems when I want to execute a transaction over different types of Stores

I see two options: The easiest option is pull the functions from Integer* and Multi* into Reader and Writer, and suffixing them with _int and _multi. They will take IntStore and MultiStore as parameters respectively.

The harder, but perhaps cleaner option is a significant chunk of refactoring: Move the accessor functions into the *Store structs themselves. Make the the functions accept either a Reader or a Writer. This way we'll have more ergonomic and intuitive methods like:

IntStore::put<I: PrimitiveInt>(txn: Writer, id: I, val: Value) -> Result<(), StoreError>
MultiStore::get(txn: Reader, id: K) -> Result<Iter<Value>, StoreError>

For maximum modularity.. I would make a new trait: ReadTransaction which can be implemented by both Reader and Writer.. since you can fetch values in Write transactions as well.

mykmelez commented 5 years ago

The easiest option is pull the functions from Integer* and Multi* into Reader and Writer, and suffixing them with _int and _multi. They will take IntStore and MultiStore as parameters respectively.

This seems reasonable to me, and it's consistent with the design foreshadowed by the unimplemented Writer.delete_value stub (modulo the name, of course).

The harder, but perhaps cleaner option is a significant chunk of refactoring: Move the accessor functions into the *Store structs themselves. Make the the functions accept either a Reader or a Writer. This way we'll have more ergonomic and intuitive methods like:

Note the refactoring in #62, based on discussion in #46, where we chose writer.put(store, key, value) over store.put(writer, key, value) primarily because it seems more consistent with the underlying LMDB API and secondarily because it enables us to add the simplified store.put(key, value) in the future for cases where a transaction only contains a single operation.

rrichardson commented 5 years ago

I have two concerns with the Writer being the primary collection of functionality:

  1. With reader/writers, you'd have to duplicate the get* functionality across both readers and writers, since you should be able to do all of the same getting in a Writer that you can in a reader.
  2. The semantics of multi is tricky, and even in Rkv it's unsafe. Even with a cursor, I'm rather sure that you can re-order the sorted values for a single key by inserting another value.. if it sorts lower than the other keys it will shift them all down. So it should get a special set of data structures to mitigate that.

With reader/writer : (note the names are not terribly important)

impl Writer { 
  fn get_first(MultiStore, Key) -> Value;
  fn get_multi(MultiStore, Key) -> Iter<Value>;
  fn get_bytes(Store, Key) -> Value; 
  fn get_int(IntStore, Key) -> Int;
  fn put*  
  .... 

Notice the overlap of get_bytes, get_first.. there would likely be similar for delete, insert and maybe put.

If we divided the functionality among the stores.. it would be

impl MultiStore {
  fn get(Reader, Key) -> Iter<Value>; 
  fn get_first(Reader, Key) -> Value;
  fn put(Writer, Key, Value) -> Value;
}

impl Store {
  fn get(Reader, Key) -> Value; 
  fn put(Writer, Key, Value) -> Value;
} 

This seems like a much more sane delineation of signatures.. also.. in this model, there is nothing stopping you from doing : Store::put_atomic(Key, Value) -> Value; and the naming make is clear that it runs its own txn

If we want to simplify the interface further.. we can make Writer and Reader implement the ReadTxn trait, so that any get* function can take a ReadTxn ref so that it can operate on either a read or write transaction. This would result in the least amount of duplicated code as far as I can tell.

rrichardson commented 5 years ago

also, with regards to #46, it seems that the sticking point was with the fact that stores needed to be subordinate to the lifetime of transactions, which we now know is not the case.

rrichardson commented 5 years ago

I created https://github.com/mozilla/rkv/pull/97 to better illustrate the two proposed sets of APIs.

rrichardson commented 5 years ago

101 created to implement what was decided in #97

ordian commented 5 years ago

The interface was changed in 0.8 (in #101) to require a mutable reference for a Store in order to perform a put: https://github.com/mozilla/rkv/blob/9ff784c8dd807923505f6f3f062e3c0b50618574/src/store/single.rs#L63

Could you elaborate why? Since *Store are Copy types there is a workaround, but why was this change introduced in the first place? How do you expect users to run transactions from multiple threads, or is it not safe?

rrichardson commented 5 years ago

I am not sure if there is a good reason why I did that. We should discuss this in a new ticket, though, since I was about to close this since it's no longer relevant.

rrichardson commented 5 years ago

@ordian https://github.com/mozilla/rkv/pull/107