mozilla / mentat

UNMAINTAINED A persistent, relational store inspired by Datomic and DataScript.
https://mozilla.github.io/mentat/
Apache License 2.0
1.65k stars 115 forks source link

Management of open stores #660

Open rnewman opened 6 years ago

rnewman commented 6 years ago

It's important that a particular store is only opened once at a time (see also #547).

Further, it's convenient for FFI to have a static place to root open stores.

The obvious solution is a manager, just like rkv's.

But what do we manage?

I propose:

The usual pattern for working with a store then changes from:

        let mut store: Store = Store::open("/some/path").expect("opened");
        let mut in_progress = store.begin_transaction().expect("began");
        in_progress.import(fixture_path("cities.schema")).expect("transacted schema");

to

        let store: &mut Store = Stores::open("/some/path").expect("opened");
        let mut in_progress = store.begin_transaction().expect("began");
        in_progress.import(fixture_path("cities.schema")).expect("transacted schema");

— a one-character change and a change in type.

ncalexan commented 6 years ago

@fluffyemily started looking at this. She followed the rkv model of a lazy_static! manager struct that owns a map of Store instances, and she immediately ran into the fact that rusqlite::Connection instances are Send but not Sync. (See https://github.com/jgallagher/rusqlite/issues/188 and especially https://github.com/jgallagher/rusqlite/blob/0.13.0/src/lib.rs#L680-L738.) This is not the same as lmdb, where the underlying connection structures are Send and Sync -- see https://github.com/danburkert/lmdb-rs/blob/0.7.2/src/environment.rs#L156-L157. It is probably the case that we can work around lazy_static! restrictions, so I won't dwell too much on this.

However, the API @rnewman sketched, in particular

Given a path, give me a (mutable) reference if the store is open. fn {get,get_mut}(path: PathBuf) -> Option<{&mut,&} Store>

appears to me to assume that Store is Sync, so that I can share a reference to Store across threads.

I can see two ways to proceed:

1) force SQLite to act more like lmdb by ensuring that we open our SQLite connections in "serialized mode" (see https://www.sqlite.org/threadsafe.html), and then mark our Store as being Sync. This is probably quite straightforward, since rusqlite already has non-trivial logic to ensure the "multithreaded mode" option of SQLite is available. It's somewhat unpalatable because it imposes SQLite overhead even in the single-threaded case, which is the common case; and because it's a hefty chunk of unsafe functionality, although this is exactly what SQLite is intended to implement. One question that I haven't been able to determine from the SQLite documentation is whether the result of sqlite3_open truly is thread-safe, i.e., whether concurrent access from multiple threads is legal. If it's not, this option is out entirely.

2a) don't share SQLite connections at all, and make Stores strictly about managing the Mentat-level Conn metadata in a central place. That's still really useful!

2b) only share SQLite connections per-thread, making Stores use thread_local! to manage per-thread caches of SQLite connections. This model comes the closest to capturing @rnewman's original API, but it's not clear to me that the extra effort is worth it. Do we really anticipate consumers opening the same Mentat store in the same thread frequently enough to justify this layer of caching?

My preference is to experiment with 2b) and hope that it's straight-forward (and I expect it will be straight-forward), falling back to 2a) if it's cmplicated, and leave 1) entirely, which I view as unsafe.