Open rnewman opened 6 years ago
@fluffyemily started looking at this. She followed the rkv
model of a lazy_static!
manager struct that owns a map of Store
instances, and she immediately ran into the fact that rusqlite::Connection
instances are Send
but not Sync
. (See https://github.com/jgallagher/rusqlite/issues/188 and especially https://github.com/jgallagher/rusqlite/blob/0.13.0/src/lib.rs#L680-L738.) This is not the same as lmdb
, where the underlying connection structures are Send
and Sync
-- see https://github.com/danburkert/lmdb-rs/blob/0.7.2/src/environment.rs#L156-L157. It is probably the case that we can work around lazy_static!
restrictions, so I won't dwell too much on this.
However, the API @rnewman sketched, in particular
Given a path, give me a (mutable) reference if the store is open. fn {get,get_mut}(path: PathBuf) -> Option<{&mut,&} Store>
appears to me to assume that Store
is Sync
, so that I can share a reference to Store
across threads.
I can see two ways to proceed:
1) force SQLite to act more like lmdb by ensuring that we open our SQLite connections in "serialized mode" (see https://www.sqlite.org/threadsafe.html), and then mark our Store
as being Sync
. This is probably quite straightforward, since rusqlite
already has non-trivial logic to ensure the "multithreaded mode" option of SQLite is available. It's somewhat unpalatable because it imposes SQLite overhead even in the single-threaded case, which is the common case; and because it's a hefty chunk of unsafe
functionality, although this is exactly what SQLite is intended to implement. One question that I haven't been able to determine from the SQLite documentation is whether the result of sqlite3_open
truly is thread-safe, i.e., whether concurrent access from multiple threads is legal. If it's not, this option is out entirely.
2a) don't share SQLite connections at all, and make Stores
strictly about managing the Mentat-level Conn
metadata in a central place. That's still really useful!
2b) only share SQLite connections per-thread, making Stores
use thread_local!
to manage per-thread caches of SQLite connections. This model comes the closest to capturing @rnewman's original API, but it's not clear to me that the extra effort is worth it. Do we really anticipate consumers opening the same Mentat store in the same thread frequently enough to justify this layer of caching?
My preference is to experiment with 2b) and hope that it's straight-forward (and I expect it will be straight-forward), falling back to 2a) if it's cmplicated, and leave 1) entirely, which I view as unsafe.
It's important that a particular store is only opened once at a time (see also #547).
Further, it's convenient for FFI to have a static place to root open stores.
The obvious solution is a manager, just like rkv's.
But what do we manage?
I propose:
Store
'sconn
fieldArc
. This gives us the ability to shareConn
s and thus useStore
as a pair of a connection and aConn
when there are multiple open SQLite connections.Store
, perhaps namedfork
, that yields a new store with clonedconn
using a providedrusqlite::Connection
. This allows us to get a new copy of a store that's already open.Store
to create an in-memory store. The empty path will now be confusing. (I believe the FFI already makes a distinction…)Store
s, just as in rkv. It's important to canonicalize file paths. "Stores" might be a good name.fn is_open(path: PathBuf) -> bool
Store
, opening it if necessary.fn open(path: PathBuf) -> Result<&mut Store>
fn {get,get_mut}(path: PathBuf) -> Option<{&mut,&} Store>
Store
instance with its own SQLite connection. I understand that I now have to work with SQLite's concurrency model (e.g., I can encounterSQLITE_BUSY
), but Mentat will serialize writes toMetadata
.fn connect(path: PathBuf) -> Result<Store>
Store.conn
and fail if it's more than 1. It can only be more than 1 if a caller has cloned the store, either explicitly or viaconnect
.The usual pattern for working with a store then changes from:
to
— a one-character change and a change in type.