offering collaboration on all-rust storage engine, curious about representative benchmarks of yours

Hey! I have an all-rust KV store I'm building, sled, which has three main goals:

- minimize non-rust compile-time dependencies for stateful systems
- serve as a flexible platform for developing generalizable stateful testing techniques for the rust ecosystem
- beat traditional B+ trees for write throughput and LSM trees for read latency

I'm about to release 0.15 as alpha, encouraging users with recomputable datasets to help exercise the system and hopefully violate as many of my bad assumptions as possible. Planned for 0.16 is an initial implementation of serializable transactions based on a simplified Cicada-like approach (we are far from the point where atomic fetch-add is our bottleneck, but the contention-aware safety checks are definitely on the table).
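
A rough sketch of the fetch-add idea, for readers unfamiliar with it. This is not sled's actual design: `Clock`, `Record`, and `try_commit` are hypothetical names, the validation rule is a deliberately simplified optimistic check rather than Cicada's full protocol, and the commit step is shown single-threaded.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical global clock handing out transaction timestamps.
pub struct Clock {
    next: AtomicU64,
}

impl Clock {
    pub fn new() -> Self {
        Clock { next: AtomicU64::new(1) }
    }

    /// `fetch_add` returns the previous value, so concurrent callers
    /// never receive the same timestamp.
    pub fn allocate(&self) -> u64 {
        self.next.fetch_add(1, Ordering::SeqCst)
    }
}

/// Hypothetical versioned record: the value plus the timestamp of the
/// transaction that last wrote it.
pub struct Record {
    pub write_ts: u64,
    pub value: i64,
}

/// Simplified optimistic validation: a transaction that read the version
/// written at `observed_ts` may install `new_value` at `commit_ts` only if
/// nobody has overwritten the record since that read.
/// (A real engine would have to perform this check-and-write atomically.)
pub fn try_commit(record: &mut Record, observed_ts: u64, commit_ts: u64, new_value: i64) -> bool {
    if record.write_ts != observed_ts || commit_ts <= record.write_ts {
        return false;
    }
    record.write_ts = commit_ts;
    record.value = new_value;
    true
}
```

If two transactions both read the same version and then try to commit, the first one succeeds and the second fails validation and would have to retry.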

I wanted to reach out early to say that I'm quite willing to support the underlying storage semantics that mentat requires, and that I would LOVE to build a representative benchmark of those semantics (and those of other projects) so that I can work towards a high-quality experience for a broad range of the stateful rust ecosystem.

My main question for you folks is: do you have a specific benchmark or set of benchmarks that you prioritize and try to keep from regressing? How are you directing tuning efforts? I would love to have some insight into the specific performance goals of your project! Also, if SQLite is less than optimal for some of your required use cases, I would be quite eager to hear about those!

SQLite is a wonderful choice due to its reliability, and my goal is NOT to have something that I can tell people is more reliable. Being an SRE operating distributed databases has made me quite cautious about new storage technologies, and I am not trying to downplay the risks of using a new one. But I am trying to bring modern performance and reliability techniques to the rust and stateful systems engineering ecosystems.

Keep up the good work :)

Hi!

You might be interested in https://github.com/mozilla-prototypes/kista/. Things I like about LMDB: multiprocess; ACID transactions; ridiculously fast; zero-copy (at least at the low level); multi-value keys; integer keys; good iterators. Things I don't like about LMDB: doesn't support network file systems. As far as I can tell you're competing against https://github.com/danburkert/lmdb-rs, right?

Mentat is aiming at a different space: we don't care quite so much about speed or multiprocess access; we're aiming to replace hand-rolled SQLite and JSON files that have hand-rolled data access and synchronizer code.

We do care very much about schema management and evolution; Mentat is intended to support long-lived, evolving, synchronized data.

Mentat's query interface is relational: we compile Datalog down to SQL. It is not a key-value store.

Mentat doesn't have its own query executor per se: the algebrizer, translator, and projector implement Datalog semantics by compiling to SQL that targets Mentat's SQL schema and then handling the results. The biggest obstacle I see to giving you a benchmark is that I don't see how you'll match those semantics; we don't (yet) have a simple key-value storage layer on which we layer a complex query engine.

Take a look at our fixtures: they load the schema and some data, and then run some queries. Queries look something like:

.q [:find ?name ?cat
    :where
    [?c :community/name ?name]
    [?c :community/type :community.type/website]
    [(fulltext $ :community/category "food") [[?c ?cat]]]]

which returns the names and categories of food-related communities, using full-text search.
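
To give a feel for what compiling such patterns to SQL involves, here is a minimal sketch that translates plain `[entity attribute value]` patterns, like the first two clauses of the query above, into a self-join. It is an illustration under loud assumptions, not Mentat's translator: the `datoms` table with `e`, `a`, `v` columns, the `Term` and `Pattern` types, and `to_sql` are all hypothetical here, keywords are compared as string literals, and `fulltext` is left out entirely.

```rust
use std::collections::HashMap;

/// One position of a pattern: a variable such as `?c`, or a constant.
pub enum Term<'a> {
    Var(&'a str),
    Const(&'a str),
}

/// An `[entity attribute value]` pattern from a `:where` clause.
pub struct Pattern<'a> {
    pub e: Term<'a>,
    pub a: Term<'a>,
    pub v: Term<'a>,
}

/// Translate patterns into one SELECT over a hypothetical `datoms(e, a, v)`
/// table. Each pattern gets its own alias; a variable seen in more than one
/// place becomes an equality between columns. Returns `None` if a `:find`
/// variable is never bound by any pattern.
pub fn to_sql<'a>(find: &[&'a str], patterns: &[Pattern<'a>]) -> Option<String> {
    let mut bindings: HashMap<&'a str, String> = HashMap::new();
    let mut from = Vec::new();
    let mut conditions = Vec::new();

    for (i, pattern) in patterns.iter().enumerate() {
        let alias = format!("d{}", i);
        from.push(format!("datoms AS {}", alias));
        for (column, term) in [("e", &pattern.e), ("a", &pattern.a), ("v", &pattern.v)] {
            let qualified = format!("{}.{}", alias, column);
            match term {
                // Constants become literal comparisons (quotes escaped).
                Term::Const(c) => {
                    conditions.push(format!("{} = '{}'", qualified, c.replace('\'', "''")));
                }
                Term::Var(name) => {
                    let existing = bindings.get(name).cloned();
                    match existing {
                        // Repeated variable: constrain it to its first column.
                        Some(first) => conditions.push(format!("{} = {}", qualified, first)),
                        // First occurrence: remember which column binds it.
                        None => {
                            bindings.insert(*name, qualified);
                        }
                    }
                }
            }
        }
    }

    let mut select = Vec::new();
    for var in find {
        let column = bindings.get(var)?;
        select.push(format!("{} AS {}", column, var.trim_start_matches('?')));
    }

    let mut sql = format!("SELECT {} FROM {}", select.join(", "), from.join(", "));
    if !conditions.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&conditions.join(" AND "));
    }
    Some(sql)
}
```

The shared variable `?c` is what turns two independent patterns into a join: its second occurrence becomes a condition tying the two table aliases together.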

If you do

cargo build --release -p mentat_cli
target/release/mentat_cli

you should be able to paste in each of those lines and see what happens.

SQLite is relatively fast: on that small dataset queries take on the order of hundreds of microseconds through to a handful of milliseconds. I'd love a SQLite that gave me direct access to particular indices for programmatic walks, though.

Can you clarify what kind of benchmark you're looking for?

Thanks for pointing me to that fixture! I'm familiar with Datalog, but not Datomic's edn interface. Does mentat aim to provide an identical query interface? If so, that would give me a wealth of existing benchmarks to work with.

Sled's Tree index is a KV store on its own, but the project also has the goal of providing modular building blocks for several other storage needs typically required by databases. The two main modules right now are a pagecache and an index, with an MVCC system under development. Once the MVCC system is complete, it can be used with multiple indexes and storage systems, and it will have a relatively straightforward path to supporting higher-level query and definition languages. LMDB and SQLite are great for the design spaces they occupy, but we are targeting systems that require higher write volumes than traditional page-oriented databases support along with lower read latencies than current cutting-edge LSM trees can provide. After MVCC is implemented, columnar or tile-based storage layouts may be explored for rapid scanning of values.

Because it's aiming to be a pretty modular system that can be molded to different needs, I'm mostly collecting pain points from projects that are building interesting stateful systems in the rust ecosystem to find underserved access patterns. I'm quite interested in Datalog in general, and at some point I might play around with swapping out your query translator and projector for one that works against Sled storage components, and then profiling IO usage to guide specific optimization strategies.
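
As a very rough illustration of how an ordered KV index could answer the entity-oriented lookups behind such patterns, here is a sketch that uses the standard library's `BTreeMap` as a stand-in for an ordered tree. Nothing here is sled's or Mentat's API: `EavIndex`, the key layout (big-endian entity id followed by the attribute bytes), and the one-value-per-attribute restriction are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical entity-attribute-value index over an ordered byte-keyed map.
pub struct EavIndex {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Key layout (an assumption for this sketch): 8-byte big-endian entity id,
/// then the attribute name. Big-endian keeps byte order equal to numeric
/// order, so all facts about one entity are adjacent in the map.
fn key(entity: u64, attribute: &str) -> Vec<u8> {
    let mut k = entity.to_be_bytes().to_vec();
    k.extend_from_slice(attribute.as_bytes());
    k
}

impl EavIndex {
    pub fn new() -> Self {
        EavIndex { map: BTreeMap::new() }
    }

    /// Store one value per (entity, attribute); a later write replaces it.
    pub fn assert_fact(&mut self, entity: u64, attribute: &str, value: &str) {
        self.map.insert(key(entity, attribute), value.as_bytes().to_vec());
    }

    /// Point lookup of a single attribute of an entity.
    pub fn get(&self, entity: u64, attribute: &str) -> Option<&[u8]> {
        self.map.get(&key(entity, attribute)).map(|v| v.as_slice())
    }

    /// Prefix scan: every (attribute, value) pair recorded for `entity`,
    /// in attribute order.
    pub fn attributes_of(&self, entity: u64) -> Vec<(String, String)> {
        let prefix = entity.to_be_bytes();
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(&prefix))
            .map(|(k, v)| {
                (
                    String::from_utf8_lossy(&k[8..]).into_owned(),
                    String::from_utf8_lossy(v).into_owned(),
                )
            })
            .collect()
    }
}
```

A pattern with a known entity becomes a short range scan over adjacent keys; patterns that start from an attribute or value would need additional indexes with the key components in a different order.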

I'm mostly curious about the workloads you've been developing against. But if this is an early project it makes sense if there are not many that are being measured yet.

> Does mentat aim to provide an identical query interface?

It's not currently a priority, but it's a pretty good skeleton to work from. Note that we don't yet support pull expressions, rules, or some of the other features that Datomic has. On the other hand, we have limit and order, so 😁

> we are targeting systems that require higher write volumes than traditional page-oriented databases support…

Browser workloads tend not to have very high write volumes; writes are mostly driven by user activity or page contents. Certainly I don't think LMDB would be a bottleneck for us. What kind of write volumes are you looking at?

> After MVCC is implemented, columnar or tile-based storage layouts may be explored for rapid scanning of values.

Interesting!

> I'm mostly curious about the workloads you've been developing against. But if this is an early project it makes sense if there are not many that are being measured yet.

The largest workload we're targeting is something like Firefox's history store: accruing URLs, titles, visits, and metadata about 10-100K pages over the course of several years, and from that powering an assortment of sophisticated derived representations, including the awesomebar. You can see some of how that currently works if you poke around places.sqlite in your Firefox profile.

That's not really a write-heavy system; the tricky part for us will be managing to be fast enough on queries, compact enough, evolvable, and syncable, all at the same time.

mozilla/mentat issue #568