mozilla / rkv

A simple, humane, typed key-value storage solution.
https://crates.io/crates/rkv
Apache License 2.0

Trying to understand general goal/state/bugs of the project #190

Open atsepkov opened 4 years ago

atsepkov commented 4 years ago

Hi, in my search for LMDB bindings for Rust I stumbled upon this project as one of the more actively maintained ones. What scares me is the warning on the main page not to use LMDB because of 3 major bugs. I also don't quite understand the difference between this project and lmdb-rs, which seems to be a dependency. Is this project a higher-level abstraction/ORM around lmdb-rs that aims to eventually support multiple DBs?

Are the bugs corrupting the DB in this repo or in lmdb-rs (it's hard to tell since they're tracked in bugzilla rather than on github), or are these fundamental problems with LMDB itself? What are the performance and scalability implications of using the "SafeMode" backend? How different is SafeMode under the hood from actual LMDB? Is SafeMode's performance better or worse than something like SQLite? Will it eat up all available RAM for my project (description below) if I deploy it on an AWS micro instance?

My project is meant to run comfortably on a single AWS micro instance. It already uses SQLite (but not in a way that relies on SQL functionality), and my goal is to make it more lightweight. My data (writes) comes from various batch operations that run relatively rarely (e.g. once/day), and the main use case I want to optimize (and the reason for LMDB) is read performance, both sequential and random access. So reads are random (user-generated) and common (fetching up to 60k entries at a time), whereas writes are system-generated, scheduled, and rare. My dataset currently contains around 200k entries with potential for several million (each entry would contain around 50 fields of numeric data, which I will serialize).
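For a rough sense of scale, the entry layout described above can be sketched in Rust using only the standard library. This is a minimal sketch under stated assumptions, not anything rkv or lmdb-rs prescribes: it assumes every entry has exactly 50 fields, that each field is an `f64`, and that a fixed-width little-endian encoding is acceptable. The names `FIELDS`, `encode_entry`, `decode_entry`, and `payload_bytes` are hypothetical.

```rust
/// Assumption: every entry has exactly 50 numeric fields.
const FIELDS: usize = 50;
/// Assumption: each field is stored as an 8-byte f64, so one entry is 400 bytes.
const ENTRY_BYTES: usize = FIELDS * std::mem::size_of::<f64>();

/// Encode one entry as a fixed-width, little-endian byte string.
fn encode_entry(fields: &[f64; FIELDS]) -> Vec<u8> {
    let mut out = Vec::with_capacity(ENTRY_BYTES);
    for field in fields {
        out.extend_from_slice(&field.to_le_bytes());
    }
    out
}

/// Decode an entry; returns `None` if the byte string has the wrong length.
fn decode_entry(bytes: &[u8]) -> Option<[f64; FIELDS]> {
    if bytes.len() != ENTRY_BYTES {
        return None;
    }
    let mut fields = [0.0; FIELDS];
    for (slot, chunk) in fields.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *slot = f64::from_le_bytes(buf);
    }
    Some(fields)
}

/// Raw value payload for `entries` entries, ignoring keys and any
/// per-page or per-record overhead added by the storage engine.
fn payload_bytes(entries: usize) -> usize {
    entries * ENTRY_BYTES
}
```

Under these assumptions, `payload_bytes(200_000)` is 80,000,000 bytes (about 80 MB) of raw values, and a hypothetical 3 million entries would come to about 1.2 GB, before counting keys and storage overhead.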

How likely am I to get bitten by the crashes/bugs you mention in the use case I described? What are the implications? Total DB corruption? Is it recoverable? Can corruption/crash happen on read operations? If these bugs are limited to rkv, would you recommend I use lmdb-rs directly instead? Would you recommend staying away from LMDB altogether regardless of bindings?

rnewman commented 4 years ago

> Is this project a higher-level abstraction/ORM around lmdb-rs that aims to eventually support multiple DBs?

My intent in writing it was to provide a safe, typed abstraction that smoothed off many of the sharp edges of LMDB but preserved its performance.

I'm sure some of the goals have shifted over time (e.g., rkv now seems to support multiple backends!), but I believe those are still the core. @victorporof and @mykmelez can likely comment more.

> Are the bugs corrupting the DB in this repo or in lmdb-rs (it's hard to tell since they're tracked in bugzilla rather than on github) or are these fundamental problems with LMDB itself?

They look like crashes in LMDB itself; I would be surprised if lmdb-rs behaved any differently when called from your own code instead of from rkv, and in theory the same is true if you used mdb.c directly.

The two publicly visible bugs map to these in LMDB:

https://www.openldap.org/lists/openldap-bugs/201410/msg00051.html
https://www.openldap.org/its/index.cgi/Incoming?id=9037;selectid=9037

> How likely am I to get bitten by the crashes/bugs you mention in the use case I described?

I suspect that you will not; if you control your key length, filesystem, etc. then your system is a great deal more predictable than Firefox running on millions of end-user computers and operating systems and filesystems.
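One way to "control your key length" is to validate keys before they reach the store. The following is a minimal sketch, not part of rkv or lmdb-rs: it assumes LMDB's default compile-time maximum key size of 511 bytes (a given build may differ) and LMDB's rule that keys must be non-empty. `MAX_KEY_BYTES`, `KeyError`, and `check_key` are hypothetical names.

```rust
/// Assumption: LMDB's default maximum key size; a custom build may differ.
const MAX_KEY_BYTES: usize = 511;

/// Reasons a key is rejected before it is handed to the store.
#[derive(Debug, PartialEq)]
enum KeyError {
    /// LMDB does not accept zero-length keys.
    Empty,
    /// The key exceeds the assumed maximum size.
    TooLong { len: usize },
}

/// Return the key unchanged if its length is within the assumed limits.
fn check_key(key: &[u8]) -> Result<&[u8], KeyError> {
    match key.len() {
        0 => Err(KeyError::Empty),
        len if len > MAX_KEY_BYTES => Err(KeyError::TooLong { len }),
        _ => Ok(key),
    }
}
```

Calling `check_key` on every key in the batch writer would reject an oversized or empty key up front, rather than relying on the storage layer's behaviour for that input.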

https://bugzilla.mozilla.org/show_bug.cgi?id=1596063 is tracking three crashes. One has occurred once, one is "low volume" but a startup crash, and one is non-public.

mykmelez commented 4 years ago

I don't have much to add beyond what @rnewman said. As he noted, the crashes in question look like crashes in LMDB, although I haven't been able to absolutely confirm that. Nevertheless, it's unlikely that you'll experience them, unless you happen to be distributing software to tens of millions of users running a variety of often-archaic versions of popular desktop operating systems.

As for the goals, as far as I know, they remain the same at a high level as @rnewman stated. However, I haven't been involved much since @victorporof took over primary responsibility for the project, so he's the authoritative source for info about its goals and direction.