mozilla / rkv

A simple, humane, typed key-value storage solution.
https://crates.io/crates/rkv
Apache License 2.0
307 stars 52 forks source link

support architecture-independent database migration #142

Closed mykmelez closed 5 years ago

mykmelez commented 5 years ago

@ncloudioj FYI, this is a work in progress that enables migration of databases across architectural changes. It appears to work correctly for a subset of databases, although the implementation is incomplete, and it isn't efficient.

mykmelez commented 5 years ago

@ncloudioj Ok, I think this is more-or-less ready for review.

The bulk of the code is in migrate.rs, and the test_migrate_and_replace() function demonstrates how a consumer might go about migrating an LMDB environment that was created on a different architecture.

I've also created a dump.rs binary that can be invoked from the command line. It's similar to mdb_dump, except that it supports fewer options, and the dump of a main database in an environment that contains subdatabases is not identical to the output of mdb_dump in that case.

(I could make it so, but I wrote dump.rs to help with manual testing, not for general utility, so I'm not sure it's worth the effort.)

Finally, I've created a rand.rs binary that can be invoked from the command line to create an LMDB environment with random data. This helped me create the reference environments, and it too is not intended for general utility, so it's relatively specific to that task.

Speaking of which, the reference environments are relatively large binary files, so I'm unsure if the repository is really the best place for them, although I don't have a better idea. Perhaps I should rather create smaller reference environments to store in the repo.

One last note: on AppVeyor, we test on both 32- and 64-bit Windows systems, so we're testing migration in both directions. On Travis, however, we test only on 64-bit macOS and Linux systems, as it isn't straightforward to test on 32-bit systems there, so we're only testing migration from 32- to 64-bit.

I've tested both directions on Linux, however, via 32- and 64-bit Linux VMs. And 64- to 32-bit migration probably doesn't matter for macOS. So I think we have reasonable coverage by manual testing. And we can probably find a way to test on 32-bit Linux in the future as well (perhaps in a Docker image running on Travis).

ncloudioj commented 5 years ago

@mykmelez I've finished the first pass of scan by comparing this patch against the lua version, it looks solid! In particular, the extra addition of the subdb parsing is super useful, excellent work! 👍

Would like to take another look once the review comments get resolved.

ncloudioj commented 5 years ago

As for those test fixtures, I think we can keep them in the repo for the convenience, but do not include them in the crate. Perhaps add a few lines of description in the repo README so that consumers can generate them on-demand.

mykmelez commented 5 years ago

As for those test fixtures, I think we can keep them in the repo for the convenience, but do not include them in the crate. Perhaps add a few lines of description in the repo README so that consumers can generate them on-demand.

I've excluded them (and the src/bin/ files) from the crate package in 54ba35c. I haven't yet added a description to the README; I'll work on that next.

mykmelez commented 5 years ago

@ncloudioj The most recent commit, 07524f7, halves the sizes of the test fixtures, which now contain only 50 key/value pairs rather than 100, in order to reduce the size of a repository clone. I also modified rand.rs to allow specifying the number of key/value pairs to create on the command line. And I moved the fixtures into the tests/envs/ subdirectory to make it easier to exclude only those files.

That last change enabled me to update the exclude rule in Cargo.toml to only exclude tests/envs/ rather than the entire tests/ directory. I also stopped excluding dump.rs and rand.rs, and I added a comment to README.md explaining that the test fixtures aren't included in the crate package but can be retrieved by cloning the repo or recreated via rand.rs and dump.rs.

Because I've recreated the test fixtures, this branch should be integrated via "squash and merge", so the old fixtures don't show up in history (and take up the space I'm trying to save by reducing their sizes). I'll do that once the automated tests succeed.

ncloudioj commented 5 years ago

@mykmelez Makes sense!

Go ahead merge it once CI finishes the job.