mongodb-partners / mongo-rocks

MongoDB storage integration layer for the Rocks storage engine

Backup and restore #7

Open ryanmills opened 9 years ago

ryanmills commented 9 years ago

Hi,

I have been playing with MongoDB+RocksDB and would like to know a little more about the backup/restore process.

(a) Are backups executed via db.adminCommand({setParameter: 1, rocksdbBackup: "/var/lib/mongodb/backup/1"}) incremental? I.e., if I specify the same path every time, does it just back up the changes since the last backup? I'm assuming this is what checkpoint->CreateCheckpoint(path) means.

(b) Is there currently a way to restore backups via db.adminCommand? I wasn't able to find any calls within rocks_engine.cpp to backup_engine->RestoreDBFromLatestBackup.

(c) If there is no way to restore a backup via MongoDB, how would you do it manually via the filesystem/bash shell?

Thanks!

Ryan

igorcanadi commented 9 years ago

Hi @ryanmills. Thanks for your interest.

(a) RocksDB's files are immutable, so a backup just creates hard links. Creating a backup on the same filesystem will not copy anything or add any data in the short term. If you create backups in two different directories and a file with the same name appears in both, it is the exact same file (hard-linking just increases the reference count of that file). So in a way it is an incremental backup, since no data is copied, but I would still recommend creating each backup in a different directory.

We're actually building an external tool that will, based on this API, periodically send incremental backups to S3. It will check which files in the new backup are not already on S3 and upload only those; files already uploaded don't need to be overwritten. Stay tuned.
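To make the hard-linking concrete, here is a small shell sketch. It assumes a running mongod, the rocksdbBackup parameter from the question above, and illustrative paths; the inode checks use standard GNU coreutils.

    # Take two backups into separate directories (same command as in the question).
    mongo --eval 'db.adminCommand({setParameter: 1, rocksdbBackup: "/var/lib/mongodb/backup/1"})'
    mongo --eval 'db.adminCommand({setParameter: 1, rocksdbBackup: "/var/lib/mongodb/backup/2"})'

    # A file that appears in both backups shows the same inode number,
    # i.e. the second backup costs no extra disk space for unchanged files.
    ls -i /var/lib/mongodb/backup/1/*.sst /var/lib/mongodb/backup/2/*.sst

    # A hard-link count greater than 1 confirms a file is shared, not copied.
    stat -c '%h %n' /var/lib/mongodb/backup/1/*.sst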

(b) Not yet. :( If you want to do that (sketched below):

1. Stop mongod.
2. Move or copy all the backup files into the mongo_directory/db directory (this is where RocksDB keeps its files). Don't worry about deleting the files already there; RocksDB will garbage-collect obsolete files on startup.
3. Start mongod.
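A minimal bash sketch of those steps, assuming mongod runs under systemd and using the illustrative data-directory and backup paths from earlier in the thread:

    # 1. Stop mongod so no RocksDB files are in use.
    sudo systemctl stop mongod

    # 2. Copy the backup into the RocksDB data directory. Files already
    #    there can stay; RocksDB garbage-collects obsolete files on startup.
    sudo cp -a /var/lib/mongodb/backup/1/. /var/lib/mongodb/db/

    # 3. Start mongod again; it recovers from the restored files.
    sudo systemctl start mongod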

(c) see above :)

Let us know if you have any more questions!

dynamike commented 8 years ago

We open sourced a tool to do this -- https://github.com/facebookgo/rocks-strata

thapakazi commented 6 years ago

Since the issue is still left open, I'm asking a backup question here. Is there a way I could tell strata to skip the backup for the local db, or tell it to back up only the collections I need? Could be asking too much, bear with me :neckbeard:

I'm asking all this because on my recent trial with strata, one of my prod instances froze as strata started creating backups (hard links). It ate all the machine's resources (CPU, RAM, disk I/O), since there were tons of .sst files and my machine was not highly provisioned either. To my understanding, strata copies almost everything (all the .sst files created during compaction) from the backup dir to S3. I'm new to these toolsets, kinda late actually, so cast me some insights here... thanks.

I am thinking of jailing strata to use limited system resources and doing the incremental backup as described in the example cron scripts.

igorcanadi commented 6 years ago

Hey @thapakazi , unfortunately there is no way to copy specific collections yet. It would be possible to implement, but not simple, since all collections are intermingled in a single RocksDB database (as opposed to having a single database for each collection).

I am thinking of jailing strata to use limited system resources and doing the incremental backup as described in the example cron scripts.

That's definitely doable. I have no idea why strata would use a lot of CPU or RAM; the only things it does are 1) hardlink files and 2) copy them to S3. However, adding rate limiting to strata would definitely be very valuable.
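Until then, an OS-level jail is probably the simplest stopgap. A hedged sketch (the strata arguments are placeholders; nice, ionice, and systemd-run are standard Linux tools):

    # Lower CPU and disk-I/O scheduling priority for the backup run.
    nice -n 19 ionice -c 2 -n 7 strata backup ...

    # Or, on systemd hosts, cap CPU and memory for the whole invocation
    # (MemoryMax needs systemd >= 231; older versions call it MemoryLimit).
    sudo systemd-run --scope -p CPUQuota=25% -p MemoryMax=512M strata backup ...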