leo-project / erocksdb

Erlang bindings to RocksDB datastore
68 stars 15 forks source link

Feature/backup #10

Open benoitc opened 8 years ago

benoitc commented 8 years ago

This PR add the erocksdb:backup/2 and erocksdb:restore/2 function to backup the database incrementally and restore it:

https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB%3F

Any feedback is welcome.

yosukehara commented 8 years ago

Thank you for your request. We'll check it.

benoitc commented 8 years ago

since I have implemented https://github.com/benoitc/erocksdb/commit/49978f74aa29a5a6c4f817b1d1cfba367e0d3905

which is using the new checkpoint feature. It's quite more efficient and used by the mysql version of facebook that uses rocksdb. Let me know.

mocchira commented 8 years ago

@benoitc Thank you for your great PR. Since we've also wanted this features, this PR is very welcome.

Anyway, I wrote the review result below so please check it out.

  1. Make backup/restore async as the snapshot feature you are PRing now. As you know, Since backup/restore may take a long time, an Erlang scheduler thread executing those NIFs may block for an undesirable time.
  2. Keep BackupEngine alive and not to recreate it every time. The below is picked up from https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB%3F Also we recommend to keep BackupEngine alive and not to recreate it every time you need to do a backup. So I'd recommend you to make its lifetime the same with DbObject.
  3. Ensure your code running properly on Linux, FreeBSD and SmartOS (if possible) Since we've supported those 3 platforms and already have users, I'd like you to test on those if possible or tell us what platforms you've tested. We will test other platforms instead of you.
benoitc commented 8 years ago

@mocchira the code have been tested on OX and linux. not yet on freebsd.

About the implementation obviously it would be better to implement the backup as an asynchronous task. I will check it. However https://github.com/benoitc/erocksdb/commit/49978f74aa29a5a6c4f817b1d1cfba367e0d3905 is probably a lot more efficient since it allows you backup only and only copy needed file, hard-linking others which is very fast. Maybe the patch above could be skipped in favour of such solution?

mocchira commented 8 years ago

@benoitc

However benoitc@49978f7 is probably a lot more efficient since it allows you backup only and only copy needed file, hard-linking others which is very fast. Maybe the patch above could be skipped in favour of such solution?

I've checked how the both of features works/implements.

Long story short,

Since the checkpoint handles a SST file as a hard link or just leaving it (NOT copying) and needless to say, backup should be copied to other physical device normally, Checkpoint alone seems NOT to be an complete backup solution but an building block to make an original backup solution (mysql adopts Checkpoint for this reason).

On the other hand, Backup is regarded as a builtin complete backup solution.

So in my conclusion, We'd like to have both APIs if possible.

So if you only need the checkpoint, It'd be OK to leave PR for the checkpoint and close this (We will take over your PR.)