Pure python binding and recover_index_from_blob.py fixes

SaveTheRbtz commented 11 years ago

recover_index_from_blob.py:

Fixed it in various places; it now manages to recover the index from the data file, though not quickly;
Allowed bases to be passed on the command line;
Improved logging.

blob.py:

Removed check for data header vs index header equality;
Fixed infinite loop on iteration over broken bases;
Fixed iteration over removed entries;
Removed commented code;
Reformatted code and added comments;

PS. recover_index_from_blob.py was broken for about a year.

PPS. blob.py is way too "hacky" and probably should not be used in new scripts.

bioothod commented 11 years ago

What about removing eblob.py and forcing eblob recovery script to use boost::python iterators? What about removing recover_index_from_blob.py altogether, is it still relevant?

SaveTheRbtz commented 11 years ago

In my opinion the recovery script should not use internal eblob iterators but should operate on raw structures, for robustness reasons. Also, the recovery script should be written in C/C++ for performance reasons.

For now I guess we should just merge this to have a working solution for the removed indexes problem. For future versions we should think about some kind of eblob_fsck utility that would perform various checks on eblob database invariants, like:

Check that there are no entries in index that fall outside of data file;
Check that headers in index and data file are in sync;
Check for duplicated non-removed entries;
Various sanity checks for flags and record sizes;
etc;

and would be able to recover from blob corruptions, including full index regeneration from the data file / unsorted index.
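
To make the idea concrete, here is a rough sketch of three of those checks in Python. It is only an illustration: the `IndexEntry` tuple, the `FLAG_REMOVED` bit and the `check_index` function are hypothetical names, and the entries are plain in-memory tuples rather than the real eblob on-disk structures. The header sync check is left out because it needs the data file contents.

```python
from collections import namedtuple

# Hypothetical in-memory view of one index entry; NOT the real eblob layout.
IndexEntry = namedtuple("IndexEntry", "key flags data_size disk_size position")

FLAG_REMOVED = 1 << 0  # assumed "removed" bit, for illustration only


def check_index(entries, data_file_size):
    """Return a list of (entry_number, problem) tuples for the given entries."""
    problems = []
    live_keys = {}  # key -> number of the first non-removed entry with that key
    for number, entry in enumerate(entries):
        # Sanity check on record sizes: payload cannot exceed the on-disk size.
        if entry.disk_size <= 0 or entry.data_size > entry.disk_size:
            problems.append((number, "bad record size"))
        # The record must lie entirely inside the data file.
        if entry.position < 0 or entry.position + entry.disk_size > data_file_size:
            problems.append((number, "outside data file"))
        # Only non-removed entries count as duplicates.
        if not entry.flags & FLAG_REMOVED:
            if entry.key in live_keys:
                problems.append(
                    (number, "duplicate of entry %d" % live_keys[entry.key]))
            else:
                live_keys[entry.key] = number
    return problems
```

A real utility would read the entries from the index and data files and would probably offer a repair mode as well; this sketch only reports problems.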

PS. BTW what happened to http://doc.reverbrain.com?

bioothod commented 11 years ago

Hosting provider issues...

How ironic: we build very robust, fault-tolerant distributed storage systems and simultaneously suffer from web hosting issues for the site.

reverbrain / eblob

Pure python binding and recover_index_from_blob.py fixes #36