wbolster / plyvel

Plyvel, a fast and feature-rich Python interface to LevelDB
https://plyvel.readthedocs.io/

Copying a leveldb #46

Closed: tlevine closed this issue 6 years ago

tlevine commented 8 years ago

I often need to copy a leveldb, and I finally wrote a function to do it. Would it make sense to include such a function in plyvel? If so, I might add it.

The function would look a bit like this one, but less complicated. The second half of this function packs the copied leveldb into an archive so it can be downloaded as a single file, because that's what I needed in my case. If this were included in plyvel, I think the function should return either a new plyvel object or the name (as a str) of the directory that contains the new database.

import datetime
import io
import tarfile
import tempfile

import plyvel


def archive_level_db(db):
    '''
    Copy a leveldb to a new directory, and put it in a gzipped tarball.

    :param plyvel.DB db: The source database
    :returns: The gzipped tarball as bytes
    '''
    arcname = datetime.datetime.now().strftime('leveldb-%Y-%m-%d')
    with tempfile.TemporaryDirectory() as directory:
        # Copy all key/value pairs from a consistent snapshot of the source.
        new_db = plyvel.DB(directory, create_if_missing=True)
        with new_db.write_batch() as b:
            for k, v in db.snapshot().iterator():
                b.put(k, v)
        new_db.close()
        # Build the tarball in memory; re-opening a NamedTemporaryFile by
        # name (tarfile.open(tmp.name)) does not work on Windows.
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode='w:gz') as tar:
            tar.add(directory, arcname=arcname)
        out = buf.getvalue()
    return out

In case I update it, the updated version will probably be in this file.

wbolster commented 8 years ago

i'm not sure this has a place inside plyvel, tbh. i don't really see the use case. also, your function does two things: copying and tarring. taking a snapshot (a plain file copy) of the current leveldb directory would also work without any special support for it; copying the data key by key first is not needed. if you insist, you could compact the db first, but that's not strictly necessary unless you really want stale values that have not been compacted away yet to be explicitly excluded.
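A minimal sketch of the plain directory copy described above, assuming the source database is closed while copying (LevelDB holds a `LOCK` file and keeps writing log and table files while a database is open, so copying a live directory may capture an inconsistent state). `copy_db_directory` is a hypothetical helper, not part of plyvel; it relies only on `shutil.copytree` from the standard library. Calling `db.compact_range()` before `db.close()` would be the optional compaction step mentioned above.

```python
import shutil


def copy_db_directory(src_dir, dst_dir):
    """Copy a *closed* LevelDB directory to dst_dir and return dst_dir.

    Hypothetical helper. Assumes no process has the source database open.
    dst_dir must not exist yet; shutil.copytree creates it and raises
    FileExistsError otherwise.
    """
    # A LevelDB database is a single directory of plain files (CURRENT,
    # MANIFEST, logs, tables), so a recursive copy duplicates it.
    return shutil.copytree(src_dir, dst_dir)
```

Usage, with hypothetical paths: call `db.close()`, then `copy_db_directory('/path/to/db', '/path/to/db-copy')`; the copy can then be opened with `plyvel.DB('/path/to/db-copy')`.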

another thing: you are using a write batch, which will cause all the data to be buffered into memory, and then "committed" to the db in one "transaction". this means it will use huge amounts of memory for big databases.
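A sketch of one way around the memory issue, assuming `src` and `dst` are open plyvel databases: commit the copy in fixed-size chunks, each in its own write batch, so only about `batch_size` pairs are buffered at a time. The function name `copy_in_batches` and the default `batch_size` are illustrative, not plyvel API. The trade-off is that the copy is no longer one atomic write; if it is interrupted, `dst` holds a partial copy. Passing `db.snapshot()` as `src` keeps the source view consistent while other writers are active.

```python
import itertools


def copy_in_batches(src, dst, batch_size=10000):
    """Copy every key/value pair from src into dst; return the count.

    Hypothetical helper. Assumes src has .iterator() yielding (key, value)
    pairs (a plyvel DB or Snapshot) and dst has .write_batch() usable as a
    context manager (a plyvel DB). batch_size is an illustrative default.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pairs = src.iterator()
    copied = 0
    while True:
        # Pull at most batch_size pairs; memory is bounded by this chunk
        # (held in the list and again in the write batch).
        chunk = list(itertools.islice(pairs, batch_size))
        if not chunk:
            break
        # The chunk is committed to dst when the with-block exits.
        with dst.write_batch() as wb:
            for key, value in chunk:
                wb.put(key, value)
        copied += len(chunk)
    return copied
```

For example, with a hypothetical destination path: `dst = plyvel.DB('/path/to/db-copy', create_if_missing=True)`, then `copy_in_batches(db.snapshot(), dst)`, then `dst.close()`.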

tlevine commented 8 years ago

My function indeed tars, and that part would not be included in plyvel, as I remarked earlier. But given your other comments, I agree that the copying part doesn't belong in plyvel either: it is easy to write, and the size and organization of a particular leveldb will affect how you copy it.

A simple example of copying a leveldb could instead go in the documentation.

More interestingly, after proposing this change I was inspired to address outstanding plyvel issues.
