google / leveldb

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
BSD 3-Clause "New" or "Revised" License

Database size excessive variation #1102

Closed: JackPiri closed this issue 1 year ago

JackPiri commented 1 year ago

I'm playing with some test code and I've come across unexpected behavior regarding the database size.

In my test program, I perform four separate runs:

  1. create database, fill it with some test records, then close
  2. open database, then close
  3. open database, iterate over records, then close
  4. open database, then close

As you can see, runs 2, 3 and 4 involve no write operations, yet I see significant variations in the database size (i.e. the size of its folder). Specifically, at the end of each run the folder size is:

  1. 45.4 MB
  2. 26.1 MB
  3. 38.0 MB
  4. 26.1 MB
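For scale, the raw payload that run 1 writes can be computed directly: the test code in this issue stores 1,000,000 records, each key being `"mykey"` followed by the decimal index and each value `"myvalue"` followed by the same index. The helper name `PayloadBytes` below is hypothetical; it only sums key and value lengths and ignores every kind of LevelDB overhead (log records, block trailers, indexes and so on).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: total bytes of keys and values written by run 1,
// where key = "mykey" + std::to_string(i) and value = "myvalue" + std::to_string(i).
// LevelDB's own storage overhead is deliberately ignored.
std::uint64_t PayloadBytes(int num_records) {
    std::uint64_t total = 0;
    for (int i = 0; i < num_records; i++) {
        const std::string digits = std::to_string(i);
        total += (5 + digits.size())    // "mykey" + digits
               + (7 + digits.size());   // "myvalue" + digits
    }
    return total;
}
```

`PayloadBytes(1000000)` returns 23,777,780 bytes, roughly 23.8 MB in decimal units. The 26.1 MB reported after runs 2 and 4 is therefore fairly close to the raw data, while the 45.4 MB and 38.0 MB states are about 19 MB and 12 MB larger. That suggests the extra space sits in additional files (for example a write-ahead log, or table files not yet deleted) rather than in extra data, but this reading is an assumption, not something established in this thread.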

I understand there are some caching features (`write_buffer_size` and the `*.log` file), but these variations seem too large to me. Can someone explain whether I am doing something wrong or whether this is expected? Furthermore, since the data in the database is the same at the end of each run, is there a way to explicitly request a cleanup before closing, so that the database ends up at its minimal size (as in runs 2 and 4)?
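To check which files account for the difference between runs, the folder can be broken down by file type at the end of each run. The sketch below assumes C++17 `std::filesystem`; `FileType` and `SizeByFileType` are hypothetical helpers, not LevelDB functions. In a LevelDB directory one would typically find a write-ahead log (`*.log`), table files (`*.ldb`, or `*.sst` in older releases), a `MANIFEST-*` file, and the small `CURRENT`, `LOCK` and `LOG` files.

```cpp
#include <cstdint>
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Coarse file type used for grouping (hypothetical helper):
//   "000005.log"      -> ".log"
//   "000007.ldb"      -> ".ldb"
//   "MANIFEST-000004" -> "MANIFEST"
//   "CURRENT"         -> "CURRENT"
std::string FileType(const fs::path& file) {
    std::string type = file.extension().string();
    if (type.empty()) {
        // No extension: keep the part of the name before the first '-'.
        const std::string name = file.filename().string();
        type = name.substr(0, name.find('-'));
    }
    return type;
}

// Sums the sizes of the regular files directly inside `dir`, grouped by
// FileType(). Subdirectories are skipped.
std::map<std::string, std::uintmax_t> SizeByFileType(const fs::path& dir) {
    std::map<std::string, std::uintmax_t> totals;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;
        totals[FileType(entry.path())] += entry.file_size();
    }
    return totals;
}
```

Calling `SizeByFileType("/tmp/mydb")` after each run and printing the map would show, for instance, whether the `.log` entry accounts for most of the gap between 45.4 MB and 26.1 MB.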

Excerpt of test code:

```cpp
#include <cassert>
#include <iostream>
#include <string>

#include "leveldb/db.h"

using namespace std;

int main(int argc, char** argv)
{
    // Run number (1-4), set through command-line arguments.
    const int run = (argc > 1) ? stoi(argv[1]) : 1;

    if (run == 1)
    {
        leveldb::DB* mydb;
        leveldb::Options mydb_options;
        mydb_options.compression = leveldb::CompressionType::kNoCompression;
        mydb_options.create_if_missing = true;

        // open
        leveldb::Status status = leveldb::DB::Open(mydb_options, "/tmp/mydb", &mydb);
        if (!status.ok()) cerr << status.ToString() << endl;
        assert(status.ok());

        // put some data: 1,000,000 records "mykey<i>" -> "myvalue<i>"
        for (int i = 0; i < 1000000; i++)
        {
            const std::string key = "mykey" + std::to_string(i), value = "myvalue" + std::to_string(i);
            status = mydb->Put(leveldb::WriteOptions(), key, value);
            assert(status.ok());
        }

        // close
        delete mydb;
    }
    else if (run == 2 || run == 4)
    {
        leveldb::DB* mydb;
        leveldb::Options mydb_options;
        mydb_options.compression = leveldb::CompressionType::kNoCompression;
        mydb_options.create_if_missing = false;

        // open
        leveldb::Status status = leveldb::DB::Open(mydb_options, "/tmp/mydb", &mydb);
        if (!status.ok()) cerr << status.ToString() << endl;
        assert(status.ok());

        // close (no reads or writes in between)
        delete mydb;
    }
    else if (run == 3)
    {
        leveldb::DB* mydb;
        leveldb::Options mydb_options;
        mydb_options.compression = leveldb::CompressionType::kNoCompression;
        mydb_options.create_if_missing = false;

        // open
        leveldb::Status status = leveldb::DB::Open(mydb_options, "/tmp/mydb", &mydb);
        if (!status.ok()) cerr << status.ToString() << endl;
        assert(status.ok());

        // iterate over all records without touching them
        leveldb::Iterator* it = mydb->NewIterator(leveldb::ReadOptions());
        for (it->SeekToFirst(); it->Valid(); it->Next())
        {
            // nothing to do: only walk the keys
        }
        if (!it->status().ok()) cerr << it->status().ToString() << endl;
        assert(it->status().ok());
        delete it;  // the iterator must be deleted before the DB

        // close
        delete mydb;
    }

    return 0;
}
```

Side note: I am using a fairly old version (1.18), but the same question may well apply to more recent versions.

Thanks

JackPiri commented 1 year ago

Closing, since this issue has been open for 8 months without any reply... Is this project dead and unmaintained?