google / leveldb

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
BSD 3-Clause "New" or "Revised" License
36.02k stars 7.77k forks source link

Iterator::Seek() gets extremely slow #777

Open ideawu opened 4 years ago

ideawu commented 4 years ago

I am running a leveldb instance with 30GB of data. Now it happens that Iterator::Seek() gets extremely slow, as slow as it takes several hours to finish.

The leveldb.stats is:

                                   Compactions
    Level  Files Size(MB) Time(sec) Read(MB) Write(MB)
    --------------------------------------------------
      0     1946    29505         0        0         0

I debug it by adding some printf() codes, and find out DBIter::FindNextUserEntry() is the problem.

Here is my code added:

printf("%s %d %s %s\n", __FILE__, __LINE__, ikey.DebugString().c_str(), EscapeString(*skip).c_str());

after

https://github.com/google/leveldb/blob/5903e7a1125cacaa1d44367b5b84fe9208e42884/db/db_iter.cc#L188 https://github.com/google/leveldb/blob/5903e7a1125cacaa1d44367b5b84fe9208e42884/db/db_iter.cc#L193

Here is some outputs:

db/db_iter.cc 189 '\x01\x00\x00\x00\x00\x00f\x99@' @ 98350051 : 0 \x01\x00\x00\x00\x00\x00f\x99@
db/db_iter.cc 195 '\x01\x00\x00\x00\x00\x00f\x99@' @ 21077915 : 1 \x01\x00\x00\x00\x00\x00f\x99@
db/db_iter.cc 189 '\x01\x00\x00\x00\x00\x00f\x99A' @ 98350052 : 0 \x01\x00\x00\x00\x00\x00f\x99A
db/db_iter.cc 195 '\x01\x00\x00\x00\x00\x00f\x99A' @ 21077918 : 1 \x01\x00\x00\x00\x00\x00f\x99A
db/db_iter.cc 189 '\x01\x00\x00\x00\x00\x00f\x99B' @ 98350053 : 0 \x01\x00\x00\x00\x00\x00f\x99B
db/db_iter.cc 195 '\x01\x00\x00\x00\x00\x00f\x99B' @ 21077922 : 1 \x01\x00\x00\x00\x00\x00f\x99B
db/db_iter.cc 189 '\x01\x00\x00\x00\x00\x00f\x99C' @ 98350054 : 0 \x01\x00\x00\x00\x00\x00f\x99C
db/db_iter.cc 195 '\x01\x00\x00\x00\x00\x00f\x99C' @ 21077925 : 1 \x01\x00\x00\x00\x00\x00f\x99C
db/db_iter.cc 189 '\x01\x00\x00\x00\x00\x00f\x99D' @ 98350055 : 0 \x01\x00\x00\x00\x00\x00f\x99D
db/db_iter.cc 195 '\x01\x00\x00\x00\x00\x00f\x99D' @ 21077929 : 1 \x01\x00\x00\x00\x00\x00f\x99D

As the output shows, the keys are added and then deleted, but both remain in sst files, so DBIter skips them one by one before reaching at the target. For my use case, I deleted millions of keys, that's why it gets so slow, it iterates over millions of obsoleted keys on a hard disk!

So my question is, why shouldn't these keys be really "deleted"? They do not exist in the database scope, they are marked deleted in level-0 sst file, and they do not exist in higher level sst files, they should be really deleted after compaction.

ideawu commented 4 years ago

Well, there is 1946 files at level=0, that's because the database is just recovered from a repair.

cmumford commented 4 years ago

What version of leveldb are you using?

ideawu commented 4 years ago

@cmumford I am using version 1.20. here is more detail about how to reproduce the issue:

  1. heavy writes and deletes in sliding window way:
    • keys are in the form of sprintf("%15d", end++) and sprintf("%15d", start++)
    • let start=0, end=0, initialize with 10m writes with random values with length of [1k, 2k], to generate 10GB data
    • then loops
      • write n keys, say n=10000
      • delete n keys, n=10000
      • randomly write key between [start, end]
  2. kill -9 $PID occasionally to simulate server down
  3. repair leveldb database, without a force compaction
  4. restart the service(embed leveldb), and restart write/delete immediately
  5. it may happens that there are many level-0 files
  6. now perform a Seek("0") operation, it would be very slow

When performing Seek("0"), leveldb will skips lots of continuous deleted keys in sst file.