tikv / titan

A RocksDB plugin for key-value separation, inspired by WiscKey.
https://pingcap.com/blog/titan-storage-engine-design-and-implementation/
Apache License 2.0

TiKV crash in compaction filter caused by IoError: No such file or directory: While open a file for random read: /data/xxx/yyy/zzz/115748.blob #328

Open mzygQAQ opened 2 weeks ago

mzygQAQ commented 2 weeks ago

This error is different from the existing, older "Missing blob file" error.

mzygQAQ commented 2 weeks ago

Status BlobStorage::Get(const ReadOptions& options, const BlobIndex& index,
                        BlobRecord* record, PinnableSlice* buffer) {
  auto sfile = FindFile(index.file_number).lock();
  if (!sfile)
    return Status::Corruption("Missing blob file: " +
                              std::to_string(index.file_number));

  // NOTE-1: the thread that purges obsolete files can delete the file at this
  // point, in which case the call below reports the IoError.

  return file_cache_->Get(options, sfile->file_number(), sfile->file_size(),
                          index.blob_handle, record, buffer);
}

mzygQAQ commented 2 weeks ago

I have checked the code here, and there is indeed a race condition.
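To make the window at NOTE-1 concrete, here is a minimal single-threaded sketch of the interleaving. It is not Titan code: ToyStorage, ToyFileMeta, and GetWithInterleaving are hypothetical stand-ins, the "disk" is just a set of file numbers, and the purge is triggered by hand rather than by a background thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for blob file metadata (not Titan's real type).
struct ToyFileMeta {
  uint64_t number;
};

// Hypothetical stand-in for the blob storage: in-memory metadata plus a
// set of file numbers that currently exist on "disk".
struct ToyStorage {
  std::map<uint64_t, std::shared_ptr<ToyFileMeta>> metas;
  std::set<uint64_t> disk;

  void AddFile(uint64_t n) {
    metas[n] = std::make_shared<ToyFileMeta>(ToyFileMeta{n});
    disk.insert(n);
  }

  // First step of Get(): look the file up in the metadata.
  std::shared_ptr<ToyFileMeta> FindFile(uint64_t n) const {
    auto it = metas.find(n);
    return it == metas.end() ? nullptr : it->second;
  }

  // What the purge thread does: drop the metadata and delete the file.
  void Purge(uint64_t n) {
    metas.erase(n);
    disk.erase(n);
  }

  // Second step of Get(): open the file for reading.
  std::string Open(uint64_t n) const {
    return disk.count(n) ? "OK" : "IOError: No such file or directory";
  }
};

// Runs the two steps of Get(), optionally purging the file in between.
std::string GetWithInterleaving(ToyStorage& s, uint64_t n, bool purge_between) {
  auto meta = s.FindFile(n);
  if (!meta) return "Corruption: Missing blob file";
  assert(meta->number == n);      // metadata is keyed by file number
  if (purge_between) s.Purge(n);  // the window marked NOTE-1
  // Holding `meta` keeps the metadata object alive, not the file itself.
  return s.Open(meta->number);
}
```

In this toy model, a purge between the two steps yields the IOError, while a purge before the lookup yields the older "Missing blob file" corruption error. That matches the distinction between the two errors drawn earlier in this thread.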

mzygQAQ commented 1 week ago

@v01dstar Hello, could you help confirm this?

v01dstar commented 1 week ago

At first glance, this seems possible. Allow me to dig into it more.

v01dstar commented 1 week ago

I think this is indeed a problem unless skip_value_in_compaction_filter is set to true, and we don't set it. I am surprised that we haven't seen this error in our users' environments. If I am not missing anything, this is more than a race condition: since the compaction filter does not go through the normal read path (i.e. a read with a snapshot), this should happen quite frequently.

mzygQAQ commented 1 week ago

I guess that in a TiDB environment, TiKV only uses a compaction filter in WriteCF, and WriteCF only stores transaction commit information and small values of less than 256 bytes. Moreover, Titan is not enabled for WriteCF by default, so the problem does not occur there. It occurs when TiKV is used with RawKV, or when Titan is used directly.

v01dstar commented 1 week ago

Yes, I totally missed that. I guess you can leverage skip_value_in_compaction_filter in this case. Or you can propose a simple fix, as you suggested and as the TODO also mentions: return the corresponding error to the caller of Get(), and let the caller (the compaction filter) decide what to do.
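A rough sketch of the second option, under stated assumptions: ToyStatus, ToyDecision, ToyGet, and FilterEntry are hypothetical names, not Titan or RocksDB APIs, and the policy of keeping the entry when the blob file has vanished is only one possible choice for the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical status codes; the real code would use rocksdb::Status.
enum class ToyStatus { kOk, kBlobFileGone };

// Hypothetical outcomes of running the compaction filter on one entry.
enum class ToyDecision { kKeep, kRemove };

// Stand-in for Get(): reports a vanished blob file as a distinct status
// instead of treating it as a fatal error.
ToyStatus ToyGet(const std::set<uint64_t>& disk, uint64_t file_number) {
  return disk.count(file_number) ? ToyStatus::kOk : ToyStatus::kBlobFileGone;
}

// Stand-in for the caller (the compaction filter path). user_wants_removal
// represents what the user's filter would decide if it could see the value.
ToyDecision FilterEntry(const std::set<uint64_t>& disk, uint64_t file_number,
                        bool user_wants_removal) {
  if (ToyGet(disk, file_number) == ToyStatus::kBlobFileGone) {
    // Assumed policy: the value cannot be read, so skip the user filter
    // and keep the entry rather than crash.
    return ToyDecision::kKeep;
  }
  return user_wants_removal ? ToyDecision::kRemove : ToyDecision::kKeep;
}
```

The point of the sketch is only the division of responsibility: Get() reports what happened, and the caller owns the policy.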

mzygQAQ commented 1 week ago

I'll give it a try.