Closed royguo closed 2 years ago
Hi, thanks for submitting the issue. Could you please provide some more context on how you hit it? Any specific steps to reproduce it would also help greatly.
./db_bench \
--zbd_path=$DEVICE \
--benchmarks=fillrandom \
--use_existing_db=0 \
--histogram=1 \
--statistics=0 \
--stats_per_interval=1 \
--stats_interval_seconds=60 \
--max_background_flushes=3 \
--max_background_compactions=5 \
--enable_lazy_compaction=0 \
--level0_file_num_compaction_trigger=4 \
--sync=1 \
--allow_concurrent_memtable_write=1 \
--bytes_per_sync=32768 \
--wal_bytes_per_sync=32768 \
--delayed_write_rate=419430400 \
--enable_write_thread_adaptive_yield=1 \
--threads=16 \
--num_levels=7 \
--key_size=36 \
--value_size=16000 \
--level_compaction_dynamic_level_bytes=true \
--mmap_read=false \
--compression_type=none \
--memtablerep=skip_list \
--write_buffer_size=268435456 \
--max_write_buffer_number=20 \
--target_file_size_base=134217728 \
--target_blob_file_size=134217728 \
--blob_file_defragment_size=33554432 \
--max_dependence_blob_overlap=128 \
--optimize_filters_for_hits=true \
--optimize_range_deletion=true \
--num=60000000 \
--db=test_kuankuan \
--benchmark_write_rate_limit=100000000 \
--prepare_log_writer_num=0 \
--use_direct_io_for_flush_and_compaction=1
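For readers skimming the command, the raw byte values are easier to compare once converted. The following is a small sketch that only does arithmetic on flag values copied from the command above. The "payload" figure is an assumption-laden estimate: it covers one pass over `--num` keys and ignores the thread count, the WAL, and compaction write amplification.

```python
# Flag values copied verbatim from the db_bench command above.
flags = {
    "write_buffer_size": 268435456,
    "target_file_size_base": 134217728,
    "target_blob_file_size": 134217728,
    "blob_file_defragment_size": 33554432,
    "delayed_write_rate": 419430400,
    "bytes_per_sync": 32768,
}

MIB = 1024 * 1024
GIB = 1024 * MIB

# Rough payload for one pass over --num keys (assumption: key + value only).
num, key_size, value_size = 60_000_000, 36, 16_000
raw_bytes = num * (key_size + value_size)

for name, value in flags.items():
    print(f"{name}: {value / MIB:g} MiB")
print(f"raw key+value payload: {raw_bytes} bytes (about {raw_bytes / GIB:.0f} GiB)")
```

So each memtable is 256 MiB, target SST and blob files are 128 MiB, and one pass over the key space writes on the order of 900 GiB of key and value data.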
@aravind-wdc
Looking at the debug info, the last block from the first write is over-written. ZenFS does not support overwrites. Upstream rocksdb does not do this, so we'll need to look into what is going on in terarkdb.
I added an assert on the error condition. This is the backtrace from gdb:
thread 1 "db_bench" received signal SIGSEGV, Segmentation fault.
0x0000555555983380 in terarkdb::ZonedWritableFile::PositionedAppend(terarkdb::Slice const&, unsigned long, terarkdb::IOOptions const&, terarkdb::IODebugContext*) ()
(gdb) bt
#0 0x0000555555983380 in terarkdb::ZonedWritableFile::PositionedAppend(terarkdb::Slice const&, unsigned long, terarkdb::IOOptions const&, terarkdb::IODebugContext*) ()
#1 0x000055555594ef33 in terarkdb::ZenfsWritableFile::PositionedAppend(terarkdb::Slice const&, unsigned long) ()
#2 0x00005555558c53fe in terarkdb::WritableFileWriter::WriteDirect() ()
#3 0x00005555558c591f in terarkdb::WritableFileWriter::Flush() ()
#4 0x00005555558c617e in terarkdb::WritableFileWriter::Close() ()
#5 0x000055555598caa7 in terarkdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, terarkdb::VersionSet*, terarkdb::Env*, terarkdb::ImmutableCFOptions const&, terarkdb::MutableCFOptions const&, terarkdb::EnvOptions const&, terarkdb::TableCache*, terarkdb::InternalIteratorBase<terarkdb::LazyBuffer>* (*)(void*, terarkdb::Arena&), void*, std::vector<std::unique_ptr<terarkdb::FragmentedRangeTombstoneIterator, std::default_delete<terarkdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<terarkdb::FragmentedRangeTombstoneIterator, std::default_delete<terarkdb::FragmentedRangeTombstoneIterator> > > > (*)(void*), void*, std::vector<terarkdb::FileMetaData, std::allocator<terarkdb::FileMetaData> >*, terarkdb::InternalKeyComparator const&, std::vector<std::unique_ptr<terarkdb::IntTblPropCollectorFactory, std::default_delete<terarkdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<terarkdb::IntTblPropCollectorFactory, std::default_delete<terarkdb::IntTblPropCollectorFactory> > > > const*, std::vector<std::unique_ptr<terarkdb::IntTblPropCollectorFactory, std::default_delete<terarkdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<terarkdb::IntTblPropCollectorFactory, std::default_delete<terarkdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, terarkdb::SnapshotChecker*, terarkdb::CompressionType, terarkdb::CompressionOptions const&, bool, terarkdb::InternalStats*, terarkdb::TableFileCreationReason, terarkdb:
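To illustrate why a positioned write that lands before the end of already-written data is a problem on append-only storage, here is a toy model. It is not ZenFS code: the class name `AppendOnlyFile`, the helper `try_write`, and the offsets are all hypothetical, and the only rule modeled is "a write is accepted only at the current write pointer".

```python
class AppendOnlyFile:
    """Toy model: writes are accepted only at the current write pointer."""

    def __init__(self) -> None:
        self.write_pointer = 0

    def positioned_append(self, offset: int, length: int) -> None:
        # Any offset other than the write pointer would overwrite data
        # or leave a hole, so the model rejects it.
        if offset != self.write_pointer:
            raise ValueError(
                f"write at offset {offset}, but write pointer is {self.write_pointer}"
            )
        self.write_pointer += length


def try_write(f: AppendOnlyFile, offset: int, length: int) -> bool:
    """Return True if the write was accepted, False if it was rejected."""
    try:
        f.positioned_append(offset, length)
        return True
    except ValueError:
        return False


f = AppendOnlyFile()
results = [
    try_write(f, 0, 8192),     # accepted: starts at the write pointer
    try_write(f, 8192, 4096),  # accepted: continues where the last write ended
    try_write(f, 4096, 4096),  # rejected: would overwrite existing data
]
print(results)  # [True, True, False]
```

A writer that rewrites its last block in place, which is harmless on a conventional file system, fails the third kind of check in this model.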
Terarkdb differs from upstream rocksdb in several places, and some specific changes around direct I/O explain the difference in behavior. See this commit for example: https://github.com/bytedance/terarkdb/commit/512059363607df22b8398bb1788a3f9174c78a05#diff-5a497572c52e60ba25fce7450f621ff517320963fd87ac37d3d85e3a3ee17670
Entire history: https://github.com/bytedance/terarkdb/commits/dev.1.4/util/file_reader_writer.cc
Hi @yhr, I just reviewed the commit you mentioned and didn't see any change that would cause the overwrite problem. I'll dig into it a bit more soon.
@royguo, it looks like terarkdb is missing this patch: https://github.com/facebook/rocksdb/pull/4771/commits/f0e1840d15137e632d9ee99f37394c81b7fa30a5
After applying that, things appear to work with --use_direct_io_for_flush_and_compaction in terarkdb.
Error Message:
Debug Info:
The file 000007.sst has only two operations: the first appended 798720 bytes, but the second started a pwrite at offset 794624. Please take a look!
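The two numbers in the report can be checked with a few lines of arithmetic. This sketch assumes a 4096-byte block size, which is not stated in the report; the two byte counts are the ones quoted above.

```python
BLOCK_SIZE = 4096          # assumption: 4 KiB blocks, not stated in the report

first_append_len = 798720  # bytes appended by the first operation
second_offset = 794624     # offset at which the second pwrite starts

# Bytes of the first write that the second write starts on top of.
overlap = first_append_len - second_offset

# Block index (0-based) of the last block of the first write,
# and of the block where the second write begins.
last_block_of_first = (first_append_len - 1) // BLOCK_SIZE
start_block_of_second = second_offset // BLOCK_SIZE

print(overlap)                                   # 4096
print(first_append_len % BLOCK_SIZE)             # 0
print(last_block_of_first, start_block_of_second)  # 194 194
```

Under that assumption, the first write ends exactly on a block boundary and the second write starts one full block earlier, i.e. at the beginning of the last block the first write produced.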