westerndigitalcorporation / zenfs

ZenFS is a storage backend for RocksDB that enables support for ZNS SSDs and SMR HDDs.
GNU General Public License v2.0

Segmentation fault using db_bench on zenfs #126

Closed yingjia-wang closed 2 years ago

yingjia-wang commented 2 years ago

Hi,

I was just running a test: I first set the scheduler to mq-deadline, but after creating a zenfs file system, I set the scheduler back to none. When I then ran

$ sudo ./db_bench --fs_uri=zenfs://dev:nvme0n1 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction

the result was:

Received signal 11 (Segmentation fault)
#0   ./db_bench(+0x2969cb) [0x55749b1e69cb] ??  ??:0
#1   ./db_bench(+0xb2436) [0x55749b002436] ??   ??:0
#2   /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f292289d0b3] ??        ??:0
#3   ./db_bench(+0xb0e6e) [0x55749b000e6e] ??   ??:0

I expected this sequence to cause some kind of write error at most, but I checked and no write I/O is issued. Could anyone explain why it segfaults instead?

skyzh commented 2 years ago

This looks like a duplicate of https://github.com/westerndigitalcorporation/zenfs/issues/124. Could you check whether the issue still occurs without switching the scheduler back and forth?

skyzh commented 2 years ago

Also, the backtrace has no symbols, so it doesn't tell us anything. Consider building RocksDB with DEBUG_LEVEL=1?

yingjia-wang commented 2 years ago

This looks like a duplicate of #124. Could you check whether the issue still occurs without switching the scheduler back and forth?

Hi, @skyzh

Without switching back and forth (just using none from the start), the error occurs when creating the zenfs file system: Failed to open zoned block device: nvme0n1, error: Invalid argument: Current ZBD scheduler is not mq-deadline, set it to mq-deadline.

I will rebuild in debug mode and provide a more detailed message, thanks.

skyzh commented 2 years ago

Why not use mq-deadline? It is required for correctness and performance. https://zonedstorage.io/docs/linux/sched/
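For reference, Linux exposes the scheduler choice in /sys/block/<dev>/queue/scheduler, with the active entry wrapped in brackets (for example "[mq-deadline] kyber bfq none"). The sketch below shows one way a tool could check it before opening the device. It is not the ZenFS implementation: `ActiveScheduler` and `UsesMqDeadline` are hypothetical helper names, and the sketch assumes the bracketed format.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Extracts the active scheduler from a line in the sysfs format, where the
// active entry is wrapped in brackets, e.g. "[mq-deadline] kyber bfq none".
// Returns an empty string if no bracketed entry is found.
std::string ActiveScheduler(const std::string& sysfs_line) {
  const std::size_t open = sysfs_line.find('[');
  if (open == std::string::npos) return "";
  const std::size_t close = sysfs_line.find(']', open + 1);
  if (close == std::string::npos) return "";
  return sysfs_line.substr(open + 1, close - open - 1);
}

// Hypothetical helper: reads /sys/block/<dev>/queue/scheduler and reports
// whether mq-deadline is active. An unreadable file counts as "no".
bool UsesMqDeadline(const std::string& dev) {
  assert(!dev.empty());
  std::ifstream file("/sys/block/" + dev + "/queue/scheduler");
  std::string line;
  if (!file || !std::getline(file, line)) return false;
  return ActiveScheduler(line) == "mq-deadline";
}
```

Calling `UsesMqDeadline("nvme0n1")` before running db_bench would catch the scheduler being switched back to none after the file system was created.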

yingjia-wang commented 2 years ago

Why not use mq-deadline? It is required for correctness and performance. https://zonedstorage.io/docs/linux/sched/

It was just an accidental experiment, but the result is confusing... I would have expected write errors rather than a segmentation fault.

damien-lemoal commented 2 years ago

Why not use mq-deadline? It is required for correctness and performance. https://zonedstorage.io/docs/linux/sched/

mq-deadline is indeed required for write correctness on ZNS drives, but absolutely not for performance reasons. Performance is really bad with mq-deadline on SSDs!

yingjia-wang commented 2 years ago

Hi,

I rebuilt RocksDB with DEBUG_LEVEL=1, but the result is the same as above, and I confirmed that I/O is stuck... I would have expected the I/O to be issued, with some of the writes failing.

skyzh commented 2 years ago

I suspect that not using mq-deadline leads to unordered writes within a single ZNS zone. Could you check whether this issue still occurs when the scheduler is set to mq-deadline and never changed?
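To illustrate why ordering matters: a sequential-write-required zone only accepts a write that starts exactly at its current write pointer, so two writes that get reordered on the way to the device cause the later-offset one to fail. The following is a deliberately simplified model of that rule, not kernel or ZenFS code; the `Zone` type and its fields are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a sequential-write-required zone.
struct Zone {
  uint64_t start;          // first block of the zone
  uint64_t capacity;       // number of writable blocks
  uint64_t write_pointer;  // next block that may be written

  Zone(uint64_t zone_start, uint64_t zone_capacity)
      : start(zone_start), capacity(zone_capacity), write_pointer(zone_start) {
    assert(zone_capacity > 0);
  }

  // Accepts the write only if it begins at the write pointer and fits in the
  // remaining capacity; otherwise the zone is left unchanged.
  bool Write(uint64_t offset, uint64_t length) {
    if (offset != write_pointer) return false;  // unordered write
    if (length > start + capacity - write_pointer) return false;
    write_pointer += length;
    return true;
  }
};
```

With a zone of 16 blocks, `Write(0, 4)` followed by `Write(8, 4)` fails on the second call because the write pointer is at block 4. A scheduler that preserves per-zone submission order avoids this situation.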

yingjia-wang commented 2 years ago

I suspect that not using mq-deadline leads to unordered writes within a single ZNS zone. Could you check whether this issue still occurs when the scheduler is set to mq-deadline and never changed?

With mq-deadline set, everything works fine. And I think you are right that not using mq-deadline would lead to unordered writes. But no writes are issued at all in this case, which is confusing... Maybe the interface between rocksdb and zenfs needs to be checked?

skyzh commented 2 years ago

How did you determine that there are “no writes”? If you take a look at dmesg, there should be I/O errors.

yingjia-wang commented 2 years ago

How did you determine that there are “no writes”? If you take a look at dmesg, there should be I/O errors.

Unfortunately not. Can anyone reproduce this?

yingjia-wang commented 2 years ago

#115: the message there is quite similar to this issue. Also, @yhr mentions that the segfault is not supposed to be there.

yhr commented 2 years ago

@yingjia-git: I think you're right. It might be the same root cause as #115: the zenfs environment does not initialize due to an error (wrong scheduler or malformed URI), and db_bench (or zenfs) does not handle that case gracefully. @aravind-wdc is looking into #115.

yingjia-wang commented 2 years ago

Hi @skyzh and @yhr ,

Thanks for your help, I will follow these issues. :)

SheldonZhong commented 2 years ago

If you are using RocksDB later than https://github.com/facebook/rocksdb/commit/1c39b7952bfff1beff1d473444cd75c3313b73bd, this should be fixed by applying the fix in #125.

yingjia-wang commented 2 years ago

Hi @SheldonZhong,

Thanks for your suggestion. I tried that RocksDB version with the up-to-date zenfs. Sadly, even with mq-deadline the segfault above is triggered.

Also, I hit two errors during the build, so I commented out the corresponding code to get it to compile.

  CC       utilities/transactions/lock/range/range_tree/range_tree_lock_tracker.o
plugin/zenfs/fs/fs_zenfs.cc:1147:20: error: ‘std::string rocksdb::GetLogFilename(std::string)’ defined but not used [-Werror=unused-function]
 1147 | static std::string GetLogFilename(std::string bdev) {
      |                    ^~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make: *** [Makefile:2392: plugin/zenfs/fs/fs_zenfs.o] Error 1
make: *** Waiting for unfinished jobs....
zenfs.cc: In function ‘int rocksdb::zenfs_tool_fsinfo()’:
zenfs.cc:560:10: error: ‘class rocksdb::ZenFS’ has no member named ‘ReportSuperblock’
  560 |   zenFS->ReportSuperblock(&superblock_report);
      |          ^~~~~~~~~~~~~~~~
make: *** [Makefile:17: zenfs] Error 1

aravind-wdc commented 2 years ago

  CC       utilities/transactions/lock/range/range_tree/range_tree_lock_tracker.o
plugin/zenfs/fs/fs_zenfs.cc:1147:20: error: ‘std::string rocksdb::GetLogFilename(std::string)’ defined but not used [-Werror=unused-function]
 1147 | static std::string GetLogFilename(std::string bdev) {
      |                    ^~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make: *** [Makefile:2392: plugin/zenfs/fs/fs_zenfs.o] Error 1
make: *** Waiting for unfinished jobs....

You would see this if you were compiling with DEBUG_LEVEL > 0. This is now fixed in ZenFS by PR #128, so it should not happen with the latest master.

yingjia-wang commented 2 years ago

Yes, thanks. By the way, is there any progress on the segmentation fault reported at the top? :)

aravind-wdc commented 2 years ago

@yingjia-git I was able to root-cause this issue (with a few extra prints). After the scheduler is set to "none", db_bench gets the error saying that the scheduler is not "mq-deadline" when it starts and errors out, but this error is not propagated to the upper layers, which causes the segmentation fault.

$ sudo ./db_bench --fs_uri=zenfs://dev:nvme0n1 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction
error msg = IO error: Invalid argument: Current ZBD scheduler is not mq-deadline, set it to mq-deadline. 
Received signal 11 (Segmentation fault)
#0   ./db_bench(+0x26f9e6) [0x55f2ca1589e6] ??  ??:0
#1   ./db_bench(+0xaff75) [0x55f2c9f98f75] ??   ??:0
#2   /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5) [0x7f1389a6f565] ??        ??:0
#3   ./db_bench(+0xae20e) [0x55f2c9f9720e] ??   ??:0
Segmentation fault

This is a bug in how rocksdb propagates errors to the upper layers. It is basically the same issue as https://github.com/facebook/rocksdb/issues/9365. They have a probable fix here: https://github.com/facebook/rocksdb/pull/9333

PR 9333 above should fix this issue as well.
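The failure mode described here can be sketched in isolation. The code below is not taken from RocksDB or ZenFS: `Status`, `FileSystem`, `OpenFileSystem`, and `DescribeFileSystem` are hypothetical names, used only to show that an open routine which fails leaves its output pointer null, so a caller that ignores the returned status and dereferences the pointer would crash.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for a status type and a file system handle.
struct Status {
  bool ok;
  std::string message;
};

struct FileSystem {
  std::string name;
};

// Fails without touching *out when the scheduler is not mq-deadline.
Status OpenFileSystem(const std::string& scheduler,
                      std::unique_ptr<FileSystem>* out) {
  assert(out != nullptr);
  if (scheduler != "mq-deadline") {
    return {false, "Current ZBD scheduler is not mq-deadline"};
  }
  *out = std::make_unique<FileSystem>(FileSystem{"zenfs"});
  return {true, ""};
}

// Checks the status before using the handle. Skipping this check and
// calling fs->name directly would dereference a null pointer on failure.
std::string DescribeFileSystem(const std::string& scheduler) {
  std::unique_ptr<FileSystem> fs;
  const Status status = OpenFileSystem(scheduler, &fs);
  if (!status.ok) return "error: " + status.message;
  return fs->name;
}
```

With the check in place, `DescribeFileSystem("none")` returns the error text instead of crashing, which is the behavior one would expect from db_bench here.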

yingjia-wang commented 2 years ago

Oh yes, @aravind-wdc, thanks for your analysis. I understand it now.