yingjia-wang closed this issue 2 years ago
Looks like a duplicate of https://github.com/westerndigitalcorporation/zenfs/issues/124. Could you please check whether this issue still exists without switching the scheduler back and forth?
...also, the backtrace doesn't tell us anything. Consider building RocksDB with DEBUG_LEVEL=1?
> Looks like a duplicate of #124. Could you please check whether this issue still exists without switching the scheduler back and forth?
Hi, @skyzh
Without switching the scheduler back and forth (just using "none" from the start), an error occurs when making a ZenFS file system:
Failed to open zoned block device: nvme0n1, error: Invalid argument: Current ZBD scheduler is not mq-deadline, set it to mq-deadline.
I will enable DEBUG mode and provide a detailed message, thanks.
Why not use mq-deadline? This is required for correctness and performance. https://zonedstorage.io/docs/linux/sched/
> Why not use mq-deadline? This is required for correctness and performance. https://zonedstorage.io/docs/linux/sched/
It was just an accidental experiment, but the result is confusing... I would have expected write errors rather than a segmentation fault.
> Why not use mq-deadline? This is required for correctness and performance. https://zonedstorage.io/docs/linux/sched/
mq-deadline is indeed required for write correctness on ZNS drives, but absolutely not for performance reasons. Performance is really bad with mq-deadline on SSDs!
Hi,
I rebuilt RocksDB with DEBUG_LEVEL=1, but the result is the same as above, and I confirm that I/O is stuck... Perhaps what is actually happening is that I/O is issued but some writes are incorrect.
I suspect that not using mq-deadline leads to unordered writes within a ZNS zone. Could you check whether this issue still exists when the scheduler is set to mq-deadline and left unchanged?
> I suspect that not using mq-deadline leads to unordered writes within a ZNS zone. Could you check whether this issue still exists when the scheduler is set to mq-deadline and left unchanged?
If it is set to mq-deadline, everything works fine. I think you are right that not using mq-deadline would lead to unordered writes, but no writes happen in this case, which is confusing... Perhaps the interface between RocksDB and ZenFS needs to be checked?
How did you find that there are "no writes"? If you take a look at dmesg, there should be I/O errors.
> How did you find that there are "no writes"? If you take a look at dmesg, there should be I/O errors.
Unfortunately there are none. Can anyone reproduce my steps?
@yingjia-git: I think you're right. It might be the same root cause as #115: the ZenFS environment does not initialize due to an error (wrong scheduler or malformed URI) and db_bench (or zenfs) does not handle that case gracefully. @aravind-wdc is looking into #115.
Hi @skyzh and @yhr ,
Thanks for your help, I will follow these issues. :)
If you are using a RocksDB newer than https://github.com/facebook/rocksdb/commit/1c39b7952bfff1beff1d473444cd75c3313b73bd, this should be fixed by applying the fix in #125.
Hi @SheldonZhong ,
Thanks for your suggestion; I tried your RocksDB version and the up-to-date ZenFS. Sadly, even mq-deadline mode triggers the segfault above.
Also, I hit two errors during the build, so I commented out the corresponding code to get it to compile:
CC utilities/transactions/lock/range/range_tree/range_tree_lock_tracker.o
plugin/zenfs/fs/fs_zenfs.cc:1147:20: error: ‘std::string rocksdb::GetLogFilename(std::string)’ defined but not used [-Werror=unused-function]
1147 | static std::string GetLogFilename(std::string bdev) {
| ^~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make: *** [Makefile:2392: plugin/zenfs/fs/fs_zenfs.o] Error 1
make: *** Waiting for unfinished jobs....
zenfs.cc: In function ‘int rocksdb::zenfs_tool_fsinfo()’:
zenfs.cc:560:10: error: ‘class rocksdb::ZenFS’ has no member named ‘ReportSuperblock’
560 | zenFS->ReportSuperblock(&superblock_report);
| ^~~~~~~~~~~~~~~~
make: *** [Makefile:17: zenfs] Error 1
> plugin/zenfs/fs/fs_zenfs.cc:1147:20: error: ‘std::string rocksdb::GetLogFilename(std::string)’ defined but not used [-Werror=unused-function]
> cc1plus: all warnings being treated as errors
You would see this if you were compiling with DEBUG_LEVEL > 0, but this is now fixed in ZenFS by PR #128. With the latest master it should not happen.
Yes, thanks. By the way, is there any progress on the segmentation fault above? :)
@yingjia-git I was able to root-cause this issue (with a few extra prints). After setting the scheduler to "none", when you start db_bench it gets the error saying that the scheduler is not "mq-deadline" and errors out, but this error is not propagated to the upper layers, causing the segmentation fault.
$ sudo ./db_bench --fs_uri=zenfs://dev:nvme0n1 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction
error msg = IO error: Invalid argument: Current ZBD scheduler is not mq-deadline, set it to mq-deadline.
Received signal 11 (Segmentation fault)
#0 ./db_bench(+0x26f9e6) [0x55f2ca1589e6] ?? ??:0
#1 ./db_bench(+0xaff75) [0x55f2c9f98f75] ?? ??:0
#2 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5) [0x7f1389a6f565] ?? ??:0
#3 ./db_bench(+0xae20e) [0x55f2c9f9720e] ?? ??:0
Segmentation fault
This is a bug in how RocksDB propagates errors to upper layers; it is basically the same issue as https://github.com/facebook/rocksdb/issues/9365. They have a probable fix here: https://github.com/facebook/rocksdb/pull/9333
The PR above (9333) should fix this issue as well.
Oh yes, @aravind-wdc, thanks for your analysis; I understand it now.
Hi,
I was just doing a test: I first set the scheduler to mq-deadline, but after making a ZenFS file system I set it back to none. I found that when using the command:
sudo ./db_bench --fs_uri=zenfs://dev:nvme0n1 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction
the result was:
I think my operation should only cause some kind of write error, but I checked and no write I/O is issued. Could anyone tell me the reason?