westerndigitalcorporation / zenfs

ZenFS is a storage backend for RocksDB that enables support for ZNS SSDs and SMR HDDs.
GNU General Public License v2.0

How to test the active zone limit? #216

Closed. UNKJay closed this issue 2 years ago.

UNKJay commented 2 years ago

I've read . It said "RocksDB can work with as few as 6 active zones with restricted write performance, while more than 12 active zones does not add any significant performance benefits.", but I can't reproduce that result in my emulator. Can I get some help with the db_bench script?

aravind-wdc commented 2 years ago

@UNKJay Can you please share more details on your test? For example, how are you emulating a ZNS device, and what discrepancies are you seeing? The point in the paper comes from the fact that physical ZNS SSDs limit the maximum number of active zones at any given moment. Writing up to 12 zones in parallel was enough to meet the needs of RocksDB writes, so opening further zones in parallel was not needed. That said, the max_background_jobs parameter should help you control/tune the number of active zones used by RocksDB. Apart from that, can you please specify what help you need?
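
As a starting point for such a test, here is a minimal sketch of a max_background_jobs sweep. It assumes db_bench was built with the ZenFS plugin and sits in the current directory; the device name `nvme0n1`, the key count, and the helper names `db_bench_cmd` and `sweep` are placeholders, not part of ZenFS or RocksDB. The script only prints the command lines; it does not run them.

```python
import shlex


def db_bench_cmd(device, max_background_jobs, num_keys=10_000_000):
    """Build a db_bench argument list for one run on a ZenFS file system.

    device and num_keys are placeholders; adjust them to your setup.
    """
    return [
        "./db_bench",
        f"--fs_uri=zenfs://dev:{device}",
        # fillrandom followed by overwrite, so that compaction has work to do
        "--benchmarks=fillrandom,overwrite",
        f"--num={num_keys}",
        f"--max_background_jobs={max_background_jobs}",
        "--use_direct_io_for_flush_and_compaction",
    ]


def sweep(device, job_counts=(1, 2, 4, 8)):
    """Return one command per max_background_jobs value."""
    return [db_bench_cmd(device, jobs) for jobs in job_counts]


if __name__ == "__main__":
    for cmd in sweep("nvme0n1"):
        print(shlex.join(cmd))
```

Comparing the throughput reported for each run should show whether the number of background jobs, and with it the number of zones written in parallel, makes a difference on the emulated device.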

UNKJay commented 2 years ago

@aravind-wdc I use FEMU to emulate a ZNS SSD. I've found that FEMU receives a few write requests with the same logical address. Does that mean an in-place update? Does ZenFS cause in-place updates?

aravind-wdc commented 2 years ago

@UNKJay Thanks for the update. ZenFS always writes at the write pointer, so I don't think ZenFS is doing in-place updates. It could also be out-of-order writes reaching the drive. Have you checked whether the scheduler is mq-deadline? Schedulers other than mq-deadline can cause out-of-order writes.
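
The scheduler check can be scripted. Linux lists the available schedulers in `/sys/block/<device>/queue/scheduler` and marks the active one with square brackets. The sketch below parses that text; the function names and the device name passed in are illustrative assumptions.

```python
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device):
    """Read the active I/O scheduler of a block device such as 'nvme0n1'."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())
```

For example, `active_scheduler("[mq-deadline] kyber bfq none")` returns `"mq-deadline"`.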

UNKJay commented 2 years ago

@aravind-wdc Thanks for the answer. The scheduler is mq-deadline, and I have found a problem. Every time the journal zone is persisted, the write is small; however, the smallest write unit in the SSD is about 4K (the page size), so there is a mismatch. Do you have any suggestions for fixing the problem?

aravind-wdc commented 2 years ago

@UNKJay Writes have to be block-size aligned. IIRC, for buffered writes (for the write-ahead log), ZenFS pads the write buffer to match the block size. Are you running a modified (in code) ZenFS? I am not sure exactly what the problem is.
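
To illustrate the alignment arithmetic only (this is not ZenFS code), here is a small sketch that rounds a buffer up to the next block boundary. The 4096-byte default and the zero fill are assumptions made for the example.

```python
def padded_length(length, block_size=4096):
    """Round length up to the next multiple of block_size."""
    return -(-length // block_size) * block_size  # ceiling division


def pad_to_block(data, block_size=4096):
    """Append zero bytes so that len(result) is a multiple of block_size."""
    missing = padded_length(len(data), block_size) - len(data)
    return data + b"\0" * missing
```

With a 4096-byte block, a 100-byte journal record still occupies one full block on the device, and a 5000-byte record occupies two.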

UNKJay commented 2 years ago

@aravind-wdc Can I configure the buffered writes to be aligned to a larger size?

yhr commented 2 years ago

@UNKJay: Are you emulating a device with a > 4k block size?

UNKJay commented 2 years ago

@yhr No, I've changed the LBA size to 4K to align with the flash page size, but I still can't see any difference between different active zone limits. How can I get the frequency of compaction? I think the test case may not trigger enough compaction operations.

yhr commented 2 years ago

@UNKJay : The RocksDB LOG file periodically outputs the compaction statistics; check that. If no compaction has been done, make sure you are running large enough tests and that the workload contains overwrites. The LOG file is stored in the aux directory specified during zenfs mkfs.
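
One rough way to see how often compaction ran is to count events in the LOG. The sketch below assumes the LOG contains `EVENT_LOG_v1` lines carrying a JSON object with an `event` field (for example `compaction_finished` or `flush_finished`); the exact format can differ between RocksDB versions, and `log_path` is a placeholder for the LOG file under your aux directory.

```python
import re
from collections import Counter

# Matches the "event" field inside an EVENT_LOG_v1 JSON payload.
EVENT_RE = re.compile(r'"event":\s*"(\w+)"')


def count_events(log_lines):
    """Count EVENT_LOG_v1 entries per event name."""
    counts = Counter()
    for line in log_lines:
        if "EVENT_LOG_v1" not in line:
            continue
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_events_in_file(log_path):
    """Convenience wrapper: log_path is the path to the RocksDB LOG file."""
    with open(log_path, errors="replace") as log_file:
        return count_events(log_file)
```

If `count_events_in_file(log_path)["compaction_finished"]` is zero or very small, the workload is probably too small or lacks overwrites.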

yhr commented 2 years ago

@UNKJay : Did my answer make sense? Can we close this issue?