westerndigitalcorporation / zenfs

ZenFS is a storage backend for RocksDB that enables support for ZNS SSDs and SMR HDDs.
GNU General Public License v2.0

Unexpected space amplification on RocksDB with ZenFS #65

Closed BaronStack closed 2 years ago

BaronStack commented 2 years ago

TEST

DISK: ZNS * 2T
RocksDB: 6.25
ZenFS: master
max_active_zones: 14
max_open_zones: 14
chunk_sectors: 4194304
chunk_sectors-size: 2M

Problem

Too much space amplification after fillrandom and overwrite.

test_bench.sh

DEV=$1
 ./db_bench \
     --fs_uri=zenfs://dev:$DEV \
     --benchmarks=fillrandom,stats \
     -statistics \
     --num=3000000000 \
     --threads=32 \
     --db=./db \
     --wal_dir=./db \
     -report_interval_seconds=1 \
     -stats_dump_period_sec=5 \
     --duration=3000 \
     --key_size=16 \
     --value_size=128 \
     --max_write_buffer_number=16 \
     -max_background_compactions=32 \
     -max_background_flushes=16 \
     -subcompactions=8 \
     -compression_type=none

 cp zns-log/db/LOG test-log/fillrandom.log
 sleep 10

./db_bench \
     --fs_uri=zenfs://dev:$DEV \
     --benchmarks=overwrite,stats \
     -statistics \
     --num=3000000000 \
     --threads=32 \
     --db=./db \
     --wal_dir=./db \
     -report_interval_seconds=1 \
     -stats_dump_period_sec=5 \
     --duration=1800 \
     --key_size=16 \
     --value_size=128 \
     --use_existing_db=1 \
     --use_existing_keys=1 \
     --max_write_buffer_number=16 \
     -max_background_compactions=32 \
     -max_background_flushes=16 \
     -subcompactions=8 \
     -compression_type=none

After both runs finished:

# ./plugin/zenfs/util/zenfs df --zbd=nvme2n1
Free: 140293 MB
Used: 54189 MB
Reclaimable: 776863 MB
Space amplification: 1433%

We can see that more than 770G of space needs to be reclaimed, while the space actually occupied is only 54G.

Any suggestions for solving this problem?
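The percentage in the df output can be reproduced from the other two figures. The sketch below assumes that space amplification is reported as reclaimable space divided by used space, with integer truncation. That formula is inferred from the numbers in this thread, not taken from the ZenFS source, and space_amp is a hypothetical helper name.

```shell
# Assumed formula: 100 * reclaimable / used, truncated to an integer.
# space_amp is a hypothetical helper, not part of the zenfs tool.
space_amp() {
  # $1 = reclaimable MB, $2 = used MB
  echo $(( 100 * $1 / $2 ))
}

echo "before tuning: $(space_amp 776863 54189)%"   # prints 1433%
echo "after tuning:  $(space_amp 7975 31093)%"     # prints 25%
```

Both results match the values that zenfs df printed in this thread, which suggests the assumed formula is at least consistent with the tool's output.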

skyzh commented 2 years ago

There is a dump tool that can help analyze the on-disk layout of your RocksDB instance. You could try it and upload the output. Meanwhile, could you take a look at the RocksDB log, and ideally upload it? Is there any warning that might explain this space amplification?

aravind-wdc commented 2 years ago

@BaronStack Could you try setting target_file_size_base to 1GB (--target_file_size_base=1073741824) in the db_bench arguments? The default target_file_size_base is typically 64MB, which can increase the space amplification.
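As a minimal sketch of how that suggestion maps onto the test script, the snippet below computes the 1GB value and appends the flag to a shortened db_bench command line. It only prints the command instead of running it. The DEV default is the device name from the df example, and the other flags from test_bench.sh are omitted here for brevity.

```shell
# Dry-run sketch: builds and prints the command, does not execute db_bench.
DEV=${DEV:-nvme2n1}                                # assumed default, from the df example
TARGET_FILE_SIZE_BASE=$(( 1024 * 1024 * 1024 ))    # 1 GB expressed in bytes

# Only a subset of the original flags is repeated; the rest stay unchanged.
CMD="./db_bench --fs_uri=zenfs://dev:$DEV --benchmarks=fillrandom,stats --target_file_size_base=$TARGET_FILE_SIZE_BASE"
echo "$CMD"
```

Larger target files mean fewer, bigger SST files per zone, which is presumably why the suggestion reduces the amount of stranded space.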

BaronStack commented 2 years ago

@aravind-wdc Thanks for your advice. Space usage looks normal now after changing target_file_size_base/write_buffer_size.

# ./plugin/zenfs/util/zenfs df --zbd=nvme2n1
Free: 927065 MB
Used: 31093 MB
Reclaimable: 7975 MB
Space amplification: 25%

But one thing is still hard to understand: each zone stores just a small file, and then its remaining_capacity is set to 0, even though about 90% of the zone's space is actually free.

aravind-wdc commented 2 years ago

@BaronStack Some optimisations to the allocation logic are planned for future releases, which should address this.

BaronStack commented 2 years ago

@aravind-wdc Is issue #36 the one that tries to optimize the allocation algorithm?

skyzh commented 2 years ago

But one thing is still hard to understand: each zone stores just a small file, and then its remaining_capacity is set to 0, even though about 90% of the zone's space is actually free.

That is a hardware constraint of ZNS. For example, suppose we created 1GB of files in a zone and then deleted 0.9GB of them. We cannot immediately reuse the freed 0.9GB of space, because the whole zone must be reset before it can be written again. That's what remaining_capacity means.
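The example above can be sketched as a toy model of a single zone. This is only an illustration of the sequential-write constraint, not ZenFS code. All function and variable names are hypothetical, and sizes are in MB with 0.9GB rounded to 922MB.

```shell
# Toy model of one zone (hypothetical names, not the ZenFS implementation).
ZONE_CAPACITY_MB=1024
write_pointer_mb=0   # a zone is written sequentially from its start
live_mb=0            # data still referenced by existing files

zone_append() {      # $1 = MB to append; fails when the zone has no room left
  if [ $(( write_pointer_mb + $1 )) -gt "$ZONE_CAPACITY_MB" ]; then
    return 1
  fi
  write_pointer_mb=$(( write_pointer_mb + $1 ))
  live_mb=$(( live_mb + $1 ))
}

zone_delete() {      # deleting data does not move the write pointer back
  live_mb=$(( live_mb - $1 ))
}

zone_reset() {       # in this model a reset is allowed only with no live data
  [ "$live_mb" -eq 0 ] || return 1
  write_pointer_mb=0
}

zone_remaining()   { echo $(( ZONE_CAPACITY_MB - write_pointer_mb )); }
zone_reclaimable() { echo $(( write_pointer_mb - live_mb )); }

zone_append 1024     # create 1GB of files
zone_delete 922      # delete roughly 0.9GB of them
echo "remaining=$(zone_remaining) MB, reclaimable=$(zone_reclaimable) MB"
zone_append 100 || echo "append rejected: the zone must be reset first"
zone_reset || echo "reset rejected: 102 MB of live data still in the zone"
```

In this model the deleted space shows up as reclaimable rather than remaining, and it stays that way until the last live file in the zone is gone and the zone can be reset.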

yhr commented 2 years ago

@BaronStack : #36 aims to reduce the allocation latency, providing better write quality of service.

I plan to explore alternative allocation algorithms in the future (the one we are using right now seems good enough, but we could perhaps do better). Ideas and testing are very welcome!

Adding optional garbage collection would also be interesting to try out (see #13), for use cases where space utilization needs to be maximized.

yhr commented 2 years ago

@BaronStack : Can we close this issue now?

BaronStack commented 2 years ago

@BaronStack : Can we close this issue now?

Of course! Thanks for all your replies!