westerndigitalcorporation / zenfs

ZenFS is a storage backend for RocksDB that enables support for ZNS SSDs and SMR HDDs.
GNU General Public License v2.0

IO error: No space left on device: Zone allocation failure during fillseq benchmark with ZenFS #284

Open Zizhao-Wang opened 8 months ago

Zizhao-Wang commented 8 months ago

Issue Description

I encountered an issue while conducting a large-scale data write test using ZenFS with RocksDB. The test failed after writing around 200 million entries with the error “IO error: No space left on device: Zone allocation failure”.

Environment Setup

Steps to Reproduce

  1. Configured and launched the virtual environment using the following QEMU command:
    qemu-system-x86_64 --enable-kvm \
    -name cs-exp-zns \
    -m 50G \
    -nographic \
    -cpu host -smp 16 \
    -hda ./virtualdisks/ubuntu.qcow2 \
    -net user,hostfwd=tcp::8081-:22 -net nic \
    -drive file=./virtualdisks/zns.raw,id=mynvme,format=raw,if=none \
    -device nvme,serial=baz,id=nvme2 \
    -device nvme-ns,id=ns2,drive=mynvme,nsid=2,logical_block_size=4096,physical_block_size=4096,zoned=true,zoned.zone_size=1024M,zoned.zone_capacity=1000M,zoned.max_open=0,zoned.max_active=0,bus=nvme2 \
    -drive file=./virtualdisks/nvmessd.raw,id=mynvme2,format=raw,if=none \
    -device nvme,serial=foo,id=nvme3 \
    -device nvme-ns,id=ns3,drive=mynvme2,nsid=3,bus=nvme3 \
    -fsdev local,id=fsdev0,path=./work/,security_model=none \
    -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=hostshare
  2. Ran the following commands to set the I/O scheduler and create the ZenFS filesystem:

    echo deadline > /sys/class/block/nvme0n1/queue/scheduler

    ../rocksdb/plugin/zenfs/util/zenfs mkfs --zbd=nvme0n1 --force --aux_path=/root/logs

3. Ran the following db_bench script; the run failed with the error after writing around 200 million entries:
```shell
#!/bin/bash

NUM_ENTRIES=500000000                           
VALUE_SIZE=100                                  
COMPRESSION_TYPE="none"                          
WRITE_BUFFER_SIZE=67108864                       
MAX_WRITE_BUFFER_NUMBER=3                       
MIN_WRITE_BUFFER_NUMBER_TO_MERGE=1               
CACHE_SIZE=8388608                               
MAX_BACKGROUND_JOBS=7                            
OPEN_FILES=40000                                 
STATS_PER_INTERVAL=$(($NUM_ENTRIES / 10))        
HISTOGRAM=true                                   
BLOOM_BITS=10                                   
DISABLE_WAL=true                                 

../../rocksdb/db_bench \
    --fs_uri=zenfs://dev:nvme0n1 \
    --use_direct_io_for_flush_and_compaction \
    --benchmarks=fillseq,stats \
    --num="$NUM_ENTRIES" \
    --value_size="$VALUE_SIZE" \
    --compression_type="$COMPRESSION_TYPE" \
    --write_buffer_size="$WRITE_BUFFER_SIZE" \
    --max_write_buffer_number="$MAX_WRITE_BUFFER_NUMBER" \
    --min_write_buffer_number_to_merge="$MIN_WRITE_BUFFER_NUMBER_TO_MERGE" \
    --cache_size="$CACHE_SIZE" \
    --max_background_jobs="$MAX_BACKGROUND_JOBS" \
    --open_files="$OPEN_FILES" \
    --stats_per_interval="$STATS_PER_INTERVAL" \
    --histogram="$HISTOGRAM" \
    --bloom_bits="$BLOOM_BITS" \
    --disable_wal="$DISABLE_WAL" \
    | tee zns_kv_log.log
```

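As a rough cross-check of the run size, the dataset the script asks for can be estimated from the entry count and sizes (a sketch; the 16-byte key size is db_bench's default and an assumption here, since the script does not set `--key_size`):

```shell
#!/bin/sh
# Back-of-envelope sizing for the fillseq run above.
NUM_ENTRIES=500000000
KEY_SIZE=16     # db_bench default --key_size (assumed; not set in the script)
VALUE_SIZE=100  # matches --value_size above
TOTAL_BYTES=$(( NUM_ENTRIES * (KEY_SIZE + VALUE_SIZE) ))
echo "logical dataset size: $(( TOTAL_BYTES / 1024 / 1024 / 1024 )) GiB"
```

With compression disabled this comes to roughly 54 GiB of logical data, so a failure at around 200 million entries (about 23 GiB of key/value payload) would point at zone-allocation overhead rather than raw capacity exhaustion, assuming the backing zns.raw image is larger than the dataset.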
Expected vs. Actual Results

Additional Information

Any insights or assistance in addressing this issue would be greatly appreciated.

yhr commented 8 months ago

I think the default target file size ended up fragmenting the zones and causing the issue. This can happen with the fillseq workload which skips the normal write flow.

I suggest you set --target_file_size_base=$(( 1000 * 1024 * 1024 ))
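To illustrate the arithmetic behind this suggestion (a sketch; the 1000 MiB figure mirrors the `zoned.zone_capacity=1000M` QEMU setting above, and 64 MiB is RocksDB's default `target_file_size_base`):

```shell
#!/bin/sh
# How many default-sized SST files fit in one emulated zone.
ZONE_CAPACITY=$(( 1000 * 1024 * 1024 ))  # zoned.zone_capacity=1000M
DEFAULT_TARGET=$(( 64 * 1024 * 1024 ))   # RocksDB default target_file_size_base
echo "files per zone at the default target size: $(( ZONE_CAPACITY / DEFAULT_TARGET ))"
echo "suggested flag: --target_file_size_base=$ZONE_CAPACITY"
```

At the default target size, each zone holds at most 15 files plus a remainder of roughly 40 MiB, and zone space freed by deleted files can only be reclaimed by resetting whole zones; matching the target file size to the zone capacity avoids that fragmentation, consistent with the explanation above.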

There is a script in the zenfs tests directory called get_good_db_bench_params_for_zenfs.sh; it will generate a decent set of parameters for your device geometry.