westerndigitalcorporation / zenfs

ZenFS is a storage backend for RocksDB that enables support for ZNS SSDs and SMR HDDs.

zonefs report put error and cannot use df #261

Closed. ZhuWeiLin0 closed this issue 1 year ago.

ZhuWeiLin0 commented 1 year ago

Hi, I'm using zenfs + zonefs, but I encountered some problems.

Here is how I build, format, and run:

    # DEBUG_LEVEL=0 ROCKSDB_PLUGINS=zenfs make -j48 db_bench install
    # cd plugin/zenfs/util
    # make
    # echo deadline > /sys/class/block/nullb0/queue/scheduler
    # mkzonefs /dev/nullb0
    # mount -o explicit-open /dev/nullb0 /mnt/zonefs
    # ./plugin/zenfs/util/zenfs mkfs --zonefs=/mnt/zonefs --aux_path=/home/test
    # ./db_bench --fs_uri=zenfs://zonefs:/mnt/zonefs --benchmarks="fillrandom,stats,sstables,levelstats" -num=10000000 --value_size=200 --key_size=16 --use_direct_io_for_flush_and_compaction --compression_type=none

After writing some KV pairs, RocksDB aborts and reports:

    ... Perf Level: 1
    Initializing RocksDB Options from the specified file
    Initializing RocksDB Options from command-line flags
    Integrated BlobDB: blob cache disabled
    DB path: [rocksdbtest/dbbench]
    put error: Corruption: block checksum mismatch: stored = 3097629399, computed = 4003023917, type = 4 in rocksdbtest/dbbench/000072.sst offset 0 size 4006

Moreover, when I try to print the space stats of nullb0 using:

    # ./plugin/zenfs/util/zenfs df --zonefs=/dev/nullb0

it reports:

    Failed to open zoned block device: , error: Invalid argument: Failed to access zonefs sequential zone directory: Not a directory

Why does this happen? Thanks!

aravind-wdc commented 1 year ago

> echo deadline > /sys/class/block/nullb0/queue/scheduler

Are you sure nullb0 is the right device? Did you create a zoned null_blk device?

ZhuWeiLin0 commented 1 year ago

It's weird: the abort always happens after 5200000 entries are written. If -num is set to 5000000, db_bench runs successfully, but I still cannot use df.

ZhuWeiLin0 commented 1 year ago

> echo deadline > /sys/class/block/nullb0/queue/scheduler
>
> Are you sure nullb0 is the right device? Did you create a zoned null_blk device?

Yes, it's an emulated zoned null_blk device:

    # cat /sys/block/nullb0/queue/zoned
    host-managed

damien-lemoal commented 1 year ago

What kernel version are you using? And if you run df right after formatting and mounting zonefs, does it work?
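
That is, roughly this sequence (a sketch reusing the commands from earlier in this thread):

    # mkzonefs /dev/nullb0
    # mount -o explicit-open /dev/nullb0 /mnt/zonefs
    # df /mnt/zonefs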

damien-lemoal commented 1 year ago

Have you checked the dmesg output to see if there are any error messages from zonefs?

ZhuWeiLin0 commented 1 year ago

> What kernel version are you using? And if you run df right after formatting and mounting zonefs, does it work?
>
> Have you checked the dmesg output to see if there are any error messages from zonefs?

I'm using the 5.15.0-60 kernel. The dmesg output is:

    [1654235.492489] zonefs (nullb0): Mounting 55 zones
    [1654235.492496] zonefs (nullb0): No open zones limit. Ignoring explicit_open mount option
    [1654235.492526] zonefs (nullb0): Zone group "cnv" has 4 files
    [1654235.492635] zonefs (nullb0): Zone group "seq" has 50 files
    [1662211.493181] zonefs (nullb0): Mounting 55 zones
    [1662211.493187] zonefs (nullb0): No open zones limit. Ignoring explicit_open mount option
    [1662211.493207] zonefs (nullb0): Zone group "cnv" has 4 files
    [1662211.493336] zonefs (nullb0): Zone group "seq" has 50 files

It says "Ignoring explicit_open mount option". Could this be the reason?

damien-lemoal commented 1 year ago

5.15.60 is outdated; the current revision is 5.15.98. There were some zonefs bug fixes backported, so you should update your kernel, or use a more recent kernel version (6.2 is out...).

You set up your nullblk drive without any active zone limit, so the explicit_open option has no effect whatsoever, hence the message you see.
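
For reference, null_blk can emulate such limits through configfs. A minimal sketch, assuming a recent kernel exposing the zone_max_open/zone_max_active attributes; these lines would go in nullblk.sh before the device is powered on (illustrative, not part of the original setup):

    # Example values only: give the emulated drive open/active zone limits
    # so that zonefs's explicit-open mount option actually has an effect.
    echo 14 > "$dev"/zone_max_open
    echo 14 > "$dev"/zone_max_active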

Does df work after mounting zonefs and before running zenfs?

ZhuWeiLin0 commented 1 year ago

> 5.15.60 is outdated; the current revision is 5.15.98. There were some zonefs bug fixes backported, so you should update your kernel, or use a more recent kernel version (6.2 is out...).
>
> You set up your nullblk drive without any active zone limit, so the explicit_open option has no effect whatsoever, hence the message you see.
>
> Does df work after mounting zonefs and before running zenfs?

Yes, df works without zenfs:

    # df
    Filesystem     1K-blocks    Used Available Use% Mounted on
    /dev/nullb0    113246208 8388608 104857600   8% /mnt

ZhuWeiLin0 commented 1 year ago

> 5.15.60 is outdated; the current revision is 5.15.98. There were some zonefs bug fixes backported, so you should update your kernel, or use a more recent kernel version (6.2 is out...).
>
> You set up your nullblk drive without any active zone limit, so the explicit_open option has no effect whatsoever, hence the message you see.
>
> Does df work after mounting zonefs and before running zenfs?

Or do you mean running df through ./plugin/zenfs/util/zenfs?

    # ./plugin/zenfs/util/zenfs df --zonefs=/dev/nullb0
    Failed to open zoned block device: , error: Invalid argument: Failed to access zonefs sequential zone directory: Not a directory

It still does not work.

yhr commented 1 year ago

@ZhuWeiLin0 : you need to specify the zonefs mount point when using df, not the block device: --zonefs=/mnt/zonefs
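
For example (using the mount point from earlier in this thread):

    # ./plugin/zenfs/util/zenfs df --zonefs=/mnt/zonefs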

ZhuWeiLin0 commented 1 year ago

> @ZhuWeiLin0 : you need to specify the zonefs mount point when using df, not the block device: --zonefs=/mnt/zonefs

@yhr So that's it! Thanks!!

ZhuWeiLin0 commented 1 year ago

But I still don't know why RocksDB reports the corruption during compaction. Any help is deeply appreciated!

yhr commented 1 year ago

@ZhuWeiLin0 : does the workload complete OK when not using zonefs (using the raw block device instead)?
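
For comparison, a sketch of running zenfs directly on the raw zoned device, following the zenfs README conventions (the aux path here is an arbitrary example):

    # echo deadline > /sys/class/block/nullb0/queue/scheduler
    # ./plugin/zenfs/util/zenfs mkfs --zbd=nullb0 --aux_path=/home/test-raw
    # ./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks="fillrandom,stats,sstables,levelstats" -num=10000000 --value_size=200 --key_size=16 --use_direct_io_for_flush_and_compaction --compression_type=none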

ZhuWeiLin0 commented 1 year ago

@yhr Yes, everything works fine when using zenfs only (directly on the raw block device).

yhr commented 1 year ago

@ZhuWeiLin0 : there are a few bug fixes available on the 5.15 stable kernel branch that you do not have in your 5.15.60 kernel build:

    git log v5.15.60..stable/linux-5.15.y --oneline fs/zonefs/
    350d66d9e730 zonefs: Detect append writes at invalid locations
    e85bdc78720c zonefs: fix zone report size in __zonefs_io_error()
    dd2ee2fd1fcb block: add a bdev_max_zone_append_sectors helper

It is probably worth trying the latest updated 5.15 kernel (or switching to 6.2).

That said, I'd like to see if we can reproduce the issue. Could you share your exact nullblk setup (the command line for creating the device)?

ZhuWeiLin0 commented 1 year ago

> @ZhuWeiLin0 : there are a few bug fixes available on the 5.15 stable kernel branch that you do not have in your 5.15.60 kernel build:
>
>     git log v5.15.60..stable/linux-5.15.y --oneline fs/zonefs/
>     350d66d9e730 zonefs: Detect append writes at invalid locations
>     e85bdc78720c zonefs: fix zone report size in __zonefs_io_error()
>     dd2ee2fd1fcb block: add a bdev_max_zone_append_sectors helper
>
> It is probably worth trying the latest updated 5.15 kernel (or switching to 6.2).
>
> That said, I'd like to see if we can reproduce the issue. Could you share your exact nullblk setup (the command line for creating the device)?

@yhr I'll try to update the kernel and see if this problem rises again. My nullblk setup is:

    # ./nullblk.sh 512 2048 5 50

nullblk.sh:

    #!/bin/bash

    if [ $# != 4 ]; then
        echo "Usage: $0 <sect size (B)> <zone size (MB)> <nr conv zones> <nr seq zones>"
        exit 1
    fi

    scriptdir=$(cd $(dirname "$0") && pwd)

    # Load null_blk without creating any devices; configfs is used below
    modprobe null_blk nr_devices=0 || exit $?

    function create_zoned_nullb()
    {
        local nid=0
        local bs=$1
        local zs=$2
        local nr_conv=$3
        local nr_seq=$4

        # Total capacity in MB: zone size times number of zones
        cap=$(( zs * (nr_conv + nr_seq) ))

        # Find the first free nullb device ID
        while [ 1 ]; do
            if [ ! -b "/dev/nullb$nid" ]; then
                break
            fi
            nid=$(( nid + 1 ))
        done

        dev="/sys/kernel/config/nullb/nullb$nid"
        mkdir "$dev"

        echo $bs > "$dev"/blocksize
        echo 0 > "$dev"/completion_nsec
        echo 0 > "$dev"/irqmode
        echo 2 > "$dev"/queue_mode
        echo 1024 > "$dev"/hw_queue_depth
        echo 1 > "$dev"/memory_backed
        echo 1 > "$dev"/zoned

        echo $cap > "$dev"/size
        echo $zs > "$dev"/zone_size
        echo $nr_conv > "$dev"/zone_nr_conv

        echo 1 > "$dev"/power

        echo mq-deadline > /sys/block/nullb$nid/queue/scheduler

        echo "$nid"
    }

    nulldev=$(create_zoned_nullb $1 $2 $3 $4)
    echo "Created /dev/nullb$nulldev"
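
For what it's worth, the resulting device geometry can be sanity-checked with standard sysfs attributes and the util-linux blkzone tool (illustrative, not part of the original thread):

    # cat /sys/block/nullb0/queue/zoned            # expect: host-managed
    # cat /sys/block/nullb0/queue/chunk_sectors    # zone size in 512 B sectors
    # blkzone report /dev/nullb0 | head            # per-zone details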

yhr commented 1 year ago

@damien-lemoal @ZhuWeiLin0 : I can recreate the issue on the latest stable kernel (6.2.7).

So it looks like there is a bug out there, either in the zonefs backend in zenfs or in zonefs itself.

yhr commented 1 year ago

If the nullblk device is set up with a 4k block size, the benchmark completes. If direct IO is used (--use_direct_io_for_flush_and_compaction --use_direct_reads), the benchmark also completes with a 512 B block size.
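
In other words, either of these variants completes (a sketch based on the commands earlier in this thread):

    # Variant 1: 4 KiB logical block size on the nullblk device
    ./nullblk.sh 4096 2048 5 50

    # Variant 2: keep 512 B blocks, but enable direct IO for reads as well
    ./db_bench --fs_uri=zenfs://zonefs:/mnt/zonefs --benchmarks="fillrandom,stats,sstables,levelstats" -num=10000000 --value_size=200 --key_size=16 --compression_type=none --use_direct_io_for_flush_and_compaction --use_direct_reads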

damien-lemoal commented 1 year ago

What is the sequence of commands/actions leading to the problem? I cannot see any issue with the df bash command nor with statfs() in my local tests.

yhr commented 1 year ago

> What is the sequence of commands/actions leading to the problem? I cannot see any issue with the df bash command nor with statfs() in my local tests.

It's the db_bench workload that fails after detecting a data corruption, probably during compaction.

    ./nullblk.sh 512 2048 5 50
    mkzonefs /dev/nullb0
    mount -o explicit-open /dev/nullb0 /mnt/zonefs
    ./plugin/zenfs/util/zenfs mkfs --zonefs=/mnt/zonefs --aux_path=/home/test
    ./db_bench --fs_uri=zenfs://zonefs:/mnt/zonefs --benchmarks="fillrandom,stats,sstables,levelstats" -num=10000000 --value_size=200 --key_size=16 --use_direct_io_for_flush_and_compaction --compression_type=none

This is nullblk.sh:

    #!/bin/bash

    if [ $# != 4 ]; then
        echo "Usage: $0 <sect size (B)> <zone size (MB)> <nr conv zones> <nr seq zones>"
        exit 1
    fi

    scriptdir=$(cd $(dirname "$0") && pwd)

    modprobe null_blk nr_devices=0 || exit $?

    function create_zoned_nullb()
    {
        local nid=0
        local bs=$1
        local zs=$2
        local nr_conv=$3
        local nr_seq=$4

        cap=$(( zs * (nr_conv + nr_seq) ))

        while [ 1 ]; do
            if [ ! -b "/dev/nullb$nid" ]; then
                break
            fi
            nid=$(( nid + 1 ))
        done

        dev="/sys/kernel/config/nullb/nullb$nid"
        mkdir "$dev"

        echo $bs > "$dev"/blocksize
        echo 0 > "$dev"/completion_nsec
        echo 0 > "$dev"/irqmode
        echo 2 > "$dev"/queue_mode
        echo 1024 > "$dev"/hw_queue_depth
        echo 1 > "$dev"/memory_backed
        echo 1 > "$dev"/zoned

        echo $cap > "$dev"/size
        echo $zs > "$dev"/zone_size
        echo $nr_conv > "$dev"/zone_nr_conv

        echo 1 > "$dev"/power

        echo mq-deadline > /sys/block/nullb$nid/queue/scheduler

        echo "$nid"
    }

    nulldev=$(create_zoned_nullb $1 $2 $3 $4)
    echo "Created /dev/nullb$nulldev"

yhr commented 1 year ago

@ZhuWeiLin0 : this turned out to be a kernel issue in zonefs, now fixed upstream. The fix is available in 5.15.109, for example. Update your kernel and you should no longer see the corruption.
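
A quick way to confirm before re-testing (illustrative):

    # uname -r    # expect 5.15.109 or later on the 5.15 stable branch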