Open luckyluc74 opened 3 months ago
@scineram: Care to elaborate on that 👎?
Usually if checksum errors and no IO errors are coming up, my default is to distrust the USB enclosure, because I have seen a number of USB setups that silently masked IO errors from disks, and then you only find that out when you scrub later and find checksum errors.
I've not seen this kind of failure on my Pi 5, but I'll try to make time later to experiment with it some more.
Oh, yeah.
I'd also suggest testing with vanilla 2.2.4, because Ubuntu has a history of occasionally shipping data corruption patches that they made themselves.
@rincebrain thanks for your reply.
Some extra info: I used both the USB 3 enclosure described in the opening post with Ubuntu 23.10, and 2 months ago I switched to Pimoroni's NVMe Base bottom HAT.
Both the USB 3 enclosure and the NVMe Base, with the same NVMe drive, showed no errors with ZFS on Ubuntu 23.10. But with Ubuntu 24.04 I could reproduce the ZFS errors quite easily on the same hardware that ran fine on Ubuntu 23.10. That should rule out any errors caused by the USB 3 enclosure.
FYI, I also bought a second Pi 5 with the same hardware (USB 3 case, NVMe Base, NVMe drive, power adapter) to test the same procedure of a fresh install and an upgrade from Ubuntu 23.10, to see whether it is hardware related. I first copied the ZFS setup with Ubuntu 23.10 and ran it for 2 weeks to see if there were any problems. There were none, and then I tested both a dist-upgrade to 24.04 and a clean ZFS-on-root install with Ubuntu 24.04.
Yes, I can read.
USB enclosures tend to be janky as hell and break strangely on slight changes. I have a USB enclosure laying around that works fine in most cases, but if I use a specific SSD and workload, on my Intel system, it produces a flood of IO errors, and on my Pi 4, it (I presume) silently has the same errors but doesn't report them, and then I find out the writes didn't succeed when I get checksum errors.
So that's why I'm suggesting you try to very explicitly test that it happens with the latest release, and with stock ZFS, not whatever Ubuntu shipped.
I'd also test a newer version on Ubuntu 23.10, to see if it's something specific to a ZFS version, or a kernel+ZFS version, or both, or neither.
I agree that USB 3 enclosures are flaky; I am curious whether the same is true for the NVMe Base, which is connected to the PCI Express port of the Pi 5.
Thanks for the suggestion; I will have to research how to create a vanilla ZFS install on Ubuntu 24.04 for the Raspberry Pi.
My personal experiences have led me to not use USB enclosures for NVMe except for emergency data recovery, and accept that sometimes they'll just stop working.
YMMV.
That is the reason I am so happy now with the NVMe Base bottom HAT and a WD Blue SN850 1 TB NVMe connected to the PCI Express port of the Pi 5 (configured as Gen 3 for maximum speed). So no more USB 3 drive :)
These are my benchmarks for the drive on the Pi 5:

| Category | Test | Result |
| --- | --- | --- |
| HDParm | Disk Read | |
| HDParm | Cached Disk Read | |
| DD | Disk Write | 328 MB/s |
| FIO | 4k random read | 131282 IOPS (525128 KB/s) |
| FIO | 4k random write | 67814 IOPS (271258 KB/s) |
| IOZone | 4k read | 998415 KB/s |
| IOZone | 4k write | 212171 KB/s |
| IOZone | 4k random read | 700691 KB/s |
| IOZone | 4k random write | 179233 KB/s |

Score: 73448
using the pibenchmarks.com script:

```
sudo curl https://raw.githubusercontent.com/TheRemote/PiBenchmarks/master/Storage.sh | sudo bash
```
IIRC, didn't they say you shouldn't necessarily run the exposed PCIe lanes at PCIe 3.0, because they spit out errors sometimes?
Indeed, the Raspberry Pi 5 has no official PCIe 3.0 support yet.
I have run this with PCIe 3.0 enabled for 2 months now, with no errors/hiccups/reboots/ZFS errors.
This system is on 24/7 and runs mailcow/gitea/harbor/pihole/samba/crowdsec.
For me it is stable as a rock. And the speed increase is quite significant :)
And I run `zpool scrub` every week as a good citizen.
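For anyone who wants to try the same: the Gen 3 setting is a one-line firmware config change (path shown is the Ubuntu / Raspberry Pi OS layout; note this is the commonly documented knob, not an officially supported Pi 5 feature, so your mileage may vary):

```
# /boot/firmware/config.txt
dtparam=pciex1_gen=3
```

Reboot after adding the line; without it the slot runs at PCIe 2.0.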
FYI, someone else tested PCIe 2.0 vs 3.0 on the board I am using (Pimoroni NVMe Base):

| Test | PCIe 2.0 | PCIe 3.0 |
| --- | --- | --- |
| HDParm Disk Read | 416.98 MB/s | 784.10 MB/s |
| DD Disk Write | 261 MB/s | 377 MB/s |
To build ZFS, as root. Make sure the kernel you're running is the one that will boot, i.e. if Ubuntu has given you a new one, reboot to make sure you're running it (compare `uname -a` with the latest in /boot).
Check /usr/src/ for linux-headers*. If there's nothing there, you'll have to install headers. You can google how to do that.
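A quick sketch of that check, assuming the usual Debian/Ubuntu paths (headers live in /usr/src/linux-headers-* and are symlinked as /lib/modules/$(uname -r)/build):

```shell
# Verify that headers matching the *running* kernel are present before building
KREL="$(uname -r)"
if [ -d "/lib/modules/${KREL}/build" ] || [ -d "/usr/src/linux-headers-${KREL}" ]; then
  HEADERS_STATUS="found"
else
  HEADERS_STATUS="missing"
fi
echo "kernel ${KREL}: headers ${HEADERS_STATUS}"
```

If it prints `missing`, install the matching headers package before proceeding.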
There are some things you need to be able to build a kernel. This only needs to be done once:
```
apt install build-essential autoconf automake libtool gawk alien fakeroot dkms \
  libblkid-dev uuid-dev libudev-dev libssl-dev zlib1g-dev libaio-dev libattr1-dev \
  libelf-dev linux-headers-generic python3 python3-dev python3-setuptools \
  python3-cffi libffi-dev python3-packaging git libcurl4-openssl-dev \
  debhelper-compat dh-python po-debconf python3-all-dev python3-sphinx parallel
```
```
git clone https://github.com/openzfs/zfs.git
cd zfs
git checkout zfs-2.2.4
sh autogen.sh
./configure
make -s -jN    # N is the number of processors you have; make will use them in parallel
cp module/*.ko /lib/modules/VERSION/kernel/zfs/
```
Check /lib/modules/VERSION/kernel/zfs/ to make sure the only files there are zfs.ko and spl.ko. It's possible there are compressed versions with a different extension. If so, remove them.
reboot.
`zfs -V` should show zfs-kmod as 2.2.4.
However, in principle your 23.10 and 24.04 have the same version of ZFS, 2.2.0. Unless Ubuntu blew it, the change is in the kernel version, not ZFS. It's possible that the USB code changed, and that this might affect a marginal USB device.
While no one (including me) quite trusts Ubuntu's ZFS, it's unlikely that the two copies of 2.2.0 differ in a way that would produce the effect you're seeing.
You can also consider loading a mainline kernel, which you can get using Ubuntu's mainline tool (Google will find it for you). I'd try kernel 6.6, which is the latest LTS. Since grub will by default load the newest kernel, you'll need to pick the right one from the boot menu. To see the boot menu,
in /etc/default/grub, comment out GRUB_TIMEOUT_STYLE=hidden and set GRUB_TIMEOUT to a non-zero value such as 15,
then run update-grub.
The mainline kernel won't have ZFS, so copy module/zfs.ko and spl.ko to /lib/modules/VERSION/kernel and type "depmod" before rebooting. Without some hackery, it's easiest to boot the new kernel without ZFS before building ZFS and installing it. I'm not entirely sure how to build ZFS for a different kernel than the one you're running, though it should be possible. If you can build it on a different system of the same architecture running your new kernel, just copy zfs.ko and spl.ko to the right locations and use "depmod VERSION".
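The grub and module steps above, consolidated as a sketch (run as root; VERSION is the new kernel's `uname -r`, and the module/ directory is the one from the ZFS build tree; adjust paths to your layout):

```
# Make the grub menu visible so you can pick the 6.6 kernel at boot
sed -i 's/^GRUB_TIMEOUT_STYLE=hidden/#GRUB_TIMEOUT_STYLE=hidden/' /etc/default/grub
sed -i 's/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=15/' /etc/default/grub
update-grub

# Install the freshly built ZFS modules for the mainline kernel
cp module/zfs.ko module/spl.ko /lib/modules/VERSION/kernel/
depmod VERSION
```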
@clhedrick Thank you for the ZFS build how-to, appreciated!
The only difference between Ubuntu 23.10 and 24.04 for the Raspberry Pi is that the first uses ZFS 2.2.0 and the other ZFS 2.2.2, both with patches from the Ubuntu team.
Ubuntu 23.10 uses the 6.5 Linux kernel and Ubuntu 24.04 uses the 6.8 Linux kernel. And I believe that only ZFS 2.2.4 has official support for kernel 6.8, but correct me if I am wrong.
Another option could be to downgrade the kernel to 6.5 on Ubuntu 24.04, to see if that makes a difference. Anyhow, got some options to test :)
@luckyluc74 might be worth a try to use https://github.com/zabbly/linux and https://github.com/zabbly/zfs - they should be arm64 kernels, and AFAIK the kernel is patched to allow ZFS to use NEON instructions. Never tested this on a Pi though, but it's a current mainline kernel with current stable ZFS. Maybe it's useful for reproducing any issues.
System information
Describe the problem you're observing
The problem is that when new data is written to the pool and I run `zpool scrub`, the pool shows data corruption (visible with `zpool status -v`). This happens within a couple of minutes.
On Ubuntu 23.10 with kernel 6.5.0-1017-raspi on a Raspberry Pi 5 with ZFS on root and ZFS 2.2.0 (zfs-2.2.0-0ubuntu1~23.10.3, zfs-kmod-2.2.0-0ubuntu1~23.10.2), everything has run smoothly without problems since December 2023. I can reproduce the problem on Ubuntu 24.04; I also did the same fresh install on a brand-new NVMe drive and another Pi 5 with the same results for Ubuntu 24.04, to rule out hardware errors.
Describe how to reproduce the problem
You need the following hardware and software.
Use the following guide to install ZFS on root on the Raspberry Pi.
I have written the following scripts to make installation easier.
Steps:
1) Boot the Pi 5 from an SD card with the image `ubuntu-24.04-preinstalled-server-arm64+raspi.img`.
2) Copy the scripts `rootzfs_pi_step*.sh` to the /root directory on the SD card and copy your SSH authorized_keys.
3) wget the image `ubuntu-24.04-preinstalled-server-arm64+raspi.img.xz` and make sure it is unpacked with xz (`apt install -y xz-utils` and `unxz <imagename.xz>`), install sfdisk (`apt install -y fdisk`), and install zfs-utils (`apt install -y zfsutils-linux`).
4) Connect the USB 3 NVMe drive and make sure it is empty (example: `wipefs -a /dev/sda`; check for the drive with `lsblk`).
5) As root, run the first script, `rootzfs_pi_step1.sh`; this installs ZFS on root on the USB 3 drive.
6) Shut down the Pi 5 and remove the SD card. Now boot from the USB 3 drive.
7) Log in as root, run the script `rootzfs_pi_step2_3_and_4.sh`, and reboot after the script finishes.
8) Log in as root, run the script `rootzfs_pi_step5_firstboot`, and reboot after the script finishes.
9) Log in as root, run the script `rootzfs_pi_step6_fullsoftware.sh`, and reboot after the script finishes.
Now you should have a working ZFS on root with Ubuntu 24.04. Start installing stuff and downloading data to generate disk activity. I installed k3s and a Helm chart; another test I did was installing Docker and running different docker compose files (mailcow, harbor, pihole, gitea and more). Then run `zpool scrub rpool` and it will show data corruption on the pool.
rootzfs_pi*.sh scripts: rootzfs_pi_step scripts.zip
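For reference, the scrub check itself looks like this (pool name `rpool` as used by the scripts above; `zpool wait` requires OpenZFS 2.x, so on older releases just poll `zpool status` instead):

```
zpool scrub rpool          # start the scrub
zpool wait -t scrub rpool  # block until the scrub finishes
zpool status -v rpool      # CKSUM column and the "errors:" file list show the corruption
```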
Include any warning/errors/backtraces from the system logs
dmesg.txt syslog.txt zfs_dbgmsg.txt zfs_list.txt zpool_history.txt