Open pronoiac opened 2 months ago
I vaguely recall an issue with CIFS in mainline not so long back - a fix had been backported in mainline erroneously.
We do build more recent kernels than are packaged into apt. On Raspberry Pi OS you can use sudo rpi-update to get the latest build of the current LTS branch (6.6 at the moment), or use e.g. sudo rpi-update rpi-6.10.y to grab the 6.10 kernel.
It would be useful if you could tell us if the issue is still present in the latest 6.6 branch, and on 6.10.
Please be aware that there is a low-but-non-zero risk of regressions in taking these builds, so please test on a non-critical system, or at least back up first. Having a backup copy of the /boot/firmware/kernel*.img files to restore is generally sufficient, as rpi-update does not delete the old modules.
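A minimal sketch of the backup step described above, assuming the kernel images live in /boot/firmware as stated. The function name backup_kernels and the default backup location under $HOME are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: copy kernel images to a timestamped backup directory.
# $1 = boot directory (defaults to /boot/firmware, as mentioned in the thread)
# $2 = backup parent directory (an arbitrary choice for this sketch)
backup_kernels() {
    boot_dir=${1:-/boot/firmware}
    dest="${2:-$HOME/kernel-backups}/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    # -p preserves timestamps and modes so the copies can be restored as-is.
    cp -p "$boot_dir"/kernel*.img "$dest"/ || return 1
    # Print the backup location so the caller can note it down.
    echo "$dest"
}
```

Running backup_kernels before sudo rpi-update prints the directory holding the copies. To roll back, copy the kernel*.img files from that directory into /boot/firmware again (with sudo) and reboot.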
NB These CI builds are only available for 90 days after the last update on that branch, so generally it's only the LTS branch (6.6), the latest released branch (6.10), and the prepatch branch (6.11) that will be available.
From rpi-update:
Interesting that it appears to be something that was broken by 6.6 and now fixed, but not backported.
If you're happy rebuilding the kernel, identifying whether the rpi-6.7.y, rpi-6.8.y, and rpi-6.9.y branches are good or not would be very useful. Unfortunately the CI build artifacts are likely to have expired for those branches, so they would need to be built manually.
Sorry to ask you to do the investigative work, but you have a system setup that you can get to fail.
I've forced rebuilds of rpi-6.7.y, rpi-6.8.y and rpi-6.9.y. Wait about 45 minutes then try sudo rpi-update rpi-6.7.y etc.
(You can see the in-progress builds here: https://github.com/raspberrypi/linux/actions?query=is%3Ain_progress)
They should be ready now.
My Internet connection's misbehaving today, but I will investigate when I can.
Possibly of note: the issue might go as far back as v6.3. Those builds are very helpful; building on my Pi takes about two hours.
I re-ran 6.8.12 - after the new eeprom - and it didn't work.
I've been looking for the fix for 6.10; I'm bisecting into its rc1.
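Bisecting for a fix rather than a regression inverts the usual good/bad meaning, which is easy to get wrong. One way to keep it straight is git bisect's custom terms. The sketch below assumes it is run inside a kernel git checkout; the function name start_fix_bisect and the term names "broken" and "fixed" are hypothetical choices, not what was used in this thread.

```shell
# Hypothetical helper: start a bisect that hunts for the commit that FIXED a bug.
# $1 = a revision known to still show the bug
# $2 = a later revision known to work
start_fix_bisect() {
    broken_rev=$1
    fixed_rev=$2
    # Custom terms avoid the confusing "good means still broken" inversion.
    git bisect start --term-old=broken --term-new=fixed || return 1
    git bisect broken "$broken_rev" || return 1
    git bisect fixed "$fixed_rev"
}
```

After each test build, mark the checked-out commit with git bisect broken or git bisect fixed. When the search ends, git reports the first "fixed" commit, which is the candidate for backporting.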
Reading the rpi-update page (edit: new repo), it looks like it can pull in bleeding edge firmware, with risk of regressions. I intended to use it to pull in kernel 6.6.50, but then checking some kernels I'd built, I'm seeing breakage where it worked before.
Any suggestions?
Reading the rpi-update page
Check the first line of the readme.
I intended to use it to pull in kernel 6.6.50, but then checking some kernels I'd built, I'm seeing breakage where it worked before. Any suggestions?
Not based on what you've posted. If you post exactly what you did, and exactly what the breakage was it's possible there will be suggestions.
I updated the link, in case you thought I was pointing at the deprecated rpi-update repo.
What I did:
sudo rpi-update rpi-6.6.y
Vaguely, some options I see:
I'm still not following which cases are which in "I'm seeing breakage where it worked before."
Is the breakage here the "Silent corruption writing files to network share over cifs" or something else? Are you saying rpi-update kernel behaves the same or differently to your self built one?
The network share breakage manifests as lzop failing to decompress, and that works, or doesn't, depending on the Linux kernel version. I've attempted bisection of the Linux kernel. rpi-update appears to change something in addition to the Linux kernel version, so that a kernel I'd tested, will stop working.
rpi-update may update bootloader and/or firmware (start.elf). There are options to disable that.
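To see whether anything besides the kernel changed, it may help to record the versions before and after each rpi-update run and diff the results. This is a rough sketch: snapshot_versions is a hypothetical name, and it assumes vcgencmd is available, as it normally is on Raspberry Pi OS. On other systems it records only the kernel version.

```shell
# Hypothetical helper: record kernel, firmware, and bootloader versions to a file.
# $1 = output file
snapshot_versions() {
    out=$1
    {
        echo "kernel: $(uname -r)"
        # vcgencmd is Raspberry Pi specific; skip these sections if it is absent.
        if command -v vcgencmd >/dev/null 2>&1; then
            echo "firmware:"
            vcgencmd version
            echo "bootloader:"
            vcgencmd bootloader_version
        fi
    } > "$out" 2>&1
}
```

For example, run snapshot_versions before.txt, then rpi-update and a reboot, then snapshot_versions after.txt, and compare the two with diff before.txt after.txt.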
so that a kernel I'd tested, will stop working.
Stop working in what way? Try to be less vague.
Stop working in what way? Try to be less vague.
When I re-run the compression and decompression on a kernel where they worked before, the decompression now fails, because the compressed file was corrupted.
What you are describing sounds a lot like a random/timing-related issue, which would make testing challenging.
Describe the bug
wrong location?
First off, I got this repo from the package description for the installed kernel. Apologies if I'm not in the right spot.
Short version
I was benchmarking some compressors on Debian on a Raspberry Pi, piping to and from a network share on a NAS, and found that some consistently had issues writing to my NAS. Specifically: lzop, pigz (parallel gzip), and pbzip2 (parallel bzip2). This seems dependent on kernel version: Debian 11, bullseye, kernel 6.1.21, was ok. Debian 12, bookworm, kernel versions 6.6.20 and 6.6.31, were impacted.
Compiling and running a mainline kernel 6.1.21 on bookworm avoided the issue. I don’t think Debian patches are at fault.
There's over a year between those kernel releases. Bisecting won’t be quick, but it is doable.
Steps to reproduce the behaviour
It looks like this, on a mounted network share:
Device(s)
Raspberry Pi 4 Mod. B
System
OS & version:
Firmware version:
(That device file didn't help)
Kernel version:
Logs
No response
Additional context
More details
The Pi and NAS are directly connected by Gigabit Ethernet. Both sides are using self-assigned IP addresses. The files in question are file systems, about 270 gig. Compression seems to work, without complaint; decompression crashes the process, usually within the first gig of the compressed file. It looks like the compressed files are corrupt. Trying decompression during compression gets further along than it does after compression finishes; this might point toward something with writes and caches. This is a Raspberry Pi 4, with 4 GiB RAM.
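A small round-trip check along these lines can make a kernel test pass/fail instead of relying on a decompressor crash. This is a sketch under stated assumptions, not the reporter's actual procedure: check_roundtrip is a hypothetical name, and gzip is only a default for illustration (lzop, pigz, and pbzip2 accept the same -c and -dc flags). Reading the file back immediately may be served from the client's cache, so remounting the share before the read-back might be needed for a fair test; that is an assumption, not something established here.

```shell
# Hypothetical helper: compress a file into a target directory, read it back,
# and compare SHA-256 checksums of the original and the decompressed data.
# $1 = source file, $2 = target directory (e.g. the mounted share),
# $3 = compressor command (defaults to gzip for illustration only)
check_roundtrip() {
    src=$1
    target_dir=$2
    comp=${3:-gzip}
    out="$target_dir/$(basename "$src").$comp"
    want=$(sha256sum < "$src" | cut -d' ' -f1)
    # Write the compressed stream to the target, as in the failing workflow.
    "$comp" -c < "$src" > "$out" || return 2
    # Decompress from the target and hash the result without storing it.
    got=$("$comp" -dc < "$out" | sha256sum | cut -d' ' -f1)
    if [ "$want" = "$got" ]; then
        echo "OK $out"
    else
        echo "MISMATCH $out"
        return 1
    fi
}
```

Called as check_roundtrip image.bin /mnt/share lzop (both paths are placeholders), it returns non-zero on a mismatch, so it can serve as the per-kernel test during a bisect.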
Wrong location, more details
I reported the issue to Debian, and they closed it:
My impression:
Giving a heads up to the most likely impacted people makes sense -