multi opened this issue 5 months ago
@multi thanks for reporting this, greatly appreciated!
To confirm, with zfs_vdev_disk_classic=1 (or on 2.2.x) it works the way you'd expect?
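If it helps to double-check, the current value should be visible under the module's sysfs parameters, and it can be forced for the next boot via the kernel command line or modprobe config (paths assume a standard ZFS module install):

```sh
# Show the submission mode currently in effect
# (0 = new BIO submission code, 1 = classic code path).
cat /sys/module/zfs/parameters/zfs_vdev_disk_classic

# Force the classic path at next module load, e.g. via modprobe config
# (or put zfs.zfs_vdev_disk_classic=1 on the kernel command line).
echo "options zfs zfs_vdev_disk_classic=1" | sudo tee /etc/modprobe.d/zfs-vdev-disk.conf
```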
Tell me more about "doesn't go to sleep". How do you normally tell whether the disk is asleep or not (actual command)? Is there some program or job that runs to spin down disks, or do they go to sleep when they're idle?
Do you have any metrics showing IO to the disks during those periods? iostat -yxd 1 and zpool iostat -vl 1 are the kind of thing I'd like to see.
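A rough capture along these lines while the drives are supposed to be idle would be plenty (the 60-second duration and the filenames are just placeholders):

```sh
# Capture ~60s of per-disk and per-vdev IO stats side by side,
# so we can see what is touching the drives while they should be idle.
iostat -yxd 1 60 > iostat.log &
zpool iostat -vl 1 60 > zpool-iostat.log &
wait
```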
My overall guess is that there's something holding the drive open and/or actively issuing or waiting for IO in those periods, such that the disk doesn't sleep, but I don't have much of a mental model for what might cause that, so if I can understand what you've got happening and maybe reproduce it, I can dig deeper.
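If you want to poke at it yourself in the meantime, something like the following might show whether anything has a device open or has requests in flight (sde is just an example; these are generic block-layer tools, nothing ZFS-specific):

```sh
# Anything holding the block device open?
sudo fuser -v /dev/sde
sudo lsof /dev/sde

# Requests currently in flight at the block layer (reads, then writes).
cat /sys/block/sde/inflight
```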
Thanks!
Thank you, @robn!
To confirm, with zfs_vdev_disk_classic=1 (or on 2.2.x) it works the way you'd expect?
No, none of the kernel/zfs/zfs_vdev_disk_classic combinations works as "expected" at the moment. Before yesterday I was running 6.7.9-hardened + 8f2f6cd with nothing in the kernel args for zfs_vdev_disk_classic, and the disks' power mode was behaving normally, as it should.
Tell me more about "doesn't go to sleep". How do you normally tell whether the disk is asleep or not (actual command)?
smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/sda
It started to show Device is in ACTIVE or IDLE mode for two of the disks, instead of Device is in IDLE_A mode (for example).
Also, hddtemp /dev/sd{e..f} started to show the temperature, instead of drive is sleeping.
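For completeness, this is roughly how I check all of the pool members (device names are specific to my box; hdparm -C is just a cross-check that also avoids waking the drives):

```sh
# Report each drive's power mode without spinning it up
# (same smartctl invocation as above, looped over the pool members).
for d in /dev/sd{a..f}; do
    echo -n "$d: "
    smartctl --info --health --attributes --tolerance=verypermissive \
             -n standby --format=brief "$d" | grep -i 'device is in\|power mode'
done

# Cross-check via the ATA power state.
sudo hdparm -C /dev/sd{e..f}
```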
Do you have any metrics showing IO to the disks during those periods? iostat -yxd 1 and zpool iostat -vl 1 are the kind of thing I'd like to see.
That's from telegraf/diskio (sde + sdf are the two disks that are active; sd{a..d} are fine)
That's from telegraf/zpool_influxdb
I've tried running zpool iostat -vl 1 - it shows some values only on the first loop, then everything is zeros. iostat -yxd 1 also shows zeros.
sdf just fell asleep... Will keep you informed what's going on with the last remaining active one, sde.
That looks like scrub traffic. Given it's the start of the month, could it just be a monthly scrub task? With the crashes, maybe it restarted or got delayed a few times?
I started a scrub manually yesterday (because of https://github.com/openzfs/zfs/issues/16050).
zpool status -v shows scan: scrub repaired 0B in 07:29:15 with 0 errors on Tue Apr 2 14:19:32 2024 - if a scrub were running at the moment it would show a different message, right?
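(For comparison, while a scrub is actually running I'd expect the scan line to read roughly like the following - the exact fields vary between versions:)

```
  scan: scrub in progress since Tue Apr  2 06:50:17 2024
        (followed by scanned/issued totals, percent done and an ETA)
```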
Given its start-of-month, could it just be a monthly scrub task?
No, it's not a scheduled one
With crashes, maybe it restarted or got delayed a few times?
For sure, it restarted a few times yesterday.
And sde fell asleep...
I'll boot kernel 6.7.11-hardened again with zfs master + the patch from https://github.com/openzfs/zfs/commit/1c22ed4549e6dd9e8251420ed495a6f1979884ea to see if that changes anything in the disks' power mode behaviour.
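(To be sure which build is actually loaded after each reboot, I check something like this - the sysfs path is the usual location for the zfs module:)

```sh
# Confirm the userland tools and kernel module versions in use.
zfs version
cat /sys/module/zfs/version
```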
Rebooted, and all disks are sleeping now (as they should be). So maybe it's not related to the changes here: https://github.com/openzfs/zfs/compare/8f2f6cd...39be46f
Not sure if it was a delayed scrub (but there's no sign of it in zpool status -v).
Feel free to close this issue if you don't have any other ideas/questions. I'll reopen it if the odd behaviour shows up again :)
System information
Describe the problem you're observing
That's a follow up from https://github.com/openzfs/zfs/issues/16050#issuecomment-2034156879
After updating the kernel to 6.7.11-hardened and zfs to 39be46f I got a kernel null-pointer dereference.
I issued a scrub command, and after that (plus a few reboots and kernel/zfs boot combos) two of the disks (mostly - sometimes all of them) never go to sleep.
Last 2 days
Last 7 days
Tried 6.7.9-hardened + 8f2f6cd + zfs_vdev_disk_classic=0 just to confirm whether all disks go to sleep as they should (that was working before yesterday) - and no: the same two disks (out of a raidz2 of 6) never go to sleep :/