openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

system freezes when zfs is waiting for disks to spin up #3785

Closed 0xFelix closed 8 years ago

0xFelix commented 8 years ago

I observed the following, not quite 100% sure if the problem is the spindown:

My system freezes when ZFS waits for disks to spin up after updating to 0.6.5. I have a RAIDZ2 with 6 disks that go to sleep after 1 hour of idling. With 0.6.4 everything worked reasonably well: waiting for the pool's data to become ready after the disks spun up already took quite a while, but it worked. Now my system freezes when ZFS waits for disks to spin up, and the only thing I can do is hard reset the system. Preventing the disks from going to sleep currently works as a workaround.

Does ZFS not support the spindown of disks?!

System Info:

E3-1225 v3 Processor
16GB of ECC RAM
M1015 P20 IT
6x 3TB drives

ZFS version: 0.6.5
OS: Ubuntu 14.04.3 LTS (Kernel 3.19)

gbooker commented 8 years ago

I'm experiencing the exact same thing, though in my case it is a 2 disk mirror on USB disks. Since upgrading to ZFS 0.6.5, I've had 7 hard-lockups in 36 hours. In my case, the hdparm command to stop the drive sleep seems to not work. Each time the machine is locked up, one drive in the mirror seems to have spun up, but not the second. It's as if it triggered the spin up on one drive, then proceeded to lock CPU cores in a progressive fashion until the machine is unresponsive. Disconnecting the USB doesn't help. This doesn't bode well for ZoL surviving hardware failure. If a drive becomes unresponsive, will it choose to lock the computer rather than degrading the pool?

Same OS, ZFS version, and kernel, Core i5-2300, 12G of RAM. Using same HBA on internal pool, though that pool does not seem to be the problem.

gbooker commented 8 years ago

Some more info in the hopes it is useful: Zpool import/export work even if the drives are not spinning. Exporting the pool and leaving it that way ends the lockups. I think it is quite definitive the issue is with an imported pool where the drives can spin down.

This is a problem with USB disks as they often cannot be stopped from spinning down. I know the argument against using ZFS on USB disks, but I also know people who use ZFS on USB disks over other FSs because it interoperates well between linux and mac and supports today's disk sizes and files over 4G.

kernelOfTruth commented 8 years ago

@gbooker not sure if it's 100% related to @0xFelix problem since his disks appear to be all internal (?)

I've also encountered issues with spinning up/down of disks over USB and ran into problems with external disks that I haven't run into with internal (SATA) ones, e.g. the XHCI driver having bugs that lead to hard locks of the system, or to a reset of the driver & link with a different drive letter then being assigned to the drive (there's additional complexity when using cryptsetup, lvm, etc.); or the external hard drive enclosure (or the firmware of the HDDs) having a built-in timeout which sends the hard drives to standby ...

last time I read something about timeouts of (broken) disks, there was some work done by @ryao

it would be interesting to know if there's something that could be improved specifically on Linux to prevent lockups, stalls, etc. of the rest of the system during these kind of situations in conjunction with ZFS usage

kernelOfTruth commented 8 years ago

I just also encountered an issue with a spun down external USB enclosure:

cannot receive: specified fs (HGST5K4000/bak_ext/) does not exist

this was shown after it took really long for the hard drive to spin up, and then it ended up failing anyway ...

luckily the system didn't lock up - I however agree that there should be a timeout setting to decide how long the drive can take to spin up and respond

@gbooker , @0xFelix

did any noteworthy change besides the upgrade from ZOL 0.6.4 to 0.6.5 ?

the /sys/module/zfs/parameters/zfs_deadman_synctime_ms setting comes to mind, however I don't know if that setting also causes the system to "panic" on ZFSonLinux

/*
 * Expiration time in milliseconds. This value has two meanings. First it is
 * used to determine when the spa_deadman() logic should fire. By default the
 * spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
 * Secondly, the value determines if an I/O is considered "hung". Any I/O that
 * has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
 * in a system panic.
 */

https://github.com/zfsonlinux/zfs/blob/6cde64351e236712a17d41c1578d5843a0f006e4/module/zfs/spa_misc.c

Expiration time in milliseconds. This value has two meanings. First it is
used to determine when the spa_deadman() logic should fire. By default the
spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
Secondly, the value determines if an I/O is considered "hung". Any I/O that
has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
in a zevent being logged.

Default value: 1,000,000.

https://github.com/zfsonlinux/zfs/blob/9965059ab9991a5fc7df9a489021e73880b3bcc0/man/man5/zfs-module-parameters.5

it could be worth raising the value of that setting, however according to the manual it would only log a zevent (on ZoL only ?)
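To check the current value before changing anything, here is a minimal Python sketch (not part of ZFS) that reads the parameter from the sysfs path quoted above and converts it to seconds. The function name `read_deadman_seconds` is made up for illustration, and the file only exists while the zfs module is loaded.

```python
from pathlib import Path
from typing import Optional

# Path quoted earlier in this comment; present only while the zfs module is loaded.
DEADMAN_PARAM = Path("/sys/module/zfs/parameters/zfs_deadman_synctime_ms")


def read_deadman_seconds(param: Path = DEADMAN_PARAM) -> Optional[float]:
    """Return the deadman timeout in seconds, or None if the file is absent."""
    try:
        raw = param.read_text().strip()
    except FileNotFoundError:
        return None
    return int(raw) / 1000.0  # the parameter is stored in milliseconds


if __name__ == "__main__":
    seconds = read_deadman_seconds()
    if seconds is None:
        print("zfs module parameter not found (module not loaded?)")
    else:
        print(f"deadman timeout: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

With the documented default of 1,000,000 ms this reports 1000 s, which is the "1000 seconds" mentioned in both excerpts.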

referencing https://github.com/zfsonlinux/zfs/issues/471 Reduce timeout on disks #471

and some more:

https://www.illumos.org/issues/1553 ZFS should not trust the layers underneath regarding drive timeouts/failure

http://serverfault.com/questions/682061/very-irregular-disk-write-performance-and-repeated-sata-timeouts Very irregular disk write performance and repeated SATA timeouts [closed]

http://www.spinics.net/lists/linux-ide/msg50979.html [PATCH] libata: increase the timeout when setting transfer mode

The question is what could have caused this change from 0.6.4 to 0.6.5

0xFelix commented 8 years ago

Hi,

yes my disks are all internal SATA3.

Besides upgrading ZFS I did not change anything.

I will try changing that value later.

mountassir commented 8 years ago

Same here, I have been using a script (hdparm -Y ...) to spin down the drives when idle for a couple of years now and it has been working as expected. If the pool is not accessed for a while all the drives spin down, and if I access the pool I can hear the drives spinning up one after the other, and within a few seconds the pool is live and accessible.

A few days ago I was prompted to upgrade to the latest ZFS build. The drives still spin down correctly after the update, but when I try to access the pool I see CPU usage go to 100% in htop and the system then freezes after a couple of seconds. The only thing left to do after that is a hard reset.

Drives not in the ZFS pool still spin down/up without any issues, so I am guessing this is specific to ZFS.

OS: Ubuntu 14.04.3 server
CPU: AMD FX 6100
RAM: 16GB ECC RAM
Controller: LSI 9211-8i IT
Pool: 5 x 2TB in raidz1
Backup: 1 x 4TB ext4

gbooker commented 8 years ago

I should also add that in my configuration nothing changed between the instances where it ran fine and when it was locking up continually except for the ZFS version. The kernel version didn't even change and the USB mirror pool was present for about a year prior without incident.

@kernelOfTruth I'm not sure if those values would affect anything. If I'm reading it correctly, it sets a timeout of 1000 seconds (nearly 17 minutes) before it gives up on the drive and panics. As I said, when I had my lockups, one of the USB disks had spun up and it was in a few seconds, not on the order of minutes. Also I was present for two of these events and noticed CPU cores hitting 100% utilization within seconds of access to the pool and successive cores also becoming unavailable with the entire system locked within 30 or so seconds from the initial point of access. I suspect other processes are hitting a mutex which is locked and never released resulting in a pause that never resumes.

I've not dug into the code enough to speak to what changed between 0.6.4 and 0.6.5 but everything I've seen points to that as the culprit. It could also be in SPL instead of ZFS.

behlendorf commented 8 years ago

did any noteworthy change besides the upgrade from ZOL 0.6.4 to 0.6.5 ?

This issue surprises me because nothing noteworthy changed between 0.6.4 and 0.6.5 in this regard. ZFS has never done anything special to either explicitly spin-up or spin-down the drives. Spinning down is left to the standard Linux utilities, and the drives should automatically spin-up when ZFS issues an I/O to them which they need to service.

Does anyone have any additional debugging they can provide? A back trace from the console perhaps?

0xFelix commented 8 years ago

I would gladly help you, but could you tell me how to make such a backtrace?

behlendorf commented 8 years ago

@kernelOfTruth thanks we may have what we need in #3817.

@0xFelix the stacks in #3817 suggest that this might be caused by getting one of the IO threads wedged waiting for an I/O that will now never complete. This might lead to a more severe issue than in the past because we're more aggressive about managing the number of running threads. If you're able to reproduce this, could you try setting the module option spl_taskq_thread_dynamic=0 at boot time and see if this resolves the issue?

behlendorf commented 8 years ago

Resolved by 5592404, which will be cherry-picked into the 0.6.5.2 release.

0xFelix commented 8 years ago

@behlendorf The system freezes are gone after updating to 0.6.5.2 but I'm getting these now, I'm sure the disks are 100% OK!? These errors result in CKSUM errors when scrubbing the pool.

[59526.359997] sd 0:0:1:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sdc] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sdc, sector 824769880
[59544.111090] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.111097] sd 0:0:0:0: [sdb] CDB:
[59544.111100] Read(16): 88 00 00 00 00 00 31 28 fd 50 00 00 00 08 00 00
[59544.111115] blk_update_request: I/O error, dev sdb, sector 824769872
[59544.114465] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.114468] sd 0:0:4:0: [sdf] CDB:
[59544.114469] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59544.114483] blk_update_request: I/O error, dev sdf, sector 824769880
[59552.117436] sd 0:0:3:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59552.117443] sd 0:0:3:0: [sde] CDB:
[59552.117446] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59552.117462] blk_update_request: I/O error, dev sde, sector 824769968
[59572.951158] sd 0:0:2:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.951167] sd 0:0:2:0: [sdd] CDB:
[59572.951170] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.951192] blk_update_request: I/O error, dev sdd, sector 824769968
[59572.955679] sd 0:0:5:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.955695] sd 0:0:5:0: [sdg] CDB:
[59572.955701] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.955720] blk_update_request: I/O error, dev sdg, sector 824769968
[70357.782677] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70357.782686] sd 0:0:4:0: [sdf] CDB:
[70357.782690] Read(16): 88 00 00 00 00 00 85 c1 c9 08 00 00 00 08 00 00
[70357.782712] blk_update_request: I/O error, dev sdf, sector 2244069640
[70368.087947] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70368.087953] sd 0:0:0:0: [sdb] CDB:
[70368.087955] Read(16): 88 00 00 00 00 00 85 c1 c9 00 00 00 00 08 00 00
[70368.087969] blk_update_request: I/O error, dev sdb, sector 2244069632

kernelOfTruth commented 8 years ago

@0xFelix are you sure ?

hdparm --read-sector 824769880 --yes-i-know-what-i-am-doing /dev/foo

just do a few to be really sure

to be honest: that looks suspiciously like an error in the libata driver, controller, cable, etc. rather than failing drives

if yes, it's really bad luck (but I doubt it)

fingers crossed :wink:

0xFelix commented 8 years ago

@kernelOfTruth I guess it is unlikely that all 6 disks died at the same time?! The errors come from sd[b-g] ... all disks that are in this pool. Never had any problems before 0.6.5.

0xFelix commented 8 years ago

@kernelOfTruth Tried hdparm --read-sector 824769880 --yes-i-know-what-i-am-doing /dev/sdc more than 10 times, always succeeds.
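As a cross-check that the sector handed to hdparm is the one the failed command addressed, the `Read(16)` CDB bytes from the log can be decoded by hand. A minimal Python sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2 to 9 the big-endian LBA, bytes 10 to 13 the transfer length); the function name `decode_read16` is illustrative only.

```python
from typing import Tuple


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


if __name__ == "__main__":
    # CDB logged for sdc at 59526.360006 in the dmesg output above.
    lba, blocks = decode_read16("88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00")
    print(lba, blocks)
```

For the sdc entry this yields LBA 824769880 with a length of 8 blocks, matching the sector in the `blk_update_request` line. If the drives expose 512-byte logical blocks (an assumption, the thread does not say), each failed read was a 4 KiB request.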

kernelOfTruth commented 8 years ago

@0xFelix yeah, that's what I meant, it's highly unlikely

just found https://github.com/zfsonlinux/zfs/issues/3212 again

where "block: remove artifical max_hw_sectors cap" was mentioned (3.19+)

ensure that you upgrade to at least 3.19.8 which includes https://lkml.org/lkml/2015/8/27/712 , http://www.gossamer-threads.com/lists/linux/kernel/2219390?page=last

(that should be covered by recent Ubuntu system updates ?)

This reverts commit 34b48db66e08, which caused significant iozone performance regressions and uncovered a silent data corruption bug in at least one disk.

For SAN storage, we've seen initial write and re-write performance drop 25-50% across all I/O sizes. On locally attached storage, we've seen regressions of 40% for all I/O types, but only for I/O sizes larger than 1MB.

kernelOfTruth commented 8 years ago

need more info on the disks

"blk_update_request i/o error" 3.19

search terms suggest it could be related to NCQ timeout (seagate firmware bug), USB3.0, and other factors

USB driver doesn't apply since yours are SATA-connected ...

0xFelix commented 8 years ago

@kernelOfTruth I'm on 3.19.0-30-generic, not so sure if that patch is included...

0xFelix commented 8 years ago

@kernelOfTruth Seagate NCQ timeout bug sounds plausible... 5 of these disks are ST3000DM001, but disk /dev/sdc is a HGST 3TB NAS drive...

kernelOfTruth commented 8 years ago

@0xFelix alright, quick "fix" would then be to boot the kernel via

libata.force=noncq

that should disable NCQ, e.g. like so:

ata2.00: FORCE: horkage modified (noncq)
ata2.00: 5860533168 sectors, multi 16: LBA48 NCQ (not used)

Have backups ready, run a few S.M.A.R.T. tests (short, conveyance [if applicable], long, offline)

your output mentions sdb, sdc, sdd, sde, sdf, sdg

look for further indications of error messages in dmesg output (also during boot)

that smells really fishy

https://forums.gentoo.org/viewtopic-t-969756.html?sid=92d287eb3cdf6d9ddb248fe941a7d11b , http://unix.stackexchange.com/questions/99553/does-a-bad-sector-indicate-a-failing-disk

consider further drive firmware issues with e.g. ALPM (some drives have issues with lower power states - so setting to max_performance should be your best bet if not already set),

the cables are fine ?

PSU ? power connection ?

well, this topic somewhat appears to go beyond this issue entry but it would be still good to know if there was some underlying problem with 3.19+ kernels and newer ZoL

0xFelix commented 8 years ago

@kernelOfTruth did that work? Not sure if the disks on my LSI2008 card got ncq disabled too...

[    1.961927] ata4.00: FORCE: horkage modified (noncq)
[    1.966062] ata4.00: 246162672 sectors, multi 1: LBA48 NCQ (not used)
[    2.559586] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[   10.154670] scsi 0:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   10.406780] scsi 0:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   10.654806] scsi 0:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   10.904939] scsi 0:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   11.156407] scsi 0:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   11.407687] scsi 0:0:5:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   11.657657] scsi 0:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)

kernelOfTruth commented 8 years ago

@0xFelix it doesn't look like it's disabled, but I'm not familiar with that controller

the capability to disable NCQ should have existed since 2011 (http://markmail.org/message/b5gp7jaon47zbrsq)

hm, so it could also be an issue with NCQ (the drives ?) and /or the firmware of that controller

ata4.00: FORCE: horkage modified (noncq)
ata4.00: 246162672 sectors, multi 1: LBA48 NCQ (not used)

indicates that it was applied to at least one drive - can't see which one that is due to the previous output being missing from your paste

0xFelix commented 8 years ago

@kernelOfTruth libata.force=noncq only worked for the SSD on the onboard SATA. mpt2sas does not seem to have an option for disabling NCQ?! I could not find one.

kernelOfTruth commented 8 years ago

Added enable/disable SATA NCQ operations to SAS IO Unit Control Request.

first hit searching for SAS IO Unit Control led to

http://hwraid.le-vert.net/wiki/LSIFusionMPT

the tools are lsiutil and mpt-status

https://bugs.launchpad.net/ubuntu/+source/ecs/+bug/599830 https://forum.manjaro.org/index.php?topic=5575.0

suggest updating the firmware

disabling for drives: http://fibrevillage.com/storage/170-linux-sofrware-array-performance-tuning http://blog.disksurvey.org/blog/2013/10/28/ncq-disabled/ (does that apply here ?)

0xFelix commented 8 years ago

Seems like those utils are for the older mptsas driver, not mpt2sas.

I will try to update the firmware in the next days...

kernelOfTruth commented 8 years ago

@0xFelix have backups ready, just in case ...

kernelOfTruth commented 8 years ago

ensure that you upgrade to at least 3.19.8 which includes https://lkml.org/lkml/2015/8/27/712 , http://www.gossamer-threads.com/lists/linux/kernel/2219390?page=last

that seems to be included by Linux 3.19.8-ckt7 according to http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y which is the latest kernel (september 25th, it depends whether the packages are already available)

can't easily see whether Linux 3.19.8-ckt7 equals 3.19.0-30-generic

that's one nontransparent naming scheme http://www.ubuntuupdates.org/package/canonical_kernel_team/vivid/main/base/linux

does one need to enable extended stable kernel ppas ?

Ubuntu /sigh :disappointed:

0xFelix commented 8 years ago

@kernelOfTruth I examined the situation a bit further.

The errors occurred again at exactly 3am, when a periodic task called zpool status. The disks were asleep when the errors occurred, and all disks, regardless of whether HGST or Seagate, produced the errors.

I don't think 3.19.8-ckt7 equals 3.19.0-30-generic.

Well... maybe I have to file another Ubuntu bug.

kernelOfTruth commented 8 years ago

@0xFelix so it's either a Ubuntu-specific thing (?), a Linux kernel harddrive timeout issue or that ZFS doesn't get notified about the drives not being ready (there appears to be an issue - but haven't seen exactly what or why it would (not) trigger the notification in the first place (?))

can you see what timeout is set on the disks and potentially raise it ?

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/task_controlling-scsi-command-timer-onlining-devices.html http://unix.stackexchange.com/questions/174400/can-the-hard-drive-timeout-be-disabled-in-linux-attempting-task-abort http://www.cyberciti.biz/faq/debian-ubuntu-linux-list-scsi-devices-hosts-attributes-lsscsi-command/
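For reference, the per-device SCSI command timeout is exposed in sysfs as `/sys/block/<dev>/device/timeout` (in seconds). A minimal Python sketch that lists it for every `sd*` device; the function name `scsi_timeouts` and the `sysfs_block` argument are made up here so the lookup can be pointed at a test directory.

```python
from pathlib import Path
from typing import Dict


def scsi_timeouts(sysfs_block: Path = Path("/sys/block")) -> Dict[str, int]:
    """Map each sd* device to its SCSI command timeout in seconds."""
    timeouts = {}
    for dev in sorted(sysfs_block.glob("sd*")):
        timeout_file = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable: skip this device
    return timeouts


if __name__ == "__main__":
    for name, seconds in scsi_timeouts().items():
        print(f"{name}: {seconds} s")
```

Raising the value means writing a new number of seconds to the same file as root; the change does not persist across reboots.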

are those disks in standby or sleeping ?

according to hdparm the latter is deeper and might cause the delay to be even bigger:

 -y     Force an IDE drive to immediately enter the low power consumption
        standby mode, usually causing it to spin down. The current power
        mode status can be checked using the -C option.

 -Y     Force an IDE drive to immediately enter the lowest power consumption
        sleep mode, causing it to shut down completely. A hard or soft reset
        is required before the drive can be accessed again (the Linux IDE
        driver will automatically handle issuing a reset if/when needed).
        The current power mode status can be checked using the -C option.

0xFelix commented 8 years ago

@kernelOfTruth Tried to raise the timeout... the errors still occur.

The drives are in the state that hdparm -y produces (standby).

I guess it is a kernel bug then?

Also:

According to cat /proc/version_signature my kernel version corresponds to 3.19.8-ckt6.

I'm filing a bug report for Ubuntu now.

kernelOfTruth commented 8 years ago

@0xFelix yes, might be

I hope the devs or people on launchpad can offer some insight into this issue

behlendorf commented 8 years ago

@0xFelix the issue you're seeing definitely looks like a problem occurring below ZFS in the stack. Those read I/O errors are coming from the SCSI driver. ZFS issued a read due to the zpool status command being run, but the lower layers couldn't complete the I/O for some reason. ZFS just treats it like any other I/O error from the device. Sorry I couldn't be of more help.

0xFelix commented 8 years ago

@behlendorf Seems like SCSI yeah, but don't worry, thank you for your help and thank you for the great zfsonlinux project! :-)

I filed a bug in launchpad, let's see what the ubuntu devs have to say. ;-)

0xFelix commented 8 years ago

@kernelOfTruth @behlendorf I've rebuilt my server with Debian 8 in the hope things would get better... but... exactly the same bug on Debian as well. I'm again beginning to think it has something to do with ZFS: executing hdparm -y /dev/sd[a-f] and then, for example, directly executing hdparm -C /dev/sda works without hanging the system. When putting a disk to sleep, then writing something to the pool and then issuing hdparm -C /dev/sda, the system begins to produce the errors again. Is ZFS blocking somehow when the disks are sleeping and another process wants to access the disks?!

kernelOfTruth commented 8 years ago

@0xFelix that doesn't look like an issue related to ZFS if hdparm -C /dev/sda already causes errors to show up

e.g.

I put the disks to sleep by

hdparm -Y /dev/sdc

then checking the state, even hours later:

hdparm -C /dev/sde
/dev/sde:
 drive state is:  standby

Any progress on the bug you filed at launchpad ?

0xFelix commented 8 years ago

@kernelOfTruth

Testing like you did by only using hdparm does not produce the error.

Putting the disk to sleep, then writing something to the pool and then checking a disk's status does produce the error.

They confirmed the bug on Launchpad, but no further action has followed so far.

0xFelix commented 8 years ago

@kernelOfTruth I guess I found my bug...

http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg14937.html

Apart from not outputting "Device not ready" this seems to be exactly my bug. So it is not a ZFS problem...

The id for the commit including the fix is d3ed1731ba6bca32ac0ac96377cae8fd735e16e6, it should have been included since mainline kernel 3.4.11. Let's see if Debian and Ubuntu both included this fix in their kernel sources...

kernelOfTruth commented 8 years ago

@0xFelix that commit ID seems to be dead or meanwhile invalid:

The new one is:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e [SCSI] Fix 'Device not ready' issue on mpt2sas

https://www.redhat.com/archives/dm-devel/2014-September/msg00175.html Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices

https://lkml.org/lkml/2014/11/16/122 [GIT PULL] SCSI fixes for 3.18-rc4

I believe I have observed similar behavior with an external USB 3.0 HDD enclosure which I no longer use ...

kobuki commented 8 years ago

I'm experiencing similar behavior. I've recently upgraded my ZFS VM from Debian 7.x to 8.3 along with the ZFS stack (0.6.4-1.2-1-wheezy -> 0.6.5.2-2-wheezy). I'm having my disks sleep after 30 minutes of inactivity (hdparm -S 242). When they're woken up by activity on the pool, I see the following errors in dmesg for all 6 disks of the raidz2 pool:

[ 5442.727017] blk_update_request: I/O error, dev sdd, sector 1275684536
[ 5442.727626] sd 0:0:4:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5442.727638] sd 0:0:4:0: [sde] CDB: Read(10) 28 00 4c 09 66 b8 00 00 08 00

But when I simply do the following with the sleeping disks:

dd if=/dev/sdX of=/dev/zero bs=1M

then NO errors appear at all. It appears that ZFS does something different from simply reading blocks from the disk. These errors never appeared with the old system, with kernel 3.2 and ZFS 0.6.4. I'm using an LSI2008 HBA redirected to a KVM VM using MMIO. Apart from these errors appearing in the logs on wakeup, I experience no odd behavior on my pool. The I/O errors produced by the lower layers do not seem to affect the pool I/O error counters, even though ZED reports them in the syslog and in email.

I'm no expert at all, but one might think that this might be the solution to the problem, as linked by @kernelOfTruth previously. I'll try to patch my kernel later to see what happens (using 4.3.0-0.bpo.1-amd64 now).
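On the standby timer itself: the value 242 quoted above reads like an `hdparm -S` argument, and the hdparm man page documents a non-linear encoding for that option. A minimal Python sketch of that table, assuming `-S` was indeed the option in use; the function name `standby_timeout_seconds` is illustrative, and the man page warns that some older drives interpret the values differently.

```python
from typing import Optional


def standby_timeout_seconds(value: int) -> Optional[int]:
    """Decode an `hdparm -S` value into seconds, per the hdparm man page.

    Returns 0 when the timer is disabled and None when the value has no
    fixed meaning (253 is vendor-defined, 254 is reserved).
    """
    if not 0 <= value <= 255:
        raise ValueError("hdparm -S takes a value from 0 to 255")
    if value == 0:
        return 0                        # standby timer disabled
    if value <= 240:
        return value * 5                # 1..240: multiples of 5 seconds
    if value <= 251:
        return (value - 240) * 30 * 60  # 241..251: units of 30 minutes
    if value == 252:
        return 21 * 60                  # 21 minutes
    if value == 255:
        return 21 * 60 + 15             # 21 minutes 15 seconds
    return None


if __name__ == "__main__":
    for v in (241, 242):
        print(v, standby_timeout_seconds(v) // 60, "min")
```

Under this encoding 241 corresponds to 30 minutes and 242 to 60 minutes, so it may be worth confirming which timeout the drives are actually using.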

0xFelix commented 7 years ago

@kobuki Did you find a solution to the problem? Did the mentioned patch work?

kobuki commented 7 years ago

Sorry, I didn't try it. But I did a few driver upgrades since then and I haven't seen the errors in syslog recently. I will check again. But I never had any problems aside from these worrisome log entries.

0xFelix commented 7 years ago

What kernel and driver are you running currently? Still Debian 8?

hoppel118 commented 7 years ago

Hey guys,

the same for me, have a look at these issues:

https://github.com/zfsonlinux/zfs/issues/3785 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1504909?comments=all

Here I reported some things about my situation and my environment:

https://github.com/zfsonlinux/zfs/issues/4638

Greetings Hoppel118

thomasvoigt commented 7 years ago

Hi there!

I can replicate this behavior on Linux-4.8.11 (vanilla) + LSI SAS2008 (mpt3sas) + dmraid + xfs. Maybe it has nothing to do with zfs but with mpt3sas?

HTH and best regards, Thomas

night199uk commented 7 years ago

+1 - kernel 4.8.0-54 (Ubuntu) + LSI SAS2008 (mpt3sas) + zfs on Seagate ST8000DM002

[ 5199.987612] sd 1:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5199.987622] sd 1:0:0:0: [sda] tag#0 CDB: Read(16) 88 00 00 00 00 02 44 62 30 58 00 00 00 08 00 00
[ 5199.987627] blk_update_request: I/O error, dev sda, sector 9737220184

satmandu commented 6 years ago

I was having a string of these read errors leading to checksum failures on my Ubuntu 18.04 system with either a 4.15.x or 4.16.x kernel and zfs 0.7.8 and drives on a LSI SAS2008 controller, and at least with the 4.15 kernels, this modification to /etc/default/grub seems to have fixed it for me.

GRUB_CMDLINE_LINUX_DEFAULT="mpt3sas.msix_disable=1"

(Remember to run update-grub and reboot after this change.)
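After the reboot it may be worth confirming that the option actually reached the kernel command line. A minimal Python sketch that parses `/proc/cmdline`; the function name `parse_cmdline` is made up for illustration, and the simple whitespace split assumes no quoted values containing spaces.

```python
from typing import Dict


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into {parameter: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so values like root=UUID=... stay intact.
        key, _, value = token.partition("=")
        params[key] = value
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            params = parse_cmdline(f.read())
    except OSError:
        params = {}
    print("mpt3sas.msix_disable =", params.get("mpt3sas.msix_disable", "<not set>"))
```

The same check works for the other boot options mentioned in this thread, such as libata.force.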

I think the problem is related to the system allowing drives to go to sleep on this controller, and then not spinning them back up fast enough such that zfs complains of failed reads.

Well, I think with this change the drives aren't going to sleep any more, for what that's worth. (And no I'm not sure why the drives were going to sleep at all.)

d-helios commented 5 years ago

Hi. I've tried setting mpt3sas.msix_disable=1 and it did not help. My configuration is:

2 x lsi sas 9300-8e (SAS3008)
4 x supermicro SC216BE2C-R741JBOD
disks: HUC101818CS4204, PX05SMB040

also tried to disable suspend by using sdparm, but no success.

sdparm -6  --set ICT=0 $i; sdparm -6 --set IDLE=0 $i; sdparm -6 --set STANDBY=0 $i

OS: Ubuntu 18.04
ZFS: 0.7.5-1ubuntu15
MPT3SAS: 17.100.00.00