openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS io error when disks are in idle/standby/spindown mode #4713

Closed johnkeates closed 5 years ago

johnkeates commented 8 years ago

Whenever one or more disks in one of my pools are sleeping because they were idle, ZFS (via ZED) spams me with IO errors (via email, because that's how I set it up).
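
(For context: these mails come from ZED's notification settings. A minimal sketch of the relevant /etc/zfs/zed.d/zed.rc lines — variable names differ a bit between releases (older 0.6.x used ZED_EMAIL), and the values here are illustrative:)

    ZED_EMAIL_ADDR="root"            # where ZED mails event reports
    ZED_NOTIFY_INTERVAL_SECS=3600    # rate-limit duplicate event mails
    ZED_NOTIFY_VERBOSE=1             # report all events, not only faults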

It's always this kind of error with only the vpath, vguid and eid changing:

ZFS has detected an io error:

  eid: 15
class: io
 host: clava
 time: 2016-05-29 23:21:18+0200
vtype: disk
vpath: /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249-part1
vguid: 0x0094F35F53B1888B
cksum: 0
 read: 0
write: 0
 pool: greenpool

dmesg shows:

[ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748386] sd 1:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 64 9d ac c8 00 00 08 00
[ 3647.748388] blk_update_request: I/O error, dev sdd, sector 1688054984
[ 3647.748401] sd 1:0:1:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748402] sd 1:0:1:0: [sde] tag#1 CDB: Read(10) 28 00 b4 26 aa 70 00 00 08 00
[ 3647.748403] blk_update_request: I/O error, dev sde, sector 3022432880
[ 3647.748408] sd 1:0:3:0: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748409] sd 1:0:3:0: [sdg] tag#2 CDB: Read(10) 28 00 b4 26 ca 78 00 00 08 00
[ 3647.748410] blk_update_request: I/O error, dev sdg, sector 3022441080
[ 3655.074695] sd 1:0:2:0: [sdf] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3655.074699] sd 1:0:2:0: [sdf] tag#8 CDB: Read(10) 28 00 64 9d b8 c0 00 00 08 00
[ 3655.074700] blk_update_request: I/O error, dev sdf, sector 1688058048
[ 3655.074712] sd 1:0:2:0: [sdf] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074713] sd 1:0:2:0: [sdf] tag#10 Sense Key : Not Ready [current] 
[ 3655.074715] sd 1:0:2:0: [sdf] tag#10 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074716] sd 1:0:2:0: [sdf] tag#10 CDB: Read(10) 28 00 64 9d 80 e8 00 00 08 00
[ 3655.074717] blk_update_request: I/O error, dev sdf, sector 1688043752
[ 3655.074721] sd 1:0:2:0: [sdf] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074722] sd 1:0:2:0: [sdf] tag#13 Sense Key : Not Ready [current] 
[ 3655.074723] sd 1:0:2:0: [sdf] tag#13 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074724] sd 1:0:2:0: [sdf] tag#13 CDB: Read(10) 28 00 64 9d 90 60 00 00 08 00
[ 3655.074725] blk_update_request: I/O error, dev sdf, sector 1688047712

Scrubbing gives no 'fixes', as there is no data corruption and no pools get unhappy, just a few errors. As far as I can see, either ZFS isn't waiting long enough for the disks to spin up (they do spin up on access), or it issues commands before checking that the disk is ready for them.

The pool status:

john@clava:~$ sudo zpool status greenpool
  pool: greenpool
 state: ONLINE
  scan: scrub repaired 0 in 19h28m with 0 errors on Sun May 29 17:09:56 2016
config:

    NAME                                          STATE     READ WRITE CKSUM
    greenpool                                     ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5757832  ONLINE       0     0     0
        ata-WDC_WD20EARX-00PASB0_WD-WCAZA8848843  ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD20EARX-00PASB0_WD-WCAZA8841762  ONLINE       0     0     0
        ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249  ONLINE       0     0     0

errors: No known data errors

I can disable spindown/standby (a sketch of how follows at the end of this comment), but not all pools are always in use; some are only archives. I enabled standby timeouts before the IO errors came along, so to me it sounds like ZFS or ZoL doesn't deal with spindown or standby very well?

Additional data:

Using version 0.6.5.6-2 with Linux 4.5.0-2-amd64.
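
As a sketch of that workaround (device path illustrative; your drives must honor hdparm):

    # disable the standby (spindown) timer entirely on one disk
    sudo hdparm -S 0 /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249
    # or lengthen it instead: values 241-251 mean (n-240)*30 minutes, so 241 = 30 min
    sudo hdparm -S 241 /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249
    # check the current power state without waking the drive
    sudo hdparm -C /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249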

kobuki commented 6 years ago

None of the tricks in this thread helped me, either. The only thing I haven't tried is the kernel mod and compiling the new driver. I'm not too worried about the false errors, though; I'm more worried about disks getting thrown out of the pool and one day destroying it.

splitice commented 6 years ago

@kobuki Power-cycling when a drive gets dropped is the most reliable way I've found of keeping ZFS plus this HBA working.

Sigh, I'd happily throw $200 towards a professional to patch this in the kernel. But being a professional software developer myself, I know that's far too insignificant for the work involved in troubleshooting something this low-level.

d-helios commented 6 years ago

@kobuki, @red-scorp, @chinesestunna Apologies for the delayed response. Here is the diff: https://bitbucket.org/d-helios/znstor_v2-ansible/commits/a0adbdefbf5d4156e6c0a08f41f7e26b19b73cf9#chg-roles/multipath/templates/multipath.conf.j2

The most important changes are:

        path_selector           "service-time 0"
        path_checker            "tur"

You can also find all the parameters I changed here: https://bitbucket.org/d-helios/znstor_v2-ansible
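
For readers who don't want to open the repo, a minimal sketch of where those two lines sit in /etc/multipath.conf (the other values here are assumptions, not from my template):

    defaults {
        path_selector        "service-time 0"   # pick the path by estimated service time
        path_checker         "tur"              # TEST UNIT READY; gentler than read-based checkers
        polling_interval     10                 # assumed: seconds between path checks
        no_path_retry        queue              # assumed: queue I/O rather than fail fast
    }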

d-helios commented 6 years ago

As I understand it, the mpt3sas.missing_delay=60,60 parameter increases the block-layer timeout in case of disk failure. During this period, I/O will hang until the operating system marks the disk as failed. @red-scorp, am I right?
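
(For reference, a sketch of the two usual ways to set it; per the driver's parameter description, the first value is the device missing delay and the second the I/O missing delay:)

    # on the kernel command line, e.g. in /etc/default/grub (then run update-grub):
    GRUB_CMDLINE_LINUX="... mpt3sas.missing_delay=60,60"

    # or as a module option in /etc/modprobe.d/mpt3sas.conf:
    options mpt3sas missing_delay=60,60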

red-scorp commented 6 years ago

@d-helios I guess so. After setting this option, no disks were thrown out of my ZFS array, but it did not fix the sporadic I/O errors.

kobuki commented 6 years ago

FWIW, I do have "mpt3sas.missing_delay=60,60" in my current kernel cmdline, yet disks still got thrown out multiple times.

d-helios commented 6 years ago

@red-scorp I asked because I primarily use ZFS as NAS storage, and when bad disks aren't marked as "bad" quickly enough, it leads to degradation of the virtual machines placed on it. So setting a high timeout isn't a very good idea.

And as I wrote previously, Solaris on the same hardware doesn't have any problem with it. I'm going to replace one of the NAS boxes with the Ubuntu version; we'll see how things go in production ))

red-scorp commented 6 years ago

@d-helios Same usage for me, and I use Ubuntu 18.04. It does not drop disks from my array, but it gives I/O errors periodically. My solution was to use another controller and forget about the built-in SAS3008 chip. I've used both mpt3sas.msix_disable=1 and mpt3sas.missing_delay=60,60 as kernel options.

chinesestunna commented 6 years ago

Quick update from me: I'm trying out a different fix, the disk APM settings via the hdparm -B flag. Around the same time as upgrading to a 4.x kernel (OMV4), I also added a startup script to configure the disks: NCQ, idle spindown time (60 min), etc. Among these commands is disk APM. Re-reading the man pages and guides on using hdparm to set APM, I believe I had misunderstood them: most guides state that APM values > 127 will not allow spindown. I definitely want my disks to spin down, so I had set APM to 127, which is rather low from a performance standpoint. One anecdotal observation: since the upgrade, array spin-up from sleep seems "slow", and of course it gets stuck while disk I/O is pending. Anyway, I think the low APM setting affects how long a disk takes to spin up and respond to commands. I've tested higher APM settings to check whether a disk will still go to sleep, and am now running at 254, which is in theory the highest value before APM is turned off. The disks still go to sleep after 60 min of idle, which is good, and I will report back if the issue continues.
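
For anyone following along, a sketch of the hdparm calls I'm describing (device path illustrative):

    # APM 254: highest level that keeps APM enabled; 255 would turn APM off entirely
    sudo hdparm -B 254 /dev/sdX
    # standby timer: values 241-251 mean (n-240)*30 minutes, so 242 = 60 minutes
    sudo hdparm -S 242 /dev/sdX
    # read back the current APM setting
    sudo hdparm -B /dev/sdX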

d-helios commented 6 years ago

@chinesestunna, can you post your log, please? Did you try enabling debug?

The debug flags (values from the mpt3sas headers; see https://elixir.bootlin.com/linux/v4.5/ident/MPT_DEBUG and the neighbouring MPT_DEBUG_* identifiers):

#define MPT_DEBUG                  0x00000001
#define MPT_DEBUG_MSG_FRAME        0x00000002
#define MPT_DEBUG_SG               0x00000004
#define MPT_DEBUG_EVENTS           0x00000008
#define MPT_DEBUG_EVENT_WORK_TASK  0x00000010
#define MPT_DEBUG_INIT             0x00000020
#define MPT_DEBUG_EXIT             0x00000040
#define MPT_DEBUG_FAIL             0x00000080
#define MPT_DEBUG_TM               0x00000100
#define MPT_DEBUG_REPLY            0x00000200
#define MPT_DEBUG_HANDSHAKE        0x00000400
#define MPT_DEBUG_CONFIG           0x00000800
#define MPT_DEBUG_DL               0x00001000
#define MPT_DEBUG_RESET            0x00002000
#define MPT_DEBUG_SCSI             0x00004000
#define MPT_DEBUG_IOCTL            0x00008000
#define MPT_DEBUG_SAS              0x00020000
#define MPT_DEBUG_TRANSPORT        0x00040000
#define MPT_DEBUG_TASK_SET_FULL    0x00080000
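
A hedged sketch of feeding a combination of those flags to the driver (the sysfs path is the usual mpt3sas module parameter; verify it exists on your kernel):

    # e.g. events + task management + reset: 0x08 | 0x100 | 0x2000 = 0x2108
    echo 0x2108 | sudo tee /sys/module/mpt3sas/parameters/logging_level

    # or persistently, in /etc/modprobe.d/mpt3sas.conf:
    options mpt3sas logging_level=0x2108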


chinesestunna commented 6 years ago

@d-helios I have not tried enabling debug; I'll capture the syslog for analysis next time something seems wrong or a drive drops. Generally it's similar to what others have posted here, except I have an expander that gets reset when the disks seem to take too long to wake up.

chinesestunna commented 6 years ago

The server VM has been running since Wednesday; so far the logs still show a sprinkle of I/O read errors and one expander reset. Seems like the same issue as before, so it'll be a matter of time before a disk doesn't "respond" quickly enough and drops:

[Wed Aug 29 19:29:02 2018] sd 0:0:0:0: [sda] tag#7925 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Wed Aug 29 19:29:02 2018] sd 0:0:0:0: [sda] tag#7925 CDB: Read(16) 88 00 00 00 00 00 07 48 1c 60 00 00 00 08 00 00
[Wed Aug 29 19:29:02 2018] print_req_error: I/O error, dev sda, sector 122166368
[Thu Aug 30 10:32:01 2018] sd 0:0:0:0: [sda] tag#5293 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Thu Aug 30 10:32:01 2018] sd 0:0:0:0: [sda] tag#5293 CDB: Read(16) 88 00 00 00 00 00 07 48 1c 60 00 00 00 08 00 00
[Thu Aug 30 10:32:01 2018] print_req_error: I/O error, dev sda, sector 122166368
[Thu Aug 30 10:58:35 2018] sd 0:0:7:0: [sdh] tag#5293 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Thu Aug 30 10:58:35 2018] sd 0:0:7:0: [sdh] tag#5293 CDB: Read(16) 88 00 00 00 00 00 da 5b 6c 20 00 00 00 08 00 00
[Thu Aug 30 10:58:35 2018] print_req_error: I/O error, dev sdh, sector 3663424544
[Thu Aug 30 10:58:44 2018] sd 0:0:8:0: [sdi] tag#5293 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Thu Aug 30 10:58:44 2018] sd 0:0:8:0: [sdi] tag#5293 CDB: Read(16) 88 00 00 00 00 00 00 00 97 f8 00 00 00 08 00 00
[Thu Aug 30 10:58:44 2018] print_req_error: I/O error, dev sdi, sector 38904
[Fri Aug 31 12:04:25 2018] sd 0:0:11:0: attempting device reset! scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:25 2018] sd 0:0:11:0: [sdl] tag#2877 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Fri Aug 31 12:04:25 2018] scsi target0:0:11: handle(0x0015), sas_address(0x5001e677b7fb5ff1), phy(17)
[Fri Aug 31 12:04:25 2018] scsi target0:0:11: enclosure logical id(0x5001e677b7fb5fff), slot(17)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: device reset: FAILED scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: attempting target reset! scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: [sdl] tag#2877 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: handle(0x0015), sas_address(0x5001e677b7fb5ff1), phy(17)
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: enclosure logical id(0x5001e677b7fb5fff), slot(17)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: device_block, handle(0x0015)
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: target reset: SUCCESS scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: device_unblock and setting to running, handle(0x0015)
[Fri Aug 31 12:04:27 2018] sd 0:0:11:0: Power-on or device reset occurred
[Fri Aug 31 12:04:27 2018] mpt2sas_cm0: attempting host reset! scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:27 2018] sd 0:0:11:0: [sdl] tag#2877 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Fri Aug 31 12:04:37 2018] mpt2sas_cm0: sending diag reset !!
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: diag reset: SUCCESS
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: LSISAS2308: FWVersion(20.00.07.00), ChipRevision(0x05), BiosVersion(07.39.02.00)
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: sending port enable !!
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: port enable: SUCCESS
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for end-devices: start
[Fri Aug 31 12:04:46 2018] scsi target0:0:0: handle(0x000a), sas_addr(0x5001e677b7fb5fe4)
[Fri Aug 31 12:04:46 2018] scsi target0:0:0: enclosure logical id(0x5001e677b7fb5fff), slot(4)
[Fri Aug 31 12:04:46 2018] scsi target0:0:1: handle(0x000b), sas_addr(0x5001e677b7fb5fe5)
[Fri Aug 31 12:04:46 2018] scsi target0:0:1: enclosure logical id(0x5001e677b7fb5fff), slot(5)
[Fri Aug 31 12:04:46 2018] scsi target0:0:2: handle(0x000c), sas_addr(0x5001e677b7fb5fe6)
[Fri Aug 31 12:04:46 2018] scsi target0:0:2: enclosure logical id(0x5001e677b7fb5fff), slot(6)
[Fri Aug 31 12:04:46 2018] scsi target0:0:3: handle(0x000d), sas_addr(0x5001e677b7fb5fe7)
[Fri Aug 31 12:04:46 2018] scsi target0:0:3: enclosure logical id(0x5001e677b7fb5fff), slot(7)
[Fri Aug 31 12:04:46 2018] scsi target0:0:4: handle(0x000e), sas_addr(0x5001e677b7fb5fe8)
[Fri Aug 31 12:04:46 2018] scsi target0:0:4: enclosure logical id(0x5001e677b7fb5fff), slot(8)
[Fri Aug 31 12:04:46 2018] scsi target0:0:5: handle(0x000f), sas_addr(0x5001e677b7fb5fe9)
[Fri Aug 31 12:04:46 2018] scsi target0:0:5: enclosure logical id(0x5001e677b7fb5fff), slot(9)
[Fri Aug 31 12:04:46 2018] scsi target0:0:6: handle(0x0010), sas_addr(0x5001e677b7fb5fea)
[Fri Aug 31 12:04:46 2018] scsi target0:0:6: enclosure logical id(0x5001e677b7fb5fff), slot(10)
[Fri Aug 31 12:04:46 2018] scsi target0:0:7: handle(0x0011), sas_addr(0x5001e677b7fb5feb)
[Fri Aug 31 12:04:46 2018] scsi target0:0:7: enclosure logical id(0x5001e677b7fb5fff), slot(11)
[Fri Aug 31 12:04:46 2018] scsi target0:0:8: handle(0x0012), sas_addr(0x5001e677b7fb5fee)
[Fri Aug 31 12:04:46 2018] scsi target0:0:8: enclosure logical id(0x5001e677b7fb5fff), slot(14)
[Fri Aug 31 12:04:46 2018] scsi target0:0:9: handle(0x0013), sas_addr(0x5001e677b7fb5fef)
[Fri Aug 31 12:04:46 2018] scsi target0:0:9: enclosure logical id(0x5001e677b7fb5fff), slot(15)
[Fri Aug 31 12:04:46 2018] scsi target0:0:10: handle(0x0014), sas_addr(0x5001e677b7fb5ff0)
[Fri Aug 31 12:04:46 2018] scsi target0:0:10: enclosure logical id(0x5001e677b7fb5fff), slot(16)
[Fri Aug 31 12:04:46 2018] scsi target0:0:11: handle(0x0015), sas_addr(0x5001e677b7fb5ff1)
[Fri Aug 31 12:04:46 2018] scsi target0:0:11: enclosure logical id(0x5001e677b7fb5fff), slot(17)
[Fri Aug 31 12:04:46 2018] scsi target0:0:12: handle(0x0016), sas_addr(0x5001e677b7fb5ff2)
[Fri Aug 31 12:04:46 2018] scsi target0:0:12: enclosure logical id(0x5001e677b7fb5fff), slot(18)
[Fri Aug 31 12:04:46 2018] scsi target0:0:13: handle(0x0017), sas_addr(0x5001e677b7fb5ff3)
[Fri Aug 31 12:04:46 2018] scsi target0:0:13: enclosure logical id(0x5001e677b7fb5fff), slot(19)
[Fri Aug 31 12:04:46 2018] scsi target0:0:14: handle(0x0018), sas_addr(0x5001e677b7fb5ff4)
[Fri Aug 31 12:04:46 2018] scsi target0:0:14: enclosure logical id(0x5001e677b7fb5fff), slot(20)
[Fri Aug 31 12:04:46 2018] scsi target0:0:15: handle(0x0019), sas_addr(0x5001e677b7fb5ff5)
[Fri Aug 31 12:04:46 2018] scsi target0:0:15: enclosure logical id(0x5001e677b7fb5fff), slot(21)
[Fri Aug 31 12:04:46 2018] scsi target0:0:16: handle(0x001a), sas_addr(0x5001e677b7fb5ff6)
[Fri Aug 31 12:04:46 2018] scsi target0:0:16: enclosure logical id(0x5001e677b7fb5fff), slot(22)
[Fri Aug 31 12:04:46 2018] scsi target0:0:17: handle(0x001b), sas_addr(0x5001e677b7fb5ff7)
[Fri Aug 31 12:04:46 2018] scsi target0:0:17: enclosure logical id(0x5001e677b7fb5fff), slot(23)
[Fri Aug 31 12:04:46 2018] scsi target0:0:18: handle(0x001c), sas_addr(0x5001e677b7fb5ffd)
[Fri Aug 31 12:04:46 2018] scsi target0:0:18: enclosure logical id(0x5001e677b7fb5fff), slot(24)
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for end-devices: complete
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for end-devices: start
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for PCIe end-devices: complete
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for expanders: start
[Fri Aug 31 12:04:46 2018] expander present: handle(0x0009), sas_addr(0x5001e677b7fb5fff)
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for expanders: complete
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: host reset: SUCCESS scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:56 2018] sd 0:0:11:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:0:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:1:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:2:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:3:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:4:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:5:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:6:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:7:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:8:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:9:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:10:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:12:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:13:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:14:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:15:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:16:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:17:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: end-devices
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: Removing unresponding devices: pcie end-devices
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: expanders
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: expanders start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: expanders complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: end devices start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: end devices complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: pcie end devices start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: break from pcie end device scan: ioc_status(0x0022), loginfo(0x3003011d)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: pcie devices: pcie end devices complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: complete

ghost commented 4 years ago

I'm reproducing this on 4.14 with mpt2sas too, using 0.8.1 (the latest stable release).

zrav commented 4 years ago

Seeing this in 5.3 with mpt3sas version 29.100, SAS2008 with firmware P20, ZoL 0.8.3.

kobuki commented 4 years ago

Interestingly enough, I don't see it any more on 5.2 (Debian 10.1, 5.2.9-2~bpo10+1), mpt3sas 28.100.00.00 (sorry, no idea of the FW version), zfs 0.8.2 (zfs-0.8.2-2~bpo10+1). Every version before 5.2 showed those messages, causing read errors in the pool, kicking disks, etc. With this setup all such errors disappeared, and zpool status shows no errors after the regular scrub.

cwalv2 commented 4 years ago

This issue still seems to happen under random read/write load. I have a 9207-8i with P20 firmware, Ubuntu 19.10, 5.3.0-40-generic. I bought the HBA to try to fix an error I was getting with the motherboard's onboard SATA (ASRock X570 PRO4).

I have a suspicion it's somehow the hard drives. It's very easy to trigger the error with either fio or rsyncing a large directory.

I've attached error logs for both the built-in SATA and the SAS HBA... About to try rebooting with mpt3sas.msix_disable=1, after the thousandth scrub of this pool...

Honestly, I've had pretty terrible experiences with ZFS so far. It really doesn't like USB hard-drive pools either; I think the second log has a couple of instances of one of those crashing as well. (I have a 3-way mirror external HD pool I was using to hold files while transferring data.) Maybe it's just that the USB drive sucks; it's always the same one that seems to drop out, so...

zfs-crash-fio.log newcrash.log

zrav commented 4 years ago

FWIW, I've not had the issue anymore with mpt3sas 33.100, kernel >=5.4.0-31 (Ubuntu 20.04), on current ZoL master.

malventano commented 4 years ago

FWIW, I've not had the issue anymore with mpt3sas 33.100, kernel >=5.4.0-31 (Ubuntu 20.04), on current ZoL master.

I'm reliably reproducing a similar issue. Easy way for me to repro:

Steps taken so far for troubleshooting:

Errors are consistently repeatable across all above configurations and occur ~7-9 seconds after a request is sent to a sleeping device. This happens regardless of any of the above changes being made. Here's an example of the errors received whenever a read request hits a sleeping drive:

[  244.706514] sd 4:0:47:2: [sdbo] tag#8755 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  244.706982] sd 4:0:47:2: [sdbo] tag#8755 Sense Key : Aborted Command [current] [descriptor]
[  244.707446] sd 4:0:47:2: [sdbo] tag#8755 <<vendor>>ASC=0x98 ASCQ=0x6
[  244.707909] sd 4:0:47:2: [sdbo] tag#8755 CDB: Read(16) 88 00 00 00 00 03 5b 7f 91 20 00 00 00 08 00 00
[  244.708372] blk_update_request: I/O error, dev sdbo, sector 14419988768 op 0x0:(READ) flags 0x4700 phys_seg 1 prio class 0

Multipath should not be the cause here, as the errors occur even without multipath. If multipath is in use, it will immediately drop the associated link when either error occurs. All relevant multipath timeouts (dev_loss_tmo, checker_timeout) are set to 30 (a multipath.conf sketch with these values follows at the end of this comment). This happens regardless of path_selector mode. Sometimes no paths drop, sometimes one, sometimes both. Below is an example of multipath in use and both paths dropping when a data request attempts to wake a drive:

kernel: [127941.031414] sd 0:0:29:2: [sdah] tag#7255 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: [127941.031434] sd 0:0:50:2: [sdbu] tag#3092 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: [127941.032084] sd 0:0:29:2: [sdah] tag#7255 Sense Key : Aborted Command [current] [descriptor] 
kernel: [127941.032713] sd 0:0:50:2: [sdbu] tag#3092 Sense Key : Aborted Command [current] [descriptor] 
kernel: [127941.033333] sd 0:0:29:2: [sdah] tag#7255 <<vendor>>ASC=0x98 ASCQ=0x1 
kernel: [127941.033947] sd 0:0:50:2: [sdbu] tag#3092 <<vendor>>ASC=0x98 ASCQ=0x1 
kernel: [127941.034538] sd 0:0:29:2: [sdah] tag#7255 CDB: Read(16) 88 00 00 00 00 02 45 e7 38 d8 00 00 06 f8 00 00
kernel: [127941.035095] sd 0:0:50:2: [sdbu] tag#3092 CDB: Read(16) 88 00 00 00 00 02 45 e7 32 c0 00 00 00 98 00 00
Jun 21 00:08:26 BB-8 kernel: [127941.035655] blk_update_request: I/O error, dev sdah, sector 9762715864 op 0x0:(READ) flags 0x4700 phys_seg 17 prio class 0
Jun 21 00:08:26 BB-8 kernel: [127941.036201] blk_update_request: I/O error, dev sdbu, sector 9762714304 op 0x0:(READ) flags 0x4700 phys_seg 3 prio class 0
multipathd[3995]: sdbu: mark as failed
multipathd[3995]: h10: remaining active paths: 1
kernel: [127941.037334] device-mapper: multipath: Failing path 68:128.
kernel: [127941.037973] sd 0:0:29:2: [sdah] tag#2977 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: [127941.038516] sd 0:0:29:2: [sdah] tag#2977 Sense Key : Aborted Command [current] [descriptor] 
kernel: [127941.039046] sd 0:0:29:2: [sdah] tag#2977 <<vendor>>ASC=0x98 ASCQ=0x6 
kernel: [127941.039571] sd 0:0:29:2: [sdah] tag#2977 CDB: Read(16) 88 00 00 00 00 02 45 e7 32 c0 00 00 00 98 00 00
kernel: [127941.040083] blk_update_request: I/O error, dev sdah, sector 9762714304 op 0x0:(READ) flags 0x4700 phys_seg 3 prio class 0
kernel: [127941.040597] device-mapper: multipath: Failing path 66:16.
multipath: dm-40: no usable paths found
multipathd[3995]: sdah: mark as failed
multipathd[3995]: h10: Entering recovery mode: max_retries=10
multipathd[3995]: h10: remaining active paths: 0
multipathd[3995]: h10: sdbu - tur checker reports path is up
multipathd[3995]: 68:128: reinstated
multipathd[3995]: h10: queue_if_no_path enabled
multipathd[3995]: h10: Recovered to normal mode
multipathd[3995]: h10: remaining active paths: 1
kernel: [127945.199129] device-mapper: multipath: Reinstating path 68:128.
pvestatd[30819]: zfs error: cannot open 'rpool/data': dataset does not exist
multipathd[3995]: h10: sdah - tur checker reports path is up
multipathd[3995]: 66:16: reinstated
multipathd[3995]: h10: remaining active paths: 2
kernel: [127951.204478] device-mapper: multipath: Reinstating path 66:16.

The 4246s also throw an "sd x:x:x:x: Power-on or device reset occurred" in addition to the above. Here is an example where the error was triggered by a smartctl -x request (no sector read errors - only reading SMART data):

kernel: [  133.880657] sd 3:0:3:0: Power-on or device reset occurred
kernel: [  133.880798] sd 3:0:54:0: Power-on or device reset occurred
kernel: [  134.421967] device-mapper: multipath: Failing path 67:48.
kernel: [  134.422174] device-mapper: multipath: Failing path 8:48.
multipathd[2569]: mpathi: sdaz - tur checker reports path is down
multipathd[2569]: checker failed path 67:48 in map mpathi
multipathd[2569]: mpathi: remaining active paths: 1
multipathd[2569]: mpathi: sdd - tur checker reports path is down
multipathd[2569]: checker failed path 8:48 in map mpathi
multipathd[2569]: mpathi: Entering recovery mode: max_retries=2
multipathd[2569]: mpathi: remaining active paths: 0
multipath: dm-35: no usable paths found
multipathd[2569]: 67:48: reinstated
multipathd[2569]: mpathi: sdd - tur checker reports path is up
multipathd[2569]: 8:48: reinstated
multipathd[2569]: mpathi: queue_if_no_path enabled
multipathd[2569]: mpathi: Recovered to normal mode
multipathd[2569]: mpathi: remaining active paths: 1
kernel: [  136.416937] device-mapper: multipath: Reinstating path 67:48.
kernel: [  136.417380] device-mapper: multipath: Reinstating path 8:48.

(Note: I had changed polling_interval to 2 for this example; it was 10 for the earlier examples. This is why the path came back after 2 and 10 seconds, respectively.)

So far I'm stumped on this one...
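
For reference, a sketch of where the timeouts mentioned above live in /etc/multipath.conf (same values as described; treat as illustrative, not my exact config):

    defaults {
        polling_interval   10    # seconds between tur checks (2 in the last example)
        checker_timeout    30    # seconds before a path check is considered failed
        dev_loss_tmo       30    # seconds before the transport removes a lost path
        no_path_retry      10    # matches the max_retries=10 in the logs above
    }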

richardelling commented 4 years ago

Your drive is aborting the command and returning ASC/ASCQ 0x98/0x6 or 0x98/0x1. These do not seem to be registered with T10 (https://www.t10.org/lists/asc-num.htm); contact the drive vendor for more information.
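
(If it helps anyone: sg3_utils can at least confirm the classification. A sketch, with an illustrative descriptor-format sense buffer built from the values in the log — response code 0x72, sense key 0x0b ABORTED COMMAND, ASC 0x98, ASCQ 0x06:)

    sg_decode_sense 72 0b 98 06 00 00 00 00
    # ASC values in the 0x80-0xFF range are vendor-specific, so the decode can
    # only confirm "vendor specific" -- the drive vendor has to supply the meaning.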

malventano commented 4 years ago

That may be the Netapp 4486 enclosure/sleds issuing those aborts. It has dual-drive sleds, each holding a pair of SATA drives (He12) behind a Marvell 88SF9210, which bridges to SAS multipath. The other enclosure I tried was a Netapp 4246, which shouldn't have as much 'in the way', but I didn't do read tests to trigger the errors in that config (I only experienced the resets at the same 7-8 seconds after requests, in that case from smartctl -x). I'll try to trigger some read timeouts and see what happens there.

hoppel118 commented 4 years ago

I am still on Debian Stretch. I have to update my OS to Buster and check if the problem is solved with the latest versions. That will take „some“ time. ;)

Regards Hoppel

QBANIN commented 4 years ago

I am still on Debian Stretch. I have to update my OS to Buster and check if the problem is solved with the latest versions. That will take „some“ time. ;)

Regards Hoppel

It's not solved. Just happened to me last night. Debian Buster, kernel 5.4.41, LSI 9207-8i

hoppel118 commented 4 years ago

Hm.... Bummer...

bsdice commented 3 years ago

(This is a copy&paste from my comment in issue #4638 just in case someone finds this issue through a search engine, looking for a workaround)

I got hit by this problem as well, running ZFS 0.8.4 on Linux 5.4 (Arch Linux LTS kernel) with eight 14 TB SATA disks in RAIDZ2 behind an LSI 2308 controller flashed to IT mode. Whenever I turn on hd-idle and let it spin down the disks (they sit idle 20 h per day), ZFS complains loudly in the kernel log during wakeup. After a couple of days of testing, many read and, most worryingly, also write and even checksum errors occurred (zpool status). Scrub could correct all problems, but this needed to be fixed asap.

I solved the problem by doing away with the LSI and buying a JMicron JMB585 5-port SATA controller card instead. These chips have existed since about 2018, so they're relatively new. No extra driver is needed; the card runs with any even remotely recent stock AHCI driver. Since the switch, no more errors have occurred at all, even though I aggressively put disks into standby when not in use. As far as I can see the card also has no PCIe bottleneck, because it can use PCIe 3.0 with two lanes, supposedly reaching 1700 MByte/s transfer rates. That should be good enough for 5 modern HDDs. Mostly Chinese no-name cards are out there, US$ 30-40 in 2020; I recommend getting a card with a largish black heatsink, though, to preclude thermal issues. There appear to be no electrolytic capacitors on these cards, so they might even be stable very long term (10+ years).

RichieB2B commented 3 years ago

I'm having the same problem with the onboard SAS3008 of my Supermicro X11SSL-CF motherboard. I'm on Debian Buster with zfs 0.8.6-1~bpo10+1 from buster-backports. One thing I plan to do is flash the SAS3008 to P16-V17 firmware, which is the latest posted by Supermicro.

# sas3ircu 0 DISPLAY
Avago Technologies SAS3 IR Configuration Utility.
Version 17.00.00.00 (2018.04.02) 
Copyright (c) 2009-2018 Avago Technologies. All rights reserved. 

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS3008
  BIOS version                            : 8.29.01.00
  Firmware version                        : 12.00.02.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 255
  Concurrent commands supported           : 3072
  Slot                                    : 2
  Segment                                 : 0
  Bus                                     : 2
  Device                                  : 0
  Function                                : 0
  RAID Support                            : No
------------------------------------------------------------------------
# modinfo mpt3sas
filename:       /lib/modules/4.19.0-13-amd64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
alias:          mpt2sas
version:        26.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 Device Driver

kobuki commented 3 years ago

@RichieB2B the problems disappeared for me on 5.8 or newer kernels from backports. I suggest giving them a try.

RichieB2B commented 3 years ago

Thanks @kobuki I upgraded the firmware to P16 and Linux kernel to 5.9.0-0.bpo.5-amd64 from buster-backports and I have not seen the errors since.

# sas3ircu 0 DISPLAY
Avago Technologies SAS3 IR Configuration Utility.
Version 17.00.00.00 (2018.04.02) 
Copyright (c) 2009-2018 Avago Technologies. All rights reserved. 

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS3008
  BIOS version                            : 8.37.00.00
  Firmware version                        : 16.00.10.00
  Channel description                     : 1 Serial Attached SCSI

# modinfo mpt3sas
filename:       /lib/modules/5.9.0-0.bpo.5-amd64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
alias:          mpt2sas
version:        34.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 Device Driver

crimp42 commented 3 years ago

Based on the positive results I've been seeing in this thread, I decided to give it a try.

Yes, the problem is gone for me running Ubuntu with kernel 5.9.0-050900-generic.

But none of my drives go to sleep now on my LSI 9211 controller. Even though I set my drives to fall asleep, they never seem to, so of course none of these errors appear.

So at least for me, it's no different from running a different distro with an older kernel: if I just keep my drives awake, I never see those errors.

At least that was my experience.


jonathan-molyneux commented 3 years ago

I've had good results running 5.11.8, so far much better than 5.4.86. Time will tell, but it's been through a full scrub without a single read error.

Running P19 on the LSI SAS 9201-16i & SAS 9211-8i controllers. If you have the same controllers, do not run P20; it results in read & write errors.

Thanks for the updates @kobuki @RichieB2B and @brianmduncan.

red-scorp commented 3 years ago

@jonathan-molyneux: Scrubbing was never the problem; waking up the drives was always the issue. Let your drives sleep, then see whether they wake up cleanly or not.

xes commented 3 years ago

Maybe not related to your problems... but if you are using Supermicro servers + LSI 3008 + sas3ircu, please check your backplane (BPN-SAS3-216EL1) firmware: https://www.supermicro.com/support/faqs/faq.cfm?faq=33592

nerozero commented 3 years ago

I have had the same issue on FreeBSD for years. It is fixable most of the time by disabling the hard drives' APM (advanced power management) and/or EPC (extended power conditions) options. For some reason some of my SSDs don't seem to have EPC capabilities, so they fall off 2-5 times a day.

A really good how-to on disabling/modifying drive power management can be found here: https://serverfault.com/questions/1047331/how-do-i-disable-hard-disk-spin-down-or-head-parking-in-freebsd

But still, disabling APM/EPC doesn't count as a valid solution.
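
For completeness, a sketch of that workaround on the FreeBSD side with camcontrol(8) (device name illustrative; double-check the flags against your release's man page):

    # set APM to the highest level that still allows power management
    camcontrol apm ada0 -l 254
    # or disable APM entirely (omitting -l disables it)
    camcontrol apm ada0
    # disable the EPC feature set
    camcontrol epc ada0 -c disable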

crsleeth commented 2 years ago

I'm seeing this issue on RHEL 8.5 (4.18.0-348.7.1.el8_5) with LSI 9300-8e HBAs and SuperMicro 846 shelves. Heavy IO to the drives slowly increases the read/write errors in zpool status until ZFS steps in and starts resilvering. For the number of drives I have in RAIDZ2, resilvering daily isn't really feasible, because a resilver takes longer than a day.

I need to update my 9300-8e's (SAS 3008) firmware but haven't been able to yet.

Numline1 commented 2 years ago

Installing linux-generic-hwe-20.04 (which upgrades the kernel from 5.4.x to 5.11.x) on my Ubuntu 20.04.3 fixed the issue (I think; I haven't seen an error after a resilver and scrub). I still have no idea what exactly changed between these kernel versions that caused this to magically fix itself. I was actually getting errors in zpool status and had a degraded array, always with the same physical disk; the dmesg output was similar to what's been posted earlier, with something along the lines of "waiting for a device/timeout".

I've tried swapping the SAS-to-SATA cable (but kept the drive's position on the same cable number, which might be why it was always the same disk). I'm also using an LSI card, which might've been doing something funky with the driver in older kernels.

Either way, the problem is hopefully solved; I'm just curious as to why.