openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS io error when disks are in idle/standby/spindown mode #4713

Closed · johnkeates closed this issue 5 years ago

johnkeates commented 8 years ago

Whenever one or more disks in one of my pools are sleeping because they were idle, ZFS (via ZED) spams me with IO errors (via email, because that's how I set it up).

It's always this kind of error with only the vpath, vguid and eid changing:

ZFS has detected an io error:

  eid: 15
class: io
 host: clava
 time: 2016-05-29 23:21:18+0200
vtype: disk
vpath: /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249-part1
vguid: 0x0094F35F53B1888B
cksum: 0
 read: 0
write: 0
 pool: greenpool

dmesg shows:

[ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748386] sd 1:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 64 9d ac c8 00 00 08 00
[ 3647.748388] blk_update_request: I/O error, dev sdd, sector 1688054984
[ 3647.748401] sd 1:0:1:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748402] sd 1:0:1:0: [sde] tag#1 CDB: Read(10) 28 00 b4 26 aa 70 00 00 08 00
[ 3647.748403] blk_update_request: I/O error, dev sde, sector 3022432880
[ 3647.748408] sd 1:0:3:0: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748409] sd 1:0:3:0: [sdg] tag#2 CDB: Read(10) 28 00 b4 26 ca 78 00 00 08 00
[ 3647.748410] blk_update_request: I/O error, dev sdg, sector 3022441080
[ 3655.074695] sd 1:0:2:0: [sdf] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3655.074699] sd 1:0:2:0: [sdf] tag#8 CDB: Read(10) 28 00 64 9d b8 c0 00 00 08 00
[ 3655.074700] blk_update_request: I/O error, dev sdf, sector 1688058048
[ 3655.074712] sd 1:0:2:0: [sdf] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074713] sd 1:0:2:0: [sdf] tag#10 Sense Key : Not Ready [current] 
[ 3655.074715] sd 1:0:2:0: [sdf] tag#10 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074716] sd 1:0:2:0: [sdf] tag#10 CDB: Read(10) 28 00 64 9d 80 e8 00 00 08 00
[ 3655.074717] blk_update_request: I/O error, dev sdf, sector 1688043752
[ 3655.074721] sd 1:0:2:0: [sdf] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074722] sd 1:0:2:0: [sdf] tag#13 Sense Key : Not Ready [current] 
[ 3655.074723] sd 1:0:2:0: [sdf] tag#13 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074724] sd 1:0:2:0: [sdf] tag#13 CDB: Read(10) 28 00 64 9d 90 60 00 00 08 00
[ 3655.074725] blk_update_request: I/O error, dev sdf, sector 1688047712

Scrubbing gives no 'fixes', as there is no data corruption and no pools get unhappy, just a few errors. As far as I can see, either ZFS isn't waiting long enough for the disks to spin up (they actually do spin up on access), or it tries some command before checking that the disk is ready for it.

The pool status:

john@clava:~$ sudo zpool status greenpool
  pool: greenpool
 state: ONLINE
  scan: scrub repaired 0 in 19h28m with 0 errors on Sun May 29 17:09:56 2016
config:

    NAME                                          STATE     READ WRITE CKSUM
    greenpool                                     ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5757832  ONLINE       0     0     0
        ata-WDC_WD20EARX-00PASB0_WD-WCAZA8848843  ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD20EARX-00PASB0_WD-WCAZA8841762  ONLINE       0     0     0
        ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249  ONLINE       0     0     0

errors: No known data errors

I can disable spindown/standby, but not all pools are always in use; some are only archives. The IO errors only started after I enabled the standby timeouts, so to me it sounds like ZFS or ZoL doesn't deal with spindown or standby very well?

Additional data:

Using ZFS 0.6.5.6-2 with Linux 4.5.0-2-amd64.

behlendorf commented 8 years ago

@johnkeates thanks for filing this. I spent a little time looking at it, so let me post a possible solution. This problem should only impact devices on which Linux is explicitly performing power management. Drives which spin up/down due to internal power management shouldn't see this issue.

We're going to need to add some code to check request_queue->rpm_status to determine when a device has been suspended. If the block device was suspended then it needs to be woken before any new bios are submitted; this can be done by dispatching a normal read request (say, for the label). Once the request_queue->rpm_status state has transitioned back to active, new bios can be submitted.
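A minimal sketch of what such a check might look like (an assumption on my part, not actual ZFS code: the helper name is invented, the placement in the vdev_disk bio-submission path is only for illustration, older kernels guard these fields with CONFIG_PM_RUNTIME rather than CONFIG_PM, and the resume is forced through the runtime-PM API instead of the label read described above just to keep the sketch short):

    /*
     * Illustrative only: before submitting new bios to a leaf vdev, check
     * whether the backing queue has been runtime-suspended and, if so,
     * resume it and wait for the disk to spin up.
     */
    #include <linux/blkdev.h>
    #include <linux/pm_runtime.h>

    static void
    vdev_disk_resume_if_suspended(struct block_device *bdev)
    {
    #ifdef CONFIG_PM
        struct request_queue *q = bdev_get_queue(bdev);

        /* q->dev and q->rpm_status are only set when the SCSI layer has
         * registered this queue for block-layer runtime PM. */
        if (q->dev != NULL && q->rpm_status != RPM_ACTIVE) {
            pm_runtime_get_sync(q->dev);    /* blocks until the device is active */
            pm_runtime_put(q->dev);
        }
    #endif
    }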

antst commented 7 years ago

Same issue here with SATA disks behind a SAS controller. But it is not only about errors in the logs. With 6 drives it takes quite some time to spin up all of them, because the spin-up is sequential. Maybe it would also be reasonable to issue a parallel wake to all devices in the pool on discovery of a standby device, rather than discovering them in standby and waking them one by one.

Just to illustrate: on my home NAS with 6 SATA disks behind mpt3sas (IT mode), it takes 9 seconds to wake them all in parallel (hdparm --read-sector in parallel, until I get a sector back from all of them), and 44 seconds to wake them sequentially, as happens with ZFS now.

9 vs 44 is quite a difference.
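To illustrate the parallel wake idea, here is a small standalone program (my own sketch, not something from this thread or from ZFS) that issues one small read to every device named on the command line, each from its own thread, roughly what running hdparm --read-sector against all disks at once achieves:

    /* Wake several disks in parallel by reading one block from each.
     * Build with: gcc -O2 -pthread -o wake wake.c
     * Usage:      ./wake /dev/sdb /dev/sdc /dev/sdd ...
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void *wake_disk(void *arg)
    {
        const char *dev = arg;
        char buf[4096];
        int fd = open(dev, O_RDONLY);   /* O_DIRECT would guarantee a media access */

        if (fd < 0 || pread(fd, buf, sizeof (buf), 0) < 0)
            fprintf(stderr, "%s: %s\n", dev, strerror(errno));
        if (fd >= 0)
            close(fd);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        pthread_t tid[argc];
        int i;

        for (i = 1; i < argc; i++)
            pthread_create(&tid[i], NULL, wake_disk, argv[i]);
        for (i = 1; i < argc; i++)      /* returns once every disk has answered */
            pthread_join(tid[i], NULL);
        return 0;
    }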

hoppel118 commented 7 years ago

Hey guys,

the same for me, have a look at these issues:

https://github.com/zfsonlinux/zfs/issues/3785
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1504909?comments=all

Here I reported some things about my situation and my environment:

https://github.com/zfsonlinux/zfs/issues/4638

After HDD standby I also have to wait for about one minute. I use 8x 4TB WD Red in a raid-z2 behind an LSI SAS3008 (mpt3sas).

Greetings Hoppel118

inkdot7 commented 7 years ago

Spinning many drives up in parallel may be very unfriendly towards the power supply. To make matters worse, I think it is not uncommon for drive motors to use the +5 V line, which on most (consumer) PSUs has rather low current ratings. Better RAID cards are usually careful during boot to stagger drive spin-up, often by a few seconds at least.

rubictus commented 7 years ago

Same issue here with SATA disks behind an LSI Logic SAS2008, running CentOS 7.1, when the disks are in stand-by mode. It always happens when I am just trying to read some data.

uname -a

Linux prolib 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

modinfo zfs

filename:       /lib/modules/3.10.0-514.6.1.el7.x86_64/extra/zfs.ko
version:        0.6.5.9-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
rhelversion:    7.3
srcversion:     42C4AB70887EA26A9970936

dmesg

[109797.582453] sd 0:0:4:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109797.582458] sd 0:0:4:0: [sde] CDB: Write(10) 2a 00 5c 00 2a 60 00 00 08 00
[109797.582460] blk_update_request: I/O error, dev sde, sector 1543514720
[109797.582476] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109797.582477] sd 0:0:5:0: [sdf] CDB: Write(10) 2a 00 5c 00 2a 60 00 00 08 00
[109797.582478] blk_update_request: I/O error, dev sdf, sector 1543514720
[109807.394117] sd 0:0:2:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109807.394121] sd 0:0:2:0: [sdc] CDB: Write(10) 2a 00 82 00 2a 60 00 00 08 00
[109807.394123] blk_update_request: I/O error, dev sdc, sector 2181048928
[109807.394140] sd 0:0:3:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109807.394141] sd 0:0:3:0: [sdd] CDB: Write(10) 2a 00 82 00 2a 60 00 00 08 00
[109807.394142] blk_update_request: I/O error, dev sdd, sector 2181048928

zpool events

Feb 16 2017 21:23:05.232852408 ereport.fs.zfs.io
Feb 16 2017 21:23:05.237852445 ereport.fs.zfs.io
Feb 16 2017 21:23:05.237852445 ereport.fs.zfs.io
Feb 16 2017 21:23:05.250852543 ereport.fs.zfs.io
Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io
Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io

zpool events -v

Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x3df94b0f49700401
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xc288a8df9547ced5
                vdev = 0x3a52042ed20235f2
        (end detector)
        pool = "vol0"
        pool_guid = 0xc288a8df9547ced5
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x3a52042ed20235f2
        vdev_type = "raidz"
        vdev_ashift = 0xc
        vdev_complete_ts = 0x0
        vdev_delta_ts = 0x0
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        parent_guid = 0xc288a8df9547ced5
        parent_type = "root"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0x80080
        zio_stage = 0x200000
        zio_pipeline = 0x210000
        zio_delay = 0x0
        zio_timestamp = 0x0
        zio_delta = 0x0
        zio_offset = 0x41000132000
        zio_size = 0x200
        zio_objset = 0x31
        zio_object = 0x0
        zio_level = 0xffffffffffffffff
        zio_blkid = 0x0
        time = 0x58a60a29 0xfd896ef
        eid = 0x22

Jimmy-Z commented 7 years ago

I'm having this issue too. So there are no bad consequences, despite the terrifying IO error mail?

rschulman commented 7 years ago

Is there any further news on this issue? I replaced one of the drives in my pool and now have been getting roughly hourly emails from root complaining of an IO error. Any way to quiet this error down, if it hasn't been fixed yet, while still receiving actual errors?

crimp42 commented 7 years ago

I am also really curious if someone has figured out what is going on with this. I don't even use ZFS and am experiencing the same errors (got to this thread via Google), so I assume it is kernel-dependent. I recently bought several LSI SAS2008 controllers, was running CentOS 7.3, and was seeing these errors in my logs every so often. I finally narrowed it down: any SATA drive that was in sleep mode and was then accessed would produce these errors while it was waking up. I switched my (CentOS 7.3) boot drive out and went to CentOS 6.8, and the errors never occur. These particular errors seem to go back years, and it looks like no progress has been made on fixing them. I did find one thread where people with these errors switched the firmware on their LSI SAS2008 controllers back down to P19 and were OK (I am on P20), so I will probably try that next.

crimp42 commented 7 years ago

I tried P19 but had the same results. I also tried CentOS 6.9 and do not have the issue there. The only difference I can see at this point is kernel 2.x vs 3.x/4.x; I also tried installing a 4.x kernel on CentOS 7.3 and the problem was still there. So I would assume that those of you seeing this issue with ZFS would NOT see it on a 2.x-kernel-based distro using the LSI SAS2008 with SATA drives.

Again, my issue is that for ANY SATA drive connected to the LSI SAS2008 (running P19 or P20), when the drive is asleep, accessing it through the OS creates /var/log/messages errors similar to those listed above:

[ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748386] sd 1:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 64 9d ac c8 00 00 08 00

I also noticed that on some of my drives that spin up a tad quicker, the error sometimes does not get generated.

crimp42 commented 7 years ago

One other thing I checked was the version of mpt2sas (modinfo mpt2sas). On CentOS 6.9, where I am unable to reproduce the issue, it is version 20.102.00.00, and on CentOS 7.3, where I am able to reproduce it, it is the same version 20.102.00.00. The only difference at this point that I can see is kernel 2.x (where it does not occur) vs 3.x/4.x (where it does).

johnkeates commented 7 years ago

I don't have any hardware that 2.x runs on, but I see the same with comparable mpt2sas versions. It doesn't seem to matter what version I use.

crimp42 commented 7 years ago

John, what HBA model(s) are you using? I assume SATA drives on a SAS/SATA controller?

johnkeates commented 7 years ago

Mostly SAS2008 controllers, all with SATA drives. I have one with SAS drives, and that one is not giving any trouble.

crimp42 commented 7 years ago

John, do you know what firmware level you are running with your HBA's?

And I am not sure if you all know, but if you disable drive spindown, the errors never occur for me on the problematic 3.x/4.x kernels, so I assume the same would hold true for ZFS. Not ideal, and a waste of money if you have a large number of disks (the reason I went back to a 2.x kernel).

Most of the time when people report this issue with SAS2008-based HBAs and SATA drives, everyone assumes cable issues or power supply issues. I wonder if that is why no one seems to get to the bottom of this particular combination.

johnkeates commented 7 years ago

I solved it in a few ways:

All of them work. But the issue remains: ZFS should not be sad because a disk went in standby.

I'm mostly running this combination:

LSISAS2308: FWVersion(15.00.00.00), ChipRevision(0x05), BiosVersion(07.29.00.00)
LSISAS2008: FWVersion(15.00.00.00), ChipRevision(0x03), BiosVersion(07.29.00.00)

crimp42 commented 7 years ago

So basically you're just keeping your drives from falling asleep, to avoid the errors.

One thing I have seen from numerous threads out there is that no one has reported real issues (any corruption) from this, at least at a JBOD level. Just the annoying fact that the errors are always there if you like to save power/noise/heat. Not sure how it would impact ZFS, since it is outside its control. It also seems to be specific to LSI; I have seen multiple types of HBAs, all LSI-based, where people report this.

I increased the verbosity of SCSI logging on one box where I have the 3.x kernel and it errors out when the drive is accessed, and on another where I have the 2.x kernel and it works fine. In both cases the drives are on an LSI 9211-8i controller with P19 firmware running in IT mode; the drive was put into standby, then I just refreshed the directory to force a read from the drive.

BOTH kernels are using the 20.102.00.00 mpt2sas module.

3.x kernel: (3.10.0-514.26.2.el7.x86_64) Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff88015595f880 Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 0e 70 00 00 00 08 00 00 Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 0e 70 00 00 00 08 00 00 Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Sense Key : Not Ready [current] Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Add. Sense: Logical unit not ready, initializing command required Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff88015595f880 Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Start/Stop Unit 1b 00 00 00 01 00 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff88015595f880 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Test Unit Ready 00 00 00 00 00 00 Jul 8 15:24:56 misc02 kernel: blk_update_request: I/O error, dev sdm, sector 3696 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f0fc0 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 0e 70 00 00 00 08 00 00 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f1180 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 36 c3 cb 40 00 00 00 10 00 00 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f1340 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 15 e7 68 c0 00 00 00 10 00 00 Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f1500

2.x kernel: (2.6.32-696.3.2.el6.x86_64) Jul 8 15:29:51 misc01 kernel: sd 1:0:0:0: [sdh] Send: Jul 8 15:29:51 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 42 40 61 40 00 00 10 00 Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] Send: Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 52 ac c5 10 00 00 10 00 Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] Send: Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 00 00 ed f0 00 00 10 00 Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] Send: Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 00 00 2c 80 00 00 10 00

hoppel118 commented 6 years ago

Hey guys,

I am not sure about the comment and milestone from @behlendorf from 31 May 2016:

Is this issue solved in 0.7.0? When will version 0.7.0 be released for Debian Jessie?

I am using openmediavault 3 (Debian Jessie) with the following ZoL version:

root@omv:~# apt-cache depends --important openmediavault-zfs
openmediavault-zfs
 |Depends: linux-headers-amd64
 |Depends: pve-headers
  Depends: linux-headers-3.16.0-4-all
  Depends: openmediavault
  Depends: openmediavault-omvextrasorg
  Depends: zfs-dkms
 |Depends: zfsutils-linux
  Depends: <zfsutils>
    zfsutils-linux
  Depends: zfs-zed
root@omv:~# apt-cache depends --important pve-headers
pve-headers
  Depends: pve-headers-4.4.59-1-pve
root@omv:~# apt-cache policy pve-headers
pve-headers:
  Installed:      4.4-87
  Candidate:      4.4-87
  Version table:
 *** 4.4-87 0
        100 /var/lib/dpkg/status
root@omv:~# apt-cache depends --important zfs-dkms
zfs-dkms
  Depends: dkms
  Depends: lsb-release
 |Depends: debconf
  Depends: <debconf-2.0>
    cdebconf
    debconf
  PreDepends: spl-dkms
root@omv:~# apt-cache policy zfs-dkms
zfs-dkms:
  Installed:      0.6.5.9-2~bpo8+1
  Candidate:      0.6.5.9-2~bpo8+1
  Version table:
 *** 0.6.5.9-2~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache depends --important zfsutils-linux
zfsutils-linux
  Depends: init-system-helpers
  Depends: libblkid1
  Depends: libc6
  Depends: libnvpair1linux
  Depends: libuuid1
  Depends: libuutil1linux
  Depends: libzfs2linux
  Depends: libzpool2linux
  Depends: zlib1g
  Depends: python3
root@omv:~# apt-cache policy zfsutils-linux
zfsutils-linux:
  Installed:      0.6.5.9-2~bpo8+1
  Candidate:      0.6.5.9-2~bpo8+1
  Version table:
 *** 0.6.5.9-2~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~#

Greetings Hoppel

0xFelix commented 6 years ago

Hey,

I'm also still having the same problems as @hoppel118.

@behlendorf: Is there any progress for the solution mentioned in https://github.com/zfsonlinux/zfs/issues/4713#issuecomment-222776312 ?

The errors in the kernel log and the constant CHECKSUM errors I'm getting are rather frightening.

crimp42 commented 6 years ago

Just wanted to leave another note regarding this.

As I said previously, this issue does NOT occur with CentOS 6.9 (which uses a 2.x kernel), so I assume that if you can use ZFS in that environment, this issue would not occur.

I can easily reproduce it on CentOS 7.x with Kernel 3.x and 4.x

This weekend I took one of my CentOS 6.9 servers, swapped the boot drive, and installed Ubuntu 16.04.3 LTS, which has kernel version 4.4.0-87-generic.

The version of mpt2sas under this version of Ubuntu was 12.100.00.00.

As soon as my JBOD disks go to sleep and I access them, the issue still occurs while they are waking up:

Sep 2 16:14:43 misc02 kernel: [ 713.411876] sd 9:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Sep 2 16:14:43 misc02 kernel: [ 713.411888] sd 9:0:0:0: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 15 d5 12 80 00 00 00 08 00 00
Sep 2 16:14:43 misc02 kernel: [ 713.411894] blk_update_request: I/O error, dev sdi, sector 366285440

I feel very confident that something was introduced at some point in the kernel 3.x series, and is still present in 4.x, that causes this issue (and that also manifests itself for those of you using ZFS).

behlendorf commented 6 years ago

This was added to the 0.8 milestone so we don't lose track of it. @brianmduncan my guess is this was likely introduced in the 3.9 kernel with the CONFIG_PM_RUNTIME option.

cobrafast commented 6 years ago

Just wanted to chime in that I'm affected by this problem too (and am not a ZFS user either). I made my write-up over at https://serverfault.com/q/876750/81089 since I didn't find this thread earlier.

My kernel (4.13.4-1-ARCH) definitely has both CONFIG_PM and "[SCSI] Fix 'Device not ready' issue on mpt2sas".

I'm going to try and debug my kernel in the next few days.

cobrafast commented 6 years ago

Here's the first debug log I was able to produce: https://pastebin.com/Kg0vdCf7 Lines beginning with SCSI DEBUG: are debug prints I added myself in scsi_check_sense(), since that was where someone suspected the problem.

cobrafast commented 6 years ago

Just wanted to let you guys know that I think I've come across a temporary fix until I've figured this out completely.

You need to compile your own kernel, but change ./include/linux/blkdev.h:884, which should look like this:

#define BLK_MIN_SG_TIMEOUT  (7 * HZ)

This constant is the absolute minimum lifetime a SCSI command gets to do its thing, and the default setting of 7 seconds is just too short for most hard drives to spin up. I've now set it to (45 * HZ), which should give even md-raid operations that trigger a staggered spin-up of several drives enough time to finish.
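For clarity, with the 45-second value described above the edited line at ./include/linux/blkdev.h:884 would simply read:

    #define BLK_MIN_SG_TIMEOUT  (45 * HZ)   /* was (7 * HZ); HZ is one second's worth of ticks */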

The downside of this is that you might get an unresponsive/laggy system in case of a broken drive that cannot complete any commands anymore.

However, since this constant was last changed in git in 2007 (when SGv3 was added for Linux 2.4 and 2.5), I doubt that this is actually the source of this problem. I'm going to keep looking some more.

crimp42 commented 6 years ago

Great find!!! If I get a chance this weekend I will restore one of my Linux boxes back to CentOS 7.x and try to compile a custom kernel with the adjustment.

I hope you can find the original source so this can be reported and patched in the kernels moving forward. This particular issue has been around for years, and I find it odd that no one has squashed it. Usually people say it is a hardware issue and leave it at that.

I was starting to believe that I would need to find an alternative to my LSI SAS2008 controllers when moving forward to any distro using kernels above 3.x.

crimp42 commented 6 years ago

So I restored one of my servers' boot drives back to a distribution using kernel 3.10.

I confirmed before installing a custom kernel that the issue was reproducible on my current kernel.

I then modified the file within the build and changed it to #define BLK_MIN_SG_TIMEOUT (45 * HZ)

Re-compiled, installed/rebooted.

I can still reproduce it if I put a drive to sleep and then access it. I need to see whether it is occurring less now or not. I put all my drives to sleep, then access them one at a time while monitoring the log; it seems like it might be occurring less often now.

So 45 * HZ should equal 45 seconds instead of 7? Watching the log, when I do see the error pop up while trying to access a sleeping drive, I'd guess it only takes about 5 seconds until the error is written to the log.

Oct 14 09:36:33 misc02 kernel: sd 9:0:5:0: [sdm] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 14 09:36:33 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 14 70 00 00 00 08 00 00
Oct 14 09:36:33 misc02 kernel: blk_update_request: I/O error, dev sdm, sector 5232

I will play around with this more over the weekend and do some more comparisons. Also, I have read that SAS drives do not trigger this issue, whereas SATA drives connected to the same controller as the SAS drives will. Shouldn't the ./include/linux/blkdev.h setting also impact SAS drives?

cobrafast commented 6 years ago

Hmm, you're also getting a FAILED command result, whereas I got a TIMEOUT_ERROR.

You can enable SCSI logging in the kernel config (make nconfig, under Drivers -> SCSI somewhere) and then do echo 0xffffff > /proc/sys/dev/scsi/logging_level to get a bit more info on what's going on.

You can also add a dump_stack(); into ./drivers/scsi/scsi_logging.c:232 (inside scsi_print_command(...) after the two safety returns) to see where the error is coming from.

Also make sure you're actually booting your self-compiled kernel.

crimp42 commented 6 years ago

Thanks for the tips, I am definitely going to try enabling those things to see if I can get any more info. And yes, I was certain I was running my modified kernel; well, as certain as I could be. I am building the kernel from the RPM src: I extract the kernel source, modify the ./include/linux/blkdev.h setting, re-tar/gzip the source tree, put it in the RPM build directory replacing the original archived kernel source tree, build the kernel, and install the RPM. (After reboot I used uname -r to verify I am using my modified kernel build.) I put a specific buildid in the spec file that is then reflected in the kernel name. I also just checked the leftover build in the RPMBUILD/BUILD directory from my kernel, and blkdev.h does have 45 HZ instead of 7 HZ.

cobrafast commented 6 years ago

Another thing you can check is the BIOS setup of the card (if your card doesn't have a BIOS you probably forgot to flash it after the last firmware flash). There are at least four different timeout values that can be set for a SAS2008 chip in IT-mode and there's no definitive way to know which one affects which drive.

The point being: if the card decides a command timed out, the Linux SCSI driver might see that simply as FAILED and assume the worst. (The mpt3sas driver also seems to be a bit clunky and might just not forward a timeout result properly.)

crimp42 commented 6 years ago

I turned on SCSI debugging and captured it while it happened.

https://pastebin.com/KJhk9PaZ

sdm is the drive that was asleep that I accessed; it took about 10 seconds to come up after waking, and the debug output indicates some type of issue. sdn is the boot drive. Can you tell anything from that? Thanks

If I do any of this on 2.x kernels, everything is fine, no failures or errors ever. Only when I move to 3.x and above kernels does this happen.

crimp42 commented 6 years ago

cobrafast, I read over your serverfault.com link and, unless I am reading something wrong, it sounds like the issue you are having/had is slightly different than the issue this thread was started on and what I am seeing. And thanks for the info on the BIOS; I had no idea there were additional settings if you have the BIOS installed on the cards. I installed mptsas2.rom on both cards in the server I am currently testing on, and adjusted the four IO Timeout for XX devices settings from their default of 10 seconds to 20 each. Unfortunately it also did not seem to make a difference.

This is my latest test with SCSI debugging turned on, with just the logs related to the drive I tested on.

My test was: put the drive (sdn) to sleep, start monitoring the log, read a file from drive sdn over a network share, and then stop monitoring the log as soon as the file has started reading from drive sdn. https://pastebin.com/fsRjFEsj

It looks to me like the OS first lists a failure, then properly wakes up the drive and finishes as intended.

I am not really sure what this indicates, if anything.

When you said before:

You can also add a dump_stack(); into ./drivers/scsi/scsi_logging.c:232 (inside scsi_print_command(...) after the two safety returns) to see where the error is coming from.

What are safety returns? I found the void scsi_print_command part in scsi_logging.c; do I just add the dump_stack(); on its own line right after the two ifs that result in a return?

Like this:

    if (!cmd->cmnd)
            return;
    logbuf = scsi_log_reserve_buffer(&logbuf_len);
    if (!logbuf)
            return;
    dump_stack();
cobrafast commented 6 years ago

Your assessment seems correct: sd 9:0:5:0: scsi_eh_9: Sending START_UNIT is done after the failed command. That's weird.

Yes that's the correct dump_stack position.

crimp42 commented 6 years ago

OK thanks, I recompiled my kernel with the dump_stack in place and enabled the verbose debugging again and reproduced on drive sdm.

https://pastebin.com/KqRv5rYV

It looks the same to me: it errors out saying the drive is not ready, the error handler then kicks in and wakes it, and then it works. I am unable to determine anything from the stack being dumped; I did see things related to XFS (given that the filesystem on this drive is XFS).

I just realized I have 2 drives on this controller with EXT4 partitions, and I cannot reproduce this on them. They actually take longer to wake up from sleep than the others, too. I wonder if that has anything to do with it.

Actually it probably has more to do with the model of SATA drive. I found another drive that is XFS that this does not happen with. The 2 drives I have with EXT4 partitions just happen to be my 2 newest drives.

splitice commented 6 years ago

Finally, I found this issue. I've been blaming my hardware for this!

I'll be trying the 45*HZ change too and will let you know if the frequency changes.

I too am seeing FAILED statuses like @brianmduncan:

[ 6685.145034] sd 12:0:85:0: [sdq] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.145044] sd 12:0:85:0: [sdq] tag#1 Sense Key : Hardware Error [current]
[ 6685.145052] sd 12:0:85:0: [sdq] tag#1 Add. Sense: Internal target failure
[ 6685.145060] sd 12:0:85:0: [sdq] tag#1 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[ 6685.145076] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.145724] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.145738] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.145746] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.145753] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.145757] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.145902] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.146168] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.146180] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.146188] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.146194] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.146199] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.146340] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.146582] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.146588] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.146593] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.146598] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.146601] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.146738] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.146968] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.146973] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.146979] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.146984] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.146986] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.147124] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.147347] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.147353] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.147358] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.147364] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.147366] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.147503] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.147710] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.147715] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.147721] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.147726] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.147731] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.147868] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.148086] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.148093] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.148099] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.148104] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.148107] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.148247] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.148399] Dev sdq: unable to read RDB block 0
[ 6685.148528] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.148532] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.148536] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.148539] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.148541] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.148674] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.148888] sd 12:0:85:0: [sdq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 6685.148892] sd 12:0:85:0: [sdq] tag#0 Sense Key : Hardware Error [current]
[ 6685.148896] sd 12:0:85:0: [sdq] tag#0 Add. Sense: Internal target failure
[ 6685.148900] sd 12:0:85:0: [sdq] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 6685.148901] blk_update_request: critical target error, dev sdq, sector 0
[ 6685.149034] Buffer I/O error on dev sdq, logical block 0, async page read
[ 6685.149261] Buffer I/O error on dev sdq, logical block 3, async page read
splitice commented 6 years ago

My research so far has revealed that this issue is supposed to be fixed in https://github.com/torvalds/linux/commit/ee6928d63e37e1b1e7eca898c5b2bac19f194ebd

My current working theory is that this has been broken since 3.5.4 when this patch was introduced.

crimp42 commented 6 years ago

Based on what I read in that link, it sounds like this issue. So does that mean the patch (if that is what it is) will make it into the kernel any time soon? If I have any time this weekend, I will attempt to compile a current custom kernel with those changes to scsi_error.c and scsi_scan.c, to see if I can still reproduce the issue.

splitice commented 6 years ago

That patch has been in the kernel since 3.5.4; that's a link to the commit.

Can you tell me what the value of cat /sys/block/sd?/device/scsi_disk/*/allow_restart is on your LSI-backed drives?

crimp42 commented 6 years ago

Oh, I think I get what you are saying now: that is the patch that might have led to this behaviour?

On all my drives (LSI-connected SATA) on a pre-3.x kernel, where this does not occur, the value is 0 everywhere. I swapped my other server back to the boot drive with a 4.x-kernel-based distribution, where it is occurring, and it lists 1 for all my drives (LSI-connected SATA).

splitice commented 6 years ago

That patch is the supposed fix for the bug seen here, although it doesn't seem to be working in 4.x kernels.

allow_restart should be 1, as it is needed to combat another patch introduced in approximately 3.2 (https://github.com/torvalds/linux/commit/85ef06d1d252f6a2e73b678591ab71caad4667bb). If you are getting 1's then the patch in drivers/scsi/scsi_scan.c is doing its job.

The other part of the patch, the code in drivers/scsi/scsi_error.c isn't something I 100% understand. It looks like it is supposed to skip an error (considered SUCCESS not failed?) if a unit not ready condition is met, and if the done handler is in use (why?).

If you have time I would love to see the results of a printk of scmd->cmnd[0], scmd->scsi_done, scsi_eh_done during the error. Perhaps something has changed there too. It should be logged just above the error in dmesg.
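For reference, a minimal sketch of the kind of printk being asked for (my assumption of what is meant; the exact placement inside drivers/scsi/scsi_error.c, somewhere near where the failed command is classified, is a guess):

    /* Hypothetical debug aid, not an existing patch: log the opcode of the
     * failed command and which completion callback it carries, so we can see
     * whether scmd->scsi_done is scsi_eh_done at this point. */
    printk(KERN_WARNING "scsi_eh dbg: opcode=0x%02x scsi_done=%pF scsi_eh_done=%pF\n",
           scmd->cmnd[0], scmd->scsi_done, scsi_eh_done);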

I'm working on getting a kernel compiling on this server; unfortunately I'm having some unrelated technical issues.

crimp42 commented 6 years ago

Well, I also confirmed the issue was present in other 3.x kernels I tried. I don't recall all of the different 3.x kernels, but I know for a fact I tried 3.10.0-514.26.2.el7.x86_64 with CentOS, and it occurs with all the 4.x kernels I have tried with CentOS. With all the 2.x kernels I don't see this issue at all.

If there is anything else you want me to do specifically with custom modifications made to any of the kernel code, I will have time Sunday. Just let me know what to change and I can re-compile, test, and give results, if that helps. I am not a coder, so I would need the exact changes that need to be made. The kernel I am currently running on my test drive for my other server is 3.10.0-693.el7.centos.local.x86_64; it already has the 45 * HZ change, which made no difference for me.

splitice commented 6 years ago

I can't see how the 45-second timeout would help here. It seems to me like this is a case of the unit-not-ready error code being translated into an error (where I think the command should instead just be retried). I only have a cursory understanding of SCSI/SATA though, so I may be wrong.

I'm currently waiting on a kernel without most of https://github.com/torvalds/linux/commit/85ef06d1d252f6a2e73b678591ab71caad4667bb to compile. It might break userland eject but I don't have any removable media anyway :) Fingers crossed.

splitice commented 6 years ago

No luck with reverting that patch so far.

I have managed to reduce the frequency of errors by setting the following early in the boot:

echo 1 | tee /sys/block/sd?/device/scsi_disk/*/manage_start_stop

And ensuring the drives don't sleep by executing the following every minute:

for file in /dev/sd?; do
    smartctl -a $file
done

Still far from perfect though.

splitice commented 6 years ago

Damn, I thought I was onto something as I was able to get it to work for 6 hours. No longer :(

I think this is also related to https://github.com/zfsonlinux/zfs/issues/3785, since after a lot of IO, on an IO error due to Unit Not Ready, further IO to the drive freezes the system in some cases. In that issue the reporter was also using mpt3sas/mpt2sas (same underlying driver). This is looking like a ZFS+mpt2sas bug.

Here are some of the hung backtraces while I do a ZFS import of a volume which fell off:

[ 1089.043829] INFO: task zpool:11511 blocked for more than 120 seconds.
[ 1089.047187]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1089.050588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1089.054033] zpool           D ffff92449e4e7c08     0 11511   8156 0x00000004
[ 1089.054043]  ffff92449e4e7c08 0001ffffc05fbaed ffff92450d688ec0 ffff92449ec3e740
[ 1089.054048]  ffff92449a2b6e40 ffff92449e4e8000 ffffffffc083bc44 ffff92449ec3e740
[ 1089.054057]  00000000ffffffff ffffffffc083bc48 ffff92449e4e7c20 ffffffffb88960a5
[ 1089.054062] Call Trace:
[ 1089.054074]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1089.054079]  [<ffffffffb889634e>] schedule_preempt_disabled+0xe/0x10
[ 1089.054084]  [<ffffffffb8897fa9>] __mutex_lock_slowpath+0xb9/0x130
[ 1089.054089]  [<ffffffffb889803f>] mutex_lock+0x1f/0x30
[ 1089.054173]  [<ffffffffc06bb7e0>] spa_open_common+0x60/0x450 [zfs]
[ 1089.054241]  [<ffffffffc06bbc57>] spa_get_stats+0x57/0x510 [zfs]
[ 1089.054248]  [<ffffffffb8206a56>] ? kmalloc_large_node+0x56/0x90
[ 1089.054252]  [<ffffffffb820a5a5>] ? __kmalloc_node+0x205/0x2a0
[ 1089.054264]  [<ffffffffc05e0b83>] ? spl_kmem_zalloc+0xa3/0x180 [spl]
[ 1089.054340]  [<ffffffffc06eda79>] zfs_ioc_pool_stats+0x39/0x90 [zfs]
[ 1089.054418]  [<ffffffffc06f103b>] zfsdev_ioctl+0x44b/0x4e0 [zfs]
[ 1089.054425]  [<ffffffffb81db5ec>] ? handle_mm_fault+0xf6c/0x13b0
[ 1089.054430]  [<ffffffffb82477a1>] do_vfs_ioctl+0xa1/0x5f0
[ 1089.054434]  [<ffffffffb806b505>] ? __do_page_fault+0x265/0x4e0
[ 1089.054438]  [<ffffffffb8247d69>] SyS_ioctl+0x79/0x90
[ 1089.054441]  [<ffffffffb889a8f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 1089.054483] INFO: task z_zvol:13030 blocked for more than 120 seconds.
[ 1089.058020]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1089.061650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1089.065380] z_zvol          D ffff9244921d7c68     0 13030      2 0x00000000
[ 1089.065389]  ffff9244921d7c68 ffff9244921d7c60 ffff92450d68c9c0 ffff924492220000
[ 1089.065395]  0000000000000000 ffff9244921d8000 ffffffffc083bc44 ffff924492220000
[ 1089.065406]  00000000ffffffff ffffffffc083bc48 ffff9244921d7c80 ffffffffb88960a5
[ 1089.065414] Call Trace:
[ 1089.065425]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1089.065431]  [<ffffffffb889634e>] schedule_preempt_disabled+0xe/0x10
[ 1089.065436]  [<ffffffffb8897fa9>] __mutex_lock_slowpath+0xb9/0x130
[ 1089.065442]  [<ffffffffb889803f>] mutex_lock+0x1f/0x30
[ 1089.065526]  [<ffffffffc06bb7e0>] spa_open_common+0x60/0x450 [zfs]
[ 1089.065532]  [<ffffffffb80baa4d>] ? dequeue_task_fair+0x56d/0x920
[ 1089.065537]  [<ffffffffb80bd0b3>] ? put_prev_entity+0x33/0x410
[ 1089.065542]  [<ffffffffb820a57f>] ? __kmalloc_node+0x1df/0x2a0
[ 1089.065620]  [<ffffffffc071e100>] ? zvol_task_cb+0x370/0x370 [zfs]
[ 1089.065687]  [<ffffffffc06bbbf3>] spa_open+0x13/0x20 [zfs]
[ 1089.065742]  [<ffffffffc067d81f>] dmu_objset_find+0x3f/0x90 [zfs]
[ 1089.065820]  [<ffffffffc071e0e7>] zvol_task_cb+0x357/0x370 [zfs]
[ 1089.065832]  [<ffffffffc05e4190>] taskq_thread+0x240/0x440 [spl]
[ 1089.065836]  [<ffffffffb8895be7>] ? __schedule+0x307/0x790
[ 1089.065840]  [<ffffffffb80af020>] ? wake_up_q+0x70/0x70
[ 1089.065850]  [<ffffffffc05e3f50>] ? taskq_cancel_id+0x140/0x140 [spl]
[ 1089.065855]  [<ffffffffb80a3fd8>] kthread+0xd8/0xf0
[ 1089.065858]  [<ffffffffb889ab1f>] ret_from_fork+0x1f/0x40
[ 1089.065862]  [<ffffffffb80a3f00>] ? kthread_create_on_node+0x1e0/0x1e0
[ 1089.065866] INFO: task txg_sync:13099 blocked for more than 120 seconds.
[ 1089.069637]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1089.073464] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1089.077408] txg_sync        D ffff9244936e7ab8     0 13099      2 0x00000000
[ 1089.077414]  ffff9244936e7ab8 ffff92451fc19370 ffffffffb8e0d500 ffff924492010000
[ 1089.077420]  0000000000000000 ffff9244936e8000 ffff92451fc19300 7fffffffffffffff
[ 1089.077426]  ffff9244e013ddf0 0000000000000001 ffff9244936e7ad0 ffffffffb88960a5
[ 1089.077431] Call Trace:
[ 1089.077437]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1089.077442]  [<ffffffffb88998d5>] schedule_timeout+0x235/0x3f0
[ 1089.077447]  [<ffffffffb80b7fc9>] ? set_next_entity+0x49/0xd0
[ 1089.077452]  [<ffffffffb802c78c>] ? __switch_to+0x2dc/0x700
[ 1089.077457]  [<ffffffffb80c1748>] ? pick_next_task_fair+0x3e8/0x4f0
[ 1089.077462]  [<ffffffffb80fcadc>] ? ktime_get+0x3c/0xb0
[ 1089.077467]  [<ffffffffb8895874>] io_schedule_timeout+0xa4/0x110
[ 1089.077484]  [<ffffffffc05e7c42>] cv_wait_common+0xb2/0x130 [spl]
[ 1089.077495]  [<ffffffffb80c74a0>] ? wake_atomic_t_function+0x60/0x60
[ 1089.077515]  [<ffffffffc05e7d18>] __cv_wait_io+0x18/0x20 [spl]
[ 1089.077617]  [<ffffffffc071265d>] zio_wait+0xfd/0x1d0 [zfs]
[ 1089.077707]  [<ffffffffc069e1ed>] dsl_pool_sync+0x2dd/0x460 [zfs]
[ 1089.077788]  [<ffffffffc06b9064>] spa_sync+0x374/0xb30 [zfs]
[ 1089.077794]  [<ffffffffb80af032>] ? default_wake_function+0x12/0x20
[ 1089.077868]  [<ffffffffc06c9c86>] txg_sync_thread+0x3a6/0x610 [zfs]
[ 1089.077967]  [<ffffffffc06c98e0>] ? txg_delay+0x160/0x160 [zfs]
[ 1089.077978]  [<ffffffffc05e2ee1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 1089.077988]  [<ffffffffc05e2e70>] ? __thread_exit+0x20/0x20 [spl]
[ 1089.077993]  [<ffffffffb80a3fd8>] kthread+0xd8/0xf0
[ 1089.077998]  [<ffffffffb889ab1f>] ret_from_fork+0x1f/0x40
[ 1089.078003]  [<ffffffffb80a3f00>] ? kthread_create_on_node+0x1e0/0x1e0
[ 1089.078007] INFO: task spa_async:14318 blocked for more than 120 seconds.
[ 1089.082144]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1089.086245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1089.090436] spa_async       D ffff92449abcbd30     0 14318      2 0x00000000
[ 1089.090444]  ffff92449abcbd30 ffffffffb80c1471 ffff92450d68c9c0 ffff924492010ec0
[ 1089.090458]  0000000300000000 ffff92449abcc000 ffff9244927aafd8 ffff9244927aafb0
[ 1089.090466]  ffff9244927aaff8 0000000000000000 ffff92449abcbd48 ffffffffb88960a5
[ 1089.090476] Call Trace:
[ 1089.090486]  [<ffffffffb80c1471>] ? pick_next_task_fair+0x111/0x4f0
[ 1089.090492]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1089.090508]  [<ffffffffc05e7c83>] cv_wait_common+0xf3/0x130 [spl]
[ 1089.090513]  [<ffffffffb80c74a0>] ? wake_atomic_t_function+0x60/0x60
[ 1089.090529]  [<ffffffffc05e7cd5>] __cv_wait+0x15/0x20 [spl]
[ 1089.090613]  [<ffffffffc06c33dd>] spa_config_enter+0x8d/0x100 [zfs]
[ 1089.090682]  [<ffffffffc06c117b>] spa_config_update+0x3b/0x170 [zfs]
[ 1089.090750]  [<ffffffffc06be6fd>] spa_async_thread+0x21d/0x2a0 [zfs]
[ 1089.090818]  [<ffffffffc06be4e0>] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 1089.090830]  [<ffffffffc05e2ee1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 1089.090839]  [<ffffffffc05e2e70>] ? __thread_exit+0x20/0x20 [spl]
[ 1089.090844]  [<ffffffffb80a3fd8>] kthread+0xd8/0xf0
[ 1089.090849]  [<ffffffffb889ab1f>] ret_from_fork+0x1f/0x40
[ 1089.090853]  [<ffffffffb80a3f00>] ? kthread_create_on_node+0x1e0/0x1e0
[ 1209.882912] INFO: task zpool:11511 blocked for more than 120 seconds.
[ 1209.887182]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1209.891497] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.895918] zpool           D ffff92449e4e7c08     0 11511   8156 0x00000004
[ 1209.895924]  ffff92449e4e7c08 0001ffffc05fbaed ffff92450d688ec0 ffff92449ec3e740
[ 1209.895934]  ffff92449a2b6e40 ffff92449e4e8000 ffffffffc083bc44 ffff92449ec3e740
[ 1209.895936]  00000000ffffffff ffffffffc083bc48 ffff92449e4e7c20 ffffffffb88960a5
[ 1209.895939] Call Trace:
[ 1209.895950]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1209.895953]  [<ffffffffb889634e>] schedule_preempt_disabled+0xe/0x10
[ 1209.895956]  [<ffffffffb8897fa9>] __mutex_lock_slowpath+0xb9/0x130
[ 1209.895959]  [<ffffffffb889803f>] mutex_lock+0x1f/0x30
[ 1209.896020]  [<ffffffffc06bb7e0>] spa_open_common+0x60/0x450 [zfs]
[ 1209.896073]  [<ffffffffc06bbc57>] spa_get_stats+0x57/0x510 [zfs]
[ 1209.896088]  [<ffffffffb8206a56>] ? kmalloc_large_node+0x56/0x90
[ 1209.896096]  [<ffffffffb820a5a5>] ? __kmalloc_node+0x205/0x2a0
[ 1209.896107]  [<ffffffffc05e0b83>] ? spl_kmem_zalloc+0xa3/0x180 [spl]
[ 1209.896152]  [<ffffffffc06eda79>] zfs_ioc_pool_stats+0x39/0x90 [zfs]
[ 1209.896207]  [<ffffffffc06f103b>] zfsdev_ioctl+0x44b/0x4e0 [zfs]
[ 1209.896211]  [<ffffffffb81db5ec>] ? handle_mm_fault+0xf6c/0x13b0
[ 1209.896215]  [<ffffffffb82477a1>] do_vfs_ioctl+0xa1/0x5f0
[ 1209.896218]  [<ffffffffb806b505>] ? __do_page_fault+0x265/0x4e0
[ 1209.896220]  [<ffffffffb8247d69>] SyS_ioctl+0x79/0x90
[ 1209.896223]  [<ffffffffb889a8f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 1209.896259] INFO: task z_zvol:13030 blocked for more than 120 seconds.
[ 1209.900793]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1209.905452] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.910201] z_zvol          D ffff9244921d7c68     0 13030      2 0x00000000
[ 1209.910207]  ffff9244921d7c68 ffff9244921d7c60 ffff92450d68c9c0 ffff924492220000
[ 1209.910217]  0000000000000000 ffff9244921d8000 ffffffffc083bc44 ffff924492220000
[ 1209.910235]  00000000ffffffff ffffffffc083bc48 ffff9244921d7c80 ffffffffb88960a5
[ 1209.910238] Call Trace:
[ 1209.910245]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1209.910248]  [<ffffffffb889634e>] schedule_preempt_disabled+0xe/0x10
[ 1209.910251]  [<ffffffffb8897fa9>] __mutex_lock_slowpath+0xb9/0x130
[ 1209.910254]  [<ffffffffb889803f>] mutex_lock+0x1f/0x30
[ 1209.910318]  [<ffffffffc06bb7e0>] spa_open_common+0x60/0x450 [zfs]
[ 1209.910321]  [<ffffffffb80baa4d>] ? dequeue_task_fair+0x56d/0x920
[ 1209.910324]  [<ffffffffb80bd0b3>] ? put_prev_entity+0x33/0x410
[ 1209.910327]  [<ffffffffb820a57f>] ? __kmalloc_node+0x1df/0x2a0
[ 1209.910382]  [<ffffffffc071e100>] ? zvol_task_cb+0x370/0x370 [zfs]
[ 1209.910433]  [<ffffffffc06bbbf3>] spa_open+0x13/0x20 [zfs]
[ 1209.910471]  [<ffffffffc067d81f>] dmu_objset_find+0x3f/0x90 [zfs]
[ 1209.910525]  [<ffffffffc071e0e7>] zvol_task_cb+0x357/0x370 [zfs]
[ 1209.910534]  [<ffffffffc05e4190>] taskq_thread+0x240/0x440 [spl]
[ 1209.910537]  [<ffffffffb8895be7>] ? __schedule+0x307/0x790
[ 1209.910540]  [<ffffffffb80af020>] ? wake_up_q+0x70/0x70
[ 1209.910547]  [<ffffffffc05e3f50>] ? taskq_cancel_id+0x140/0x140 [spl]
[ 1209.910550]  [<ffffffffb80a3fd8>] kthread+0xd8/0xf0
[ 1209.910553]  [<ffffffffb889ab1f>] ret_from_fork+0x1f/0x40
[ 1209.910555]  [<ffffffffb80a3f00>] ? kthread_create_on_node+0x1e0/0x1e0
[ 1209.910559] INFO: task txg_sync:13099 blocked for more than 120 seconds.
[ 1209.915398]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1209.920212] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.925116] txg_sync        D ffff9244936e7ab8     0 13099      2 0x00000000
[ 1209.925129]  ffff9244936e7ab8 ffff92451fc19370 ffffffffb8e0d500 ffff924492010000
[ 1209.925132]  0000000000000000 ffff9244936e8000 ffff92451fc19300 7fffffffffffffff
[ 1209.925135]  ffff9244e013ddf0 0000000000000001 ffff9244936e7ad0 ffffffffb88960a5
[ 1209.925137] Call Trace:
[ 1209.925141]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1209.925144]  [<ffffffffb88998d5>] schedule_timeout+0x235/0x3f0
[ 1209.925147]  [<ffffffffb80b7fc9>] ? set_next_entity+0x49/0xd0
[ 1209.925150]  [<ffffffffb802c78c>] ? __switch_to+0x2dc/0x700
[ 1209.925152]  [<ffffffffb80c1748>] ? pick_next_task_fair+0x3e8/0x4f0
[ 1209.925155]  [<ffffffffb80fcadc>] ? ktime_get+0x3c/0xb0
[ 1209.925157]  [<ffffffffb8895874>] io_schedule_timeout+0xa4/0x110
[ 1209.925166]  [<ffffffffc05e7c42>] cv_wait_common+0xb2/0x130 [spl]
[ 1209.925176]  [<ffffffffb80c74a0>] ? wake_atomic_t_function+0x60/0x60
[ 1209.925187]  [<ffffffffc05e7d18>] __cv_wait_io+0x18/0x20 [spl]
[ 1209.925248]  [<ffffffffc071265d>] zio_wait+0xfd/0x1d0 [zfs]
[ 1209.925295]  [<ffffffffc069e1ed>] dsl_pool_sync+0x2dd/0x460 [zfs]
[ 1209.925339]  [<ffffffffc06b9064>] spa_sync+0x374/0xb30 [zfs]
[ 1209.925346]  [<ffffffffb80af032>] ? default_wake_function+0x12/0x20
[ 1209.925390]  [<ffffffffc06c9c86>] txg_sync_thread+0x3a6/0x610 [zfs]
[ 1209.925442]  [<ffffffffc06c98e0>] ? txg_delay+0x160/0x160 [zfs]
[ 1209.925449]  [<ffffffffc05e2ee1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 1209.925454]  [<ffffffffc05e2e70>] ? __thread_exit+0x20/0x20 [spl]
[ 1209.925460]  [<ffffffffb80a3fd8>] kthread+0xd8/0xf0
[ 1209.925462]  [<ffffffffb889ab1f>] ret_from_fork+0x1f/0x40
[ 1209.925464]  [<ffffffffb80a3f00>] ? kthread_create_on_node+0x1e0/0x1e0
[ 1209.925466] INFO: task spa_async:14318 blocked for more than 120 seconds.
[ 1209.930291]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1209.935140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.940065] spa_async       D ffff92449abcbd30     0 14318      2 0x00000000
[ 1209.940070]  ffff92449abcbd30 ffffffffb80c1471 ffff92450d68c9c0 ffff924492010ec0
[ 1209.940074]  0000000300000000 ffff92449abcc000 ffff9244927aafd8 ffff9244927aafb0
[ 1209.940078]  ffff9244927aaff8 0000000000000000 ffff92449abcbd48 ffffffffb88960a5
[ 1209.940100] Call Trace:
[ 1209.940111]  [<ffffffffb80c1471>] ? pick_next_task_fair+0x111/0x4f0
[ 1209.940121]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1209.940136]  [<ffffffffc05e7c83>] cv_wait_common+0xf3/0x130 [spl]
[ 1209.940139]  [<ffffffffb80c74a0>] ? wake_atomic_t_function+0x60/0x60
[ 1209.940147]  [<ffffffffc05e7cd5>] __cv_wait+0x15/0x20 [spl]
[ 1209.940202]  [<ffffffffc06c33dd>] spa_config_enter+0x8d/0x100 [zfs]
[ 1209.940249]  [<ffffffffc06c117b>] spa_config_update+0x3b/0x170 [zfs]
[ 1209.940295]  [<ffffffffc06be6fd>] spa_async_thread+0x21d/0x2a0 [zfs]
[ 1209.940343]  [<ffffffffc06be4e0>] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 1209.940350]  [<ffffffffc05e2ee1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 1209.940356]  [<ffffffffc05e2e70>] ? __thread_exit+0x20/0x20 [spl]
[ 1209.940359]  [<ffffffffb80a3fd8>] kthread+0xd8/0xf0
[ 1209.940361]  [<ffffffffb889ab1f>] ret_from_fork+0x1f/0x40
[ 1209.940364]  [<ffffffffb80a3f00>] ? kthread_create_on_node+0x1e0/0x1e0
[ 1209.940366] INFO: task zpool:16255 blocked for more than 120 seconds.
[ 1209.945346]       Tainted: P           O    4.8.0-58-generic #63~16.04.1-Ubuntu
[ 1209.950388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.955492] zpool           D ffff9244a1d0fd08     0 16255  16202 0x00000000
[ 1209.955498]  ffff9244a1d0fd08 0000000000000001 ffff92450d5b6740 ffff92448e556740
[ 1209.955502]  0000000000000018 ffff9244a1d10000 ffffffffc083bc44 ffff92448e556740
[ 1209.955505]  00000000ffffffff ffffffffc083bc48 ffff9244a1d0fd20 ffffffffb88960a5
[ 1209.955508] Call Trace:
[ 1209.955517]  [<ffffffffb88960a5>] schedule+0x35/0x80
[ 1209.955527]  [<ffffffffb889634e>] schedule_preempt_disabled+0xe/0x10
[ 1209.955540]  [<ffffffffb8897fa9>] __mutex_lock_slowpath+0xb9/0x130
[ 1209.955553]  [<ffffffffb889803f>] mutex_lock+0x1f/0x30
[ 1209.955629]  [<ffffffffc06c01cc>] spa_all_configs+0x5c/0x170 [zfs]
[ 1209.955694]  [<ffffffffc06edc7b>] zfs_ioc_pool_configs+0x1b/0x50 [zfs]
[ 1209.955750]  [<ffffffffc06f103b>] zfsdev_ioctl+0x44b/0x4e0 [zfs]
[ 1209.955762]  [<ffffffffb81db5ec>] ? handle_mm_fault+0xf6c/0x13b0
[ 1209.955766]  [<ffffffffb82477a1>] do_vfs_ioctl+0xa1/0x5f0
[ 1209.955774]  [<ffffffffb806b505>] ? __do_page_fault+0x265/0x4e0
[ 1209.955780]  [<ffffffffb8247d69>] SyS_ioctl+0x79/0x90
[ 1209.955783]  [<ffffffffb889a8f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
behlendorf commented 6 years ago

@splitice the above stacks are what I'd expect to see if an IO completion wasn't reported to ZFS. The txg_sync thread is waiting in zio_wait() for a read to complete and everything is piled up behind that. ZFS expects every IO to either succeed, fail, or time out; if none of those things happen you'll see this. If you can reproduce this issue it would be great if you could dig in to it and determine exactly what happens.

splitice commented 6 years ago

@behlendorf I'm fairly certain I can replicate it. I've been copying data over samba and I get this every couple of hours.

It appears to happen after a large number of unit not ready errors occur.

I just replaced some Seagate Desktop drives with WD Reds, as they sleep less, and am gathering some data.

crimp42 commented 6 years ago

Hey, has anyone that has this issue tried an LSI SAS3008-based controller to see if the issue goes away? I have decided to give up and move on. I need to move to the 4.x kernel for 10 Gigabit support for a specific Ethernet adapter. I was considering getting a couple of LSI SAS 9300-8i's because they are based on the LSI SAS3008 chipset, but if they use the same drivers in Linux as the LSI SAS2008, I wonder whether this same issue would be present. I can't seem to find anyone else complaining about this issue with SAS3008 controllers. Thanks.

splitice commented 6 years ago

@brianmduncan I don't have any SAS3008 chip devices so no idea :(


hoppel118 commented 6 years ago

@brianmduncan I have this issue with an LSI SAS3008:

root@omv:~# lspci | grep 3008
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

At the moment I use the following kernel:

root@omv:~# uname -a
Linux omv 4.4.59-1-pve #1 SMP PVE 4.4.59-87 (Tue, 25 Apr 2017 09:01:58 +0200) x86_64 GNU/Linux

The chip is installed on my Supermicro X11SSH-CTF mainboard: https://www.supermicro.com/products/motherboard/Xeon/C236_C232/X11SSH-CTF.cfm

Greetings Hoppel

crimp42 commented 6 years ago

Thanks for the confirmation. I had already taken my chances a couple of days ago and ordered an LSI 9300-8i, so I guess I will be able to confirm as well now.

I was hopeful.


hoppel118 commented 6 years ago

Yeah, the hope always dies last. ;)

Greetings Hoppel

crimp42 commented 6 years ago

Well, at least in my case, I am happy to say that this LSI SAS 9300-8i controller does not appear to log the same errors. I flashed the 9300-8i to the most recent available IT firmware (v15) and updated the BIOS before moving over to it. Now I can put all 8 SATA drives connected to it to sleep and do something on a CIFS share mapped to any of the sleeping drives, and on network access the drives wake up and then complete their transactions without any of the previous errors I listed.

For reference, these are the previous errors I could reproduce almost every time on certain drives when using my LSI SAS2008-based cards (running on any 3.x or newer kernel; not reproducible on kernel 2.x):

Oct 14 09:36:33 misc02 kernel: sd 9:0:5:0: [sdm] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 14 09:36:33 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 14 70 00 00 00 08 00 00
Oct 14 09:36:33 misc02 kernel: blk_update_request: I/O error, dev sdm, sector 5232

I am on CentOS 7.4.1708 with 4.4.98-1.el7.elrepo.x86_64 kernel.

I know that these logged errors are just cosmetic and nothing is failing. It is just nice now not to see these stupid errors in the logs. I am only going to replace the cards in my main server; I will live with them on my backup server.