openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Inconsistent handling of spun down drives between the SCSI subsystem and ZFS #7859

Closed: splitice closed this issue 3 years ago

splitice commented 6 years ago

On accessing a spun-down drive, ZFS removes the target device due to I/O errors.

The notification reads along the lines of:

ZFS has detected an io error:

  eid: 15
class: io
 host: clava
 time: 2016-05-29 23:21:18+0200
vtype: disk
vpath: /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249-part1
vguid: 0x0094F35F53B1888B
cksum: 0
 read: 0
write: 0
 pool: green pool

and in dmesg:

[ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748386] sd 1:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 64 9d ac c8 00 00 08 00
[ 3647.748388] blk_update_request: I/O error, dev sdd, sector 1688054984
[ 3647.748401] sd 1:0:1:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748402] sd 1:0:1:0: [sde] tag#1 CDB: Read(10) 28 00 b4 26 aa 70 00 00 08 00
[ 3647.748403] blk_update_request: I/O error, dev sde, sector 3022432880
[ 3647.748408] sd 1:0:3:0: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748409] sd 1:0:3:0: [sdg] tag#2 CDB: Read(10) 28 00 b4 26 ca 78 00 00 08 00
[ 3647.748410] blk_update_request: I/O error, dev sdg, sector 3022441080
[ 3655.074695] sd 1:0:2:0: [sdf] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3655.074699] sd 1:0:2:0: [sdf] tag#8 CDB: Read(10) 28 00 64 9d b8 c0 00 00 08 00
[ 3655.074700] blk_update_request: I/O error, dev sdf, sector 1688058048
[ 3655.074712] sd 1:0:2:0: [sdf] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074713] sd 1:0:2:0: [sdf] tag#10 Sense Key : Not Ready [current] 
[ 3655.074715] sd 1:0:2:0: [sdf] tag#10 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074716] sd 1:0:2:0: [sdf] tag#10 CDB: Read(10) 28 00 64 9d 80 e8 00 00 08 00
[ 3655.074717] blk_update_request: I/O error, dev sdf, sector 1688043752
[ 3655.074721] sd 1:0:2:0: [sdf] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074722] sd 1:0:2:0: [sdf] tag#13 Sense Key : Not Ready [current] 
[ 3655.074723] sd 1:0:2:0: [sdf] tag#13 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074724] sd 1:0:2:0: [sdf] tag#13 CDB: Read(10) 28 00 64 9d 90 60 00 00 08 00
[ 3655.074725] blk_update_request: I/O error, dev sdf, sector 1688047712

This occurs only with 3.x and newer kernels; it was not the behavior with 2.6.
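
A rough way to reproduce this on a throw-away pool (a sketch only; /dev/sdX and the pool name tank are placeholders):

hdparm -y /dev/sdX              # put one pool member into standby (spin it down)
zpool events -f &               # stream ZFS events while the pool is touched
zpool scrub tank                # any read that lands on the sleeping disk also triggers it
zpool status -v tank            # READ errors accumulate and zed may fault/remove the vdev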

Additional information is in the original bug report (locked by @kpande): https://github.com/zfsonlinux/zfs/issues/4713
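
The "Logical unit not ready, initializing command required" sense data above suggests the sd driver is failing the command instead of waking the disk. One thing that may help (a sketch, not verified on this setup) is enabling the SCSI disk allow_restart attribute, which tells the kernel to send a start command to a stopped unit rather than returning an error:

# enable automatic restart of not-ready (spun down) units for every SCSI/SATA disk
for f in /sys/class/scsi_disk/*/allow_restart; do
    echo 1 > "$f"
done
grep . /sys/class/scsi_disk/*/allow_restart    # verify the new settings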

tomassrnka commented 5 years ago

This happens to an SSD-only pool under load as well:

[39491.564886] sd 12:0:0:0: attempting task abort! scmd(00000000e45ed1f1)
[39491.564891] sd 12:0:0:0: [sdam] tag#32 CDB: Write(10) 2a 00 2f de 76 90 00 00 08 00
[39491.564895] scsi target12:0:0: handle(0x000a), sas_address(0x500304801ebc7100), phy(0)
[39491.564898] scsi target12:0:0: enclosure logical id(0x500304801ebc713f), slot(0)
[39491.564902] scsi target12:0:0: enclosure level(0x0000), connector name(     )
[39491.564948] sd 12:0:0:0: task abort: SUCCESS scmd(00000000e45ed1f1)
[39491.564954] zio pool=ssd_backups vdev=/dev/sdam1 error=5 type=2 offset=411190501376 size=4096 flags=180880
[39491.564960] sd 12:0:0:0: attempting task abort! scmd(00000000b92663b6)
[39491.564964] sd 12:0:0:0: [sdam] tag#9 CDB: Read(10) 28 00 af 60 9f 00 00 00 08 00
[39491.564968] scsi target12:0:0: handle(0x000a), sas_address(0x500304801ebc7100), phy(0)
[39491.564972] scsi target12:0:0: enclosure logical id(0x500304801ebc713f), slot(0)
[39491.564975] scsi target12:0:0: enclosure level(0x0000), connector name(     )
[39491.565034] sd 12:0:0:0: task abort: SUCCESS scmd(00000000b92663b6)
[39491.565039] zio pool=ssd_backups vdev=/dev/sdam1 error=5 type=1 offset=1506479570944 size=4096 flags=1808a0
[39491.565047] sd 12:0:0:0: attempting task abort! scmd(0000000033e2c60f)
[39491.565051] sd 12:0:0:0: [sdam] tag#4 CDB: Read(10) 28 00 af 5f a8 e0 00 00 08 00
[39491.565053] scsi target12:0:0: handle(0x000a), sas_address(0x500304801ebc7100), phy(0)
[39491.565056] scsi target12:0:0: enclosure logical id(0x500304801ebc713f), slot(0)
[39491.565058] scsi target12:0:0: enclosure level(0x0000), connector name(     )
[39491.565104] sd 12:0:0:0: task abort: SUCCESS scmd(0000000033e2c60f)
[39491.565109] zio pool=ssd_backups vdev=/dev/sdam1 error=5 type=1 offset=1506447310848 size=4096 flags=1808a0
[39492.307902] sd 12:0:0:0: Power-on or device reset occurred
Jun 22 2019 09:53:53.112128256 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x3fb42eca6fe00c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xc4bb9ed60aaac68e
                vdev = 0xaa970e2b648c2a50
        (end detector)
        pool = "ssd_backups"
        pool_guid = 0xc4bb9ed60aaac68e
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0xaa970e2b648c2a50
        vdev_type = "disk"
        vdev_path = "/dev/sdao1"
        vdev_ashift = 0xc
        vdev_complete_ts = 0x23fb42ebef30
        vdev_delta_ts = 0x124226
        vdev_read_errors = 0x4b
        vdev_write_errors = 0x30
        vdev_cksum_errors = 0x0
        vdev_delays = 0x1c
        parent_guid = 0x77ffb2fa0df67985
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0x3808b0
        zio_stage = 0x1000000
        zio_pipeline = 0x1700000
        zio_delay = 0x0
        zio_timestamp = 0x23f39ecc474c
        zio_delta = 0x0
        zio_offset = 0x199264a0000
        zio_size = 0x3000
        zio_objset = 0xbed9
        zio_object = 0x1117a
        zio_level = 0x0
        zio_blkid = 0x0
        time = 0x5d0dde91 0x6aef100
        eid = 0x41da

Jun 22 2019 09:53:53.112128256 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x3fb42f02c9000c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xc4bb9ed60aaac68e
                vdev = 0xaa970e2b648c2a50
        (end detector)
        pool = "ssd_backups"
        pool_guid = 0xc4bb9ed60aaac68e
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0xaa970e2b648c2a50
        vdev_type = "disk"
        vdev_path = "/dev/sdao1"
        vdev_ashift = 0xc
        vdev_complete_ts = 0x23fb42ef0f7d
        vdev_delta_ts = 0x11d552
        vdev_read_errors = 0x4c
        vdev_write_errors = 0x30
        vdev_cksum_errors = 0x0
        vdev_delays = 0x1c
        parent_guid = 0x77ffb2fa0df67985
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0x3808b0
        zio_stage = 0x1000000
        zio_pipeline = 0x1700000
        zio_delay = 0x0
        zio_timestamp = 0x23f39eb59f65
        zio_delta = 0x0
        zio_offset = 0x19926476000
        zio_size = 0x3000
        zio_objset = 0xbed9
        zio_object = 0x11173
        zio_level = 0x0
        zio_blkid = 0x1
        time = 0x5d0dde91 0x6aef100
        eid = 0x41db

System: kernel 4.15.0-52-generic (latest Ubuntu 18.04) with ZFS 0.8.1-0york1~18.04.

The drives are 12x SATA S3510 1.92 TB SSDs (SSDSC2KB019T8) in a JBOD, connected through a SAS3 LSI HBA.
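
For what it's worth, zio_err = 0x5 in the events above is EIO, and the task aborts complete before the drive has recovered. One thing that might be worth trying (a sketch only; sdam is taken from the dmesg output above and the 60-second value is arbitrary) is giving the SCSI layer more time before it gives up on a command, and following the ZFS events live while the load runs:

cat /sys/block/sdam/device/timeout        # per-command timeout, 30 seconds by default
echo 60 > /sys/block/sdam/device/timeout  # give the device more time before an abort is attempted
zpool events -f ssd_backups               # follow error events for the pool in real time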

gorgone commented 4 years ago

ZFS on the Proxmox 5.x kernel shows the same issue:

[954022.380805] sd 2:0:0:0: [sdd] tag#1103 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[954022.380811] sd 2:0:0:0: [sdd] tag#1103 CDB: Read(16) 88 00 00 00 00 02 c7 46 e4 88 00 00 00 08 00 00
[954022.380813] print_req_error: I/O error, dev sdd, sector 11933246600 flags 700
[954022.380844] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G9GNRY-part1 error=5 type=1 offset=6109821210624 size=4096 flags=180880
[954032.733518] sd 2:0:1:0: [sde] tag#1111 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[954032.733523] sd 2:0:1:0: [sde] tag#1111 CDB: Read(16) 88 00 00 00 00 02 c7 46 e4 88 00 00 00 08 00 00
[954032.733525] print_req_error: I/O error, dev sde, sector 11933246600 flags 700
[954032.733556] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G9BNYY-part1 error=5 type=1 offset=6109821210624 size=4096 flags=180880
[954044.744193] sd 2:0:2:0: [sdf] tag#1101 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[954044.744198] sd 2:0:2:0: [sdf] tag#1101 CDB: Read(16) 88 00 00 00 00 03 20 00 40 30 00 00 00 08 00 00
[954044.744200] print_req_error: I/O error, dev sdf, sector 13421789232 flags 700
[954044.744231] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G988ZY-part1 error=5 type=1 offset=6871955038208 size=4096 flags=180980
[968529.496009] usb 1-4.3: USB disconnect, device number 5
[1137478.025979] sd 2:0:0:0: [sdd] tag#1105 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[1137478.025985] sd 2:0:0:0: [sdd] tag#1105 CDB: Read(16) 88 00 00 00 00 03 0b cb ce 88 00 00 00 08 00 00
[1137478.025987] print_req_error: I/O error, dev sdd, sector 13082807944 flags 700
[1137478.026018] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G9GNRY-part1 error=5 type=1 offset=6698396618752 size=4096 flags=180880
[1137488.120388] sd 2:0:1:0: [sde] tag#1209 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[1137488.120394] sd 2:0:1:0: [sde] tag#1209 CDB: Read(16) 88 00 00 00 00 03 0b cb ce 88 00 00 00 08 00 00
[1137488.120396] print_req_error: I/O error, dev sde, sector 13082807944 flags 700
[1137488.120428] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G9BNYY-part1 error=5 type=1 offset=6698396618752 size=4096 flags=180880
[1137498.372280] sd 2:0:2:0: [sdf] tag#1144 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[1137498.372286] sd 2:0:2:0: [sdf] tag#1144 CDB: Read(16) 88 00 00 00 00 02 f6 7c 96 20 00 00 00 08 00 00
[1137498.372288] print_req_error: I/O error, dev sdf, sector 12725294624 flags 700
[1137498.372319] zio pool=storage vdev=/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G988ZY-part1 error=5 type=1 offset=6515349798912 size=4096 flags=180880
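
If those WD80PURZ disks are in fact dropping into standby on their own, one mitigation (a sketch, untested here; the by-id name is copied from the log above, without the -part1 suffix) is to disable the spindown timer and APM on each member disk so they never go to sleep:

# -S 0 disables the standby timer, -B 255 disables APM (not every drive accepts -B)
hdparm -S 0 -B 255 /dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G9GNRY
# To make this persistent on Debian/Proxmox, the same settings can go into /etc/hdparm.conf:
# /dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6G9GNRY {
#     spindown_time = 0
#     apm = 255
# }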

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.