openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Drive is marked as faulted as soon as the resilver starts #8332

Closed onigoetz closed 5 years ago

onigoetz commented 5 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 18.04.1 LTS |
| Linux Kernel | 4.15.0-43-generic |
| Architecture | x86_64 |
| ZFS Version | 0.7.5-1ubuntu16.4 |
| SPL Version | 0.7.5-1ubuntu1 |

Describe the problem you're observing

I want to replace an existing drive with a bigger one. As soon as the resilver starts, I get a bunch of `print_req_error` messages in the logs and the new drive is marked as FAULTED with "too many errors".

The command I used is: `sudo zpool replace -o ashift=12 data_pool sdf /dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C`

The drive is a Western Digital 4 TB, bought this month. smartctl reports no errors, and I was able to write zeroes across the whole drive without any error (`dd if=/dev/zero of=/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C bs=1M`).
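For reference, a fleshed-out version of that check (`conv=fsync` and `status=progress` are additions here, not part of the command I actually ran):

```sh
# Write zeroes across the entire disk; any unwritable region would
# surface as a dd I/O error.
# conv=fsync flushes at the end so buffered write errors are not lost;
# status=progress just shows throughput. Both flags are additions over
# the original command.
sudo dd if=/dev/zero \
  of=/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C \
  bs=1M conv=fsync status=progress
```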

I don't know if it's important, but I launched another resilver at the same time on another vdev in the same pool; could this have an impact?

I had the same problem last week with the same drive. Afterwards I tried to find other errors on that disk, but nothing showed up.
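Concretely, the checks were along these lines (the full output is pasted in the logs section below):

```sh
# SMART health summary and full attribute / self-test report
sudo smartctl -H /dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C
sudo smartctl -a /dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C

# Pool state plus the detailed ZFS event log
sudo zpool status data_pool
sudo zpool events -v

# Kernel-side I/O errors
dmesg | tail -n 200
```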

Describe how to reproduce the problem

  1. Offline the old drive (I have as many SATA ports as I have disks)
  2. Remove the old disk
  3. Insert the new disk
  4. Create a GPT partition table on the new disk: `sudo gdisk /dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C`, then the `o` (new empty GPT) and `w` (write) commands
  5. Replace the disk: `sudo zpool replace -o ashift=12 data_pool sdf /dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C` (the whole sequence is sketched below)
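Put together, the sequence looks roughly like this; the piped `printf` is a non-interactive stand-in for the interactive gdisk session I actually used:

```sh
DISK=/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C

# 1. Take the old drive out of the pool before physically swapping it
sudo zpool offline data_pool sdf

# 2-3. (physically remove the old disk, insert the new one)

# 4. Fresh GPT label on the new disk: 'o' = new empty GPT, 'w' = write,
#    'Y' answers the confirmation prompts
printf 'o\nY\nw\nY\n' | sudo gdisk "$DISK"

# 5. Start the replacement; ZFS resilvers onto the new disk
sudo zpool replace -o ashift=12 data_pool sdf "$DISK"
```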

Include any warning/errors/backtraces from the system logs

zpool status

```
  pool: data_pool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jan 23 19:39:56 2019
        737G scanned out of 28.4T at 223M/s, 36h5m to go
        85.8M resilvered, 2.54% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        data_pool                                       DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            sdc                                         ONLINE       0     0     0
            replacing-1                                 UNAVAIL      0     0     0  insufficient replicas
              sdf                                       OFFLINE      0     0     0
              scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C  FAULTED    0     0     0  too many errors  (resilvering)
            sdg                                         ONLINE       0     0     0
            sdf                                         ONLINE       0     0     0
            sde                                         ONLINE       1     1     0
            ata-ST8000VN0022-2EL112_ZA1DFCNM            ONLINE       0     0     0
            sdh                                         ONLINE       0     0     0
          raidz2-1                                      DEGRADED     0     0     0
            sdd                                         ONLINE       0     0     0
            sda                                         ONLINE       0     0     0
            ata-WDC_WD8003FFBX-68B9AN0_VAGKH8UL         ONLINE       0     0     0
            ata-WDC_WD80EFZX-68UW8N0_R6GXV70Y           ONLINE       0     0     0
            ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0727850    ONLINE       0     0     0
            ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0723040    ONLINE       0     0     0
            replacing-6                                 DEGRADED     0     0     0
              ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0757951  OFFLINE      0     0     0
              scsi-SATA_ST8000VN0022-2EL_ZA1DFRP3       ONLINE       0     0     0  (resilvering)

errors: No known data errors
```
smartctl -H

```
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-43-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
```
smartctl -a (details)

```
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-43-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K5XYV00C
LU WWN Device Id: 5 0014ee 21048f052
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jan 23 20:30:49 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (43080) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 457) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       168
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       10
194 Temperature_Celsius     0x0022   104   095   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         48         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```
zpool events -v

```
Jan 23 2019 19:38:53.077309014 sysevent.fs.zfs.vdev_attach
        version = 0x0 class = "sysevent.fs.zfs.vdev_attach"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        vdev_guid = 0xbb731cc15dc66dac vdev_state = "ONLINE" (0x7)
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        time = 0x5c48b4bd 0x49ba456 eid = 0x5e

Jan 23 2019 19:38:54.725279312 sysevent.fs.zfs.resilver_start
        version = 0x0 class = "sysevent.fs.zfs.resilver_start"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        time = 0x5c48b4be 0x2b3ae250 eid = 0x5f

Jan 23 2019 19:38:54.725279312 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server" history_internal_str = "func=2 mintxg=3 maxtxg=13505190"
        history_internal_name = "scan setup" history_txg = 0xce12a6 history_time = 0x5c48b4be
        time = 0x5c48b4be 0x2b3ae250 eid = 0x60

Jan 23 2019 19:39:00.329178307 sysevent.fs.zfs.config_sync
        version = 0x0 class = "sysevent.fs.zfs.config_sync"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        time = 0x5c48b4c4 0x139edcc3 eid = 0x61

Jan 23 2019 19:39:08.397032887 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server"
        history_internal_str = "replace vdev=/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1 for vdev=/dev/sdf1"
        history_internal_name = "vdev attach" history_txg = 0xce12a8 history_time = 0x5c48b4cc
        time = 0x5c48b4cc 0x17aa3db7 eid = 0x62

Jan 23 2019 19:39:17.388870804 sysevent.fs.zfs.vdev_attach
        version = 0x0 class = "sysevent.fs.zfs.vdev_attach"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        vdev_guid = 0xfa187b5ab55cf0 vdev_state = "ONLINE" (0x7)
        vdev_path = "/dev/disk/by-id/scsi-SATA_ST8000VN0022-2EL_ZA1DFRP3-part1"
        time = 0x5c48b4d5 0x172db294 eid = 0x63

Jan 23 2019 19:39:34.372564639 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server" history_internal_str = "errors=0"
        history_internal_name = "scan aborted, restarting" history_txg = 0xce12ac history_time = 0x5c48b4e6
        time = 0x5c48b4e6 0x1634e29f eid = 0x64

Jan 23 2019 19:39:34.388564351 sysevent.fs.zfs.resilver_start
        version = 0x0 class = "sysevent.fs.zfs.resilver_start"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        time = 0x5c48b4e6 0x1729057f eid = 0x65

Jan 23 2019 19:39:34.388564351 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server" history_internal_str = "func=2 mintxg=3 maxtxg=13505196"
        history_internal_name = "scan setup" history_txg = 0xce12ac history_time = 0x5c48b4e6
        time = 0x5c48b4e6 0x1729057f eid = 0x66

Jan 23 2019 19:39:39.472472697 sysevent.fs.zfs.config_sync
        version = 0x0 class = "sysevent.fs.zfs.config_sync"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        time = 0x5c48b4eb 0x1c295c79 eid = 0x67

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df2664b31f00001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df2664ad22 vdev_delta_ts = 0x4f9595d1
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4c5a356f zio_timestamp = 0x223debf164157 zio_delta = 0x63554178
        zio_offset = 0x13b8a5b3000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x84 zio_level = 0x0 zio_blkid = 0x0
        time = 0x5c48b4f2 0x427378 eid = 0x68

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266538bf00801
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df2664ad22 vdev_delta_ts = 0x4f9595d1
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4d837bb9 zio_timestamp = 0x223ded1e3670d zio_delta = 0x507a5258
        zio_offset = 0xe32806d000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x86 zio_level = 0x0 zio_blkid = 0x8
        time = 0x5c48b4f2 0x427378 eid = 0x69

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df2665b53d00c01
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df26659aa4 vdev_delta_ts = 0x4a5a305e
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4cf65f8a zio_timestamp = 0x223dec6b85a69 zio_delta = 0x5b9821bf
        zio_offset = 0x131cffca000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x7e zio_level = 0x0 zio_blkid = 0x19
        time = 0x5c48b4f2 0x427378 eid = 0x6a

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266626cd00001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df26659aa4 vdev_delta_ts = 0x4a5a305e
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4faed555 zio_timestamp = 0x223debf157d54 zio_delta = 0x62e494f1
        zio_offset = 0xdb6e8ef000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x84 zio_level = 0x0 zio_blkid = 0x3
        time = 0x5c48b4f2 0x427378 eid = 0x6b

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266686aa00801
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df2666613a vdev_delta_ts = 0x4dcf8511
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4dabbc02 zio_timestamp = 0x223dec1d3674e zio_delta = 0x6052be22
        zio_offset = 0xe2ee3f4000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x86 zio_level = 0x0 zio_blkid = 0x1
        time = 0x5c48b4f2 0x427378 eid = 0x6c

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df2666f00000001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df2666613a vdev_delta_ts = 0x4dcf8511
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4d9c60de zio_timestamp = 0x223dec58e0982 zio_delta = 0x5c8ad773
        zio_offset = 0xe31d16f000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x86 zio_level = 0x0 zio_blkid = 0x7
        time = 0x5c48b4f2 0x427378 eid = 0x6d

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df2667525300c01
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df26674054 vdev_delta_ts = 0x4d8573ee
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4db6d226 zio_timestamp = 0x223debee4a99b zio_delta = 0x6323f189
        zio_offset = 0xdfe4826000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0xc
        time = 0x5c48b4f2 0x427378 eid = 0x6e

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df2667b9a500001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df26674054 vdev_delta_ts = 0x4d8573ee
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4c91a780 zio_timestamp = 0x223ded1821412 zio_delta = 0x50bf6531
        zio_offset = 0x134d6f06000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x80 zio_level = 0x0 zio_blkid = 0x13
        time = 0x5c48b4f2 0x427378 eid = 0x6f

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df26680f0b00801
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df2667e6c7 vdev_delta_ts = 0x4a683899
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x20080caa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4e9a0a40 zio_timestamp = 0x223debf99a79e zio_delta = 0x62347dc3
        zio_offset = 0xdfe38a3000 zio_size = 0x5000
        time = 0x5c48b4f2 0x427378 eid = 0x70

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266a348200001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df2668e237 vdev_delta_ts = 0x4c6f852f
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x1808aa zio_stage = 0x800000 zio_pipeline = 0x840000
        zio_delay = 0x4d2e96ba zio_timestamp = 0x223dec06de584 zio_delta = 0x61c6e942
        zio_offset = 0x12dbd296000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0xf
        time = 0x5c48b4f2 0x427378 eid = 0x71

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266ac0b900801
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df266ae53b vdev_delta_ts = 0x46ac8a4b
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x3808aa zio_stage = 0x800000 zio_pipeline = 0xb80000
        zio_delay = 0x0 zio_timestamp = 0x223debf99a79e zio_delta = 0x0
        zio_offset = 0xdfe38a3000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0x0
        time = 0x5c48b4f2 0x427378 eid = 0x72

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266b1f2d00001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df266ae53b vdev_delta_ts = 0x46ac8a4b
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x3808aa zio_stage = 0x800000 zio_pipeline = 0xb80000
        zio_delay = 0x0 zio_timestamp = 0x223debf98bc2b zio_delta = 0x0
        zio_offset = 0xdfe38a7000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0x8
        time = 0x5c48b4f2 0x427378 eid = 0x73

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266bbf5400801
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df266bdd99 vdev_delta_ts = 0x473204a1
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x3808aa zio_stage = 0x800000 zio_pipeline = 0xb80000
        zio_delay = 0x0 zio_timestamp = 0x223debf99d9fa zio_delta = 0x0
        zio_offset = 0xdfe38a6000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0x6
        time = 0x5c48b4f2 0x427378 eid = 0x74

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266cf87100001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df266c79da vdev_delta_ts = 0x45f84f4f
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x3808aa zio_stage = 0x800000 zio_pipeline = 0xb80000
        zio_delay = 0x0 zio_timestamp = 0x223debe4672ce zio_delta = 0x0
        zio_offset = 0xdfe38a5000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0x3
        time = 0x5c48b4f2 0x427378 eid = 0x75

Jan 23 2019 19:39:46.004354936 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io" ena = 0x3df266d4c2d00c01
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xbb731cc15dc66dac (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xbb731cc15dc66dac vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_ashift = 0xc vdev_complete_ts = 0x223df266c79da vdev_delta_ts = 0x45f84f4f
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xd948a2922239f661 parent_type = "replacing" vdev_spare_paths = vdev_spare_guids =
        zio_err = 0x5 zio_flags = 0x3808aa zio_stage = 0x800000 zio_pipeline = 0xb80000
        zio_delay = 0x0 zio_timestamp = 0x223debf98ddf0 zio_delta = 0x0
        zio_offset = 0xdfe38a4000 zio_size = 0x1000 zio_objset = 0x0 zio_object = 0x85 zio_level = 0x0 zio_blkid = 0x1
        time = 0x5c48b4f2 0x427378 eid = 0x76

Jan 23 2019 19:39:47.648325297 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server"
        history_internal_str = "replace vdev=/dev/disk/by-id/scsi-SATA_ST8000VN0022-2EL_ZA1DFRP3-part1 for vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0757951-part1"
        history_internal_name = "vdev attach" history_txg = 0xce12ae history_time = 0x5c48b4f3
        time = 0x5c48b4f3 0x26a4a8b1 eid = 0x77

Jan 23 2019 19:39:56.796160364 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server" history_internal_str = "errors=0"
        history_internal_name = "scan aborted, restarting" history_txg = 0xce12af history_time = 0x5c48b4fc
        time = 0x5c48b4fc 0x2f74716c eid = 0x78

Jan 23 2019 19:39:56.816160003 sysevent.fs.zfs.resilver_start
        version = 0x0 class = "sysevent.fs.zfs.resilver_start"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        time = 0x5c48b4fc 0x30a59d03 eid = 0x79

Jan 23 2019 19:39:56.816160003 sysevent.fs.zfs.history_event
        version = 0x0 class = "sysevent.fs.zfs.history_event"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        history_hostname = "the-server" history_internal_str = "func=2 mintxg=3 maxtxg=13505196"
        history_internal_name = "scan setup" history_txg = 0xce12af history_time = 0x5c48b4fc
        time = 0x5c48b4fc 0x30a59d03 eid = 0x7a

Jan 23 2019 19:40:01.404077282 resource.fs.zfs.statechange
        version = 0x0 class = "resource.fs.zfs.statechange"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        vdev_guid = 0xbb731cc15dc66dac vdev_state = "FAULTED" (0x5)
        vdev_path = "/dev/disk/by-id/scsi-SATA_WDC_WD40EFRX-68N_WD-WCC7K5XYV00C-part1"
        vdev_laststate = "ONLINE" (0x7)
        time = 0x5c48b501 0x1815bae2 eid = 0x7b

Jan 23 2019 19:40:01.404077282 ereport.fs.zfs.vdev.no_replicas
        class = "ereport.fs.zfs.vdev.no_replicas" ena = 0x3e2bc4f69de00001
        detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0x8a1f5e683ff3ff28 vdev = 0xd948a2922239f661 (end detector)
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0 pool_failmode = "wait"
        vdev_guid = 0xd948a2922239f661 vdev_type = "replacing"
        vdev_complete_ts = 0x0 vdev_delta_ts = 0x0
        vdev_read_errors = 0x0 vdev_write_errors = 0x0 vdev_cksum_errors = 0x0
        parent_guid = 0xb05d9fbee1419357 parent_type = "raidz" vdev_spare_paths = vdev_spare_guids =
        prev_state = 0x6
        time = 0x5c48b501 0x1815bae2 eid = 0x7c

Jan 23 2019 19:40:16.703801408 sysevent.fs.zfs.config_sync
        version = 0x0 class = "sysevent.fs.zfs.config_sync"
        pool = "data_pool" pool_guid = 0x8a1f5e683ff3ff28 pool_state = 0x0 pool_context = 0x0
        time = 0x5c48b510 0x29f32840 eid = 0x7d
```
dmesg

```
[602336.506945] sdb:
[602337.780622] sdb: sdb1 sdb9
[602337.847438] sdb: sdb1 sdb9
[602358.964698] sdm:
[602360.146857] sdm: sdm1 sdm9
[602360.160642] sdm: sdm1 sdm9
[602391.209978] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.209986] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.209990] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.209995] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.209999] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.210003] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.210008] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.210012] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.210016] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.210020] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
[602391.210070] sd 1:0:1:0: [sdb] tag#20 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.210080] sd 1:0:1:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 6f f1 cd 18 00 00 00 28 00 00
[602391.210087] print_req_error: I/O error, dev sdb, sector 1878117656
[602391.213590] sd 1:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.213598] sd 1:0:1:0: [sdb] tag#0 CDB: Write(16) 8a 00 00 00 00 00 6d b7 4f 78 00 00 00 08 00 00
[602391.213604] print_req_error: I/O error, dev sdb, sector 1840729976
[602391.216478] sd 1:0:1:0: [sdb] tag#21 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.216480] sd 1:0:1:0: [sdb] tag#21 CDB: Write(16) 8a 00 00 00 00 00 6f f2 49 30 00 00 00 08 00 00
[602391.216481] print_req_error: I/O error, dev sdb, sector 1878149424
[602391.217426] sd 1:0:1:0: [sdb] tag#2 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.217428] sd 1:0:1:0: [sdb] tag#2 CDB: Write(16) 8a 00 00 00 00 00 71 8e 93 78 00 00 00 08 00 00
[602391.217429] print_req_error: I/O error, dev sdb, sector 1905169272
[602391.218353] sd 1:0:1:0: [sdb] tag#18 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.218364] sd 1:0:1:0: [sdb] tag#18 CDB: Write(16) 8a 00 00 00 00 00 71 77 27 a0 00 00 00 08 00 00
[602391.218366] print_req_error: I/O error, dev sdb, sector 1903634336
[602391.219328] sd 1:0:1:0: [sdb] tag#4 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.219330] sd 1:0:1:0: [sdb] tag#4 CDB: Write(16) 8a 00 00 00 00 00 96 de 9c b0 00 00 00 08 00 00
[602391.219331] print_req_error: I/O error, dev sdb, sector 2531171504
[602391.220231] sd 1:0:1:0: [sdb] tag#6 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.220233] sd 1:0:1:0: [sdb] tag#6 CDB: Write(16) 8a 00 00 00 00 00 9a 6b 80 30 00 00 00 08 00 00
[602391.220234] print_req_error: I/O error, dev sdb, sector 2590736432
[602391.221177] sd 1:0:1:0: [sdb] tag#5 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.221179] sd 1:0:1:0: [sdb] tag#5 CDB: Write(16) 8a 00 00 00 00 00 98 e8 06 50 00 00 00 08 00 00
[602391.221181] print_req_error: I/O error, dev sdb, sector 2565342800
[602391.222124] sd 1:0:1:0: [sdb] tag#3 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.222128] sd 1:0:1:0: [sdb] tag#3 CDB: Write(16) 8a 00 00 00 00 00 71 94 0b 68 00 00 00 08 00 00
[602391.222131] print_req_error: I/O error, dev sdb, sector 1905527656
[602391.223018] sd 1:0:1:0: [sdb] tag#7 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[602391.223021] sd 1:0:1:0: [sdb] tag#7 CDB: Write(16) 8a 00 00 00 00 00 9d c5 35 98 00 00 00 08 00 00
[602391.223024] print_req_error: I/O error, dev sdb, sector 2646947224
[602391.223912] sd 1:0:1:0: Power-on or device reset occurred
[602392.131530] sd 1:0:1:0: Power-on or device reset occurred
```
onigoetz commented 5 years ago

I sent an email to the mailing list; sorry for leaving this open for so long.