openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

WD WDx0EFAX drive unable to resilver #10214

Closed: darrenfreeman closed this issue 3 years ago

darrenfreeman commented 4 years ago

System information

Type Version/Name
Distribution Name Debian
Distribution Version 10.3
Linux Kernel 4.19.0-8-amd64
Architecture AMD Opteron (Bulldozer)
ZFS Version 0.8.0 compiled from git e34c3ee2fcbb320525a110c1811c557bc60479ff

Describe the problem you're observing

I purchased a brand new Western Digital WD40EFAX. Shortly after I purchased it, this article came out, saying these drives are likely not suitable for use with ZFS because they fail to resilver: https://blocksandfiles.com/2020/04/14/wd-red-nas-drives-shingled-magnetic-recording/

To see if this is the case, I did a zpool replace of a known good drive with this new drive, and I can confirm that this drive is unable to resilver. It has effectively hung at 27%, IOPS fell to zero, and the write error count keeps going up.

I will be returning this drive as not fit for purpose. I'll leave it trying to resilver for a few days, so if anyone wants me to do further investigation, I can do so before returning it.

Note that issue #10182 also refers to performance issues caused by this model being SMR masquerading as CMR. The situation now seems far worse than that.

Describe how to reproduce the problem

zpool replace onto a Western Digital WD40EFAX.
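
For reference, a minimal sketch of that replacement step, using the pool and partition-label names from this report (device paths are placeholders for whatever your system assigns):

# rpool1 is the known-good vdev being swapped out; rpool5 is the GPT partition
# label given to the new WD40EFAX before the replace.
zpool replace rpool rpool1 /dev/disk/by-partlabel/rpool5
# Watch the resilver progress and the per-vdev write error counters.
zpool status -v rpool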

Include any warning/errors/backtraces from the system logs

  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 16 01:50:01 2020
        4.92T scanned at 101M/s, 4.69T issued at 96.7M/s, 17.0T total
        947G resilvered, 27.61% done, 1 days 13:03:18 to go
config:

        NAME             STATE     READ WRITE CKSUM
        rpool            ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            rpool0       ONLINE       0     0     0
            replacing-1  ONLINE       0     0     0
              rpool1     ONLINE       0     0     0
              rpool5     ONLINE       0 47.7K     0  (resilvering)
            rpool2       ONLINE       0     0     0
            rpool3       ONLINE       0     0     0
            rpool4       ONLINE       0     0     0

errors: No known data errors
Apr 16 16:23:27 host kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Apr 16 16:23:27 host kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2190 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: print_req_error: critical target error, dev sdj, sector 5536725488
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2190 CDB: Write(16) 8a 00 00 00 00 01 4a 03 be 48 00 00 00 08 00 00
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2274 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: print_req_error: I/O error, dev sdj, sector 5536726600
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2117 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: print_req_error: I/O error, dev sdj, sector 5536724648
Apr 16 16:23:27 host kernel: zio pool=rpool vdev=/dev/disk/by-partlabel/rpool5 error=5 type=2 offset=2815922311168 size=4096 flags=1808aa
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2246 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2117 CDB: Write(16) 8a 00 00 00 00 01 4a 03 bc 40 00 00 00 08 00 00
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2703 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: zio pool=rpool vdev=/dev/disk/by-partlabel/rpool5 error=5 type=2 offset=2815921311744 size=4096 flags=1808aa
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2131 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: print_req_error: I/O error, dev sdj, sector 5536726080
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2246 CDB: Write(16) 8a 00 00 00 00 01 4a 03 b8 18 00 00 00 08 00 00
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2703 CDB: Write(16) 8a 00 00 00 00 01 4a 03 b7 98 00 00 00 08 00 00
Apr 16 16:23:27 host kernel: print_req_error: I/O error, dev sdj, sector 5536725016
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2189 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Apr 16 16:23:27 host kernel: zio pool=rpool vdev=/dev/disk/by-partlabel/rpool5 error=5 type=2 offset=2815922044928 size=4096 flags=1808aa
Apr 16 16:23:27 host kernel: sd 7:0:6:0: [sdj] tag#2131 CDB: Write(16) 8a 00 00 00 00 01 4a 03 b8 b0 00 00 00 10 00 00
Apr 16 16:23:27 host kernel: print_req_error: I/O error, dev sdj, sector 5536724888
shodanshok commented 4 years ago

Thanks for reporting. Can you show the smartctl --all output?

fazalmajid commented 4 years ago

Here are some more details, the WD drives may also have a firmware bug:

https://blocksandfiles.com/2020/04/15/shingled-drives-have-non-shingled-zones-for-caching-writes/

lord-carlos commented 4 years ago

Here are some more details, the WD drives may also have a firmware bug:

https://blocksandfiles.com/2020/04/15/shingled-drives-have-non-shingled-zones-for-caching-writes/

In particular, the last part is important for people who want to resilver:

I was able to force the WD40EFAX to resilver – by switching off write caching and lookahead. This dropped the drive’s write speed to less than 6MB/sec and the resilvering took 8 days instead of the more usual 24 hours. More worryingly, once added, a ZFS SCRUB (RAID integrity check) has yet to successfully complete without that drive producing checksum errors, even after 5 days of trying.
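
For reference, the write-cache and look-ahead toggles mentioned in that quote correspond to standard hdparm switches; a hedged sketch, with /dev/sdX standing in for the SMR drive (expect the write-throughput collapse described above):

# Disable the drive's volatile write cache and its read look-ahead.
hdparm -W 0 /dev/sdX
hdparm -A 0 /dev/sdX
# Confirm the new settings (smartctl -x also reports both flags).
hdparm -W /dev/sdX
hdparm -A /dev/sdX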

Stoatwblr commented 4 years ago

That last quoted paragraph is mine.

This is definitely a WD firmware bug. 3 different drives showing the same errors at more or less the same points in the resilvering process is pretty damning.

I strongly suggest that people DO NOT attempt to replicate my attempts to force a drive into the array to see what happens next. I'm still cleaning up the mess. ZFS kicked it out for good reason.

WD are (of course) continuing to deny there's a problem or that they have firmware issues.

Stoatwblr commented 4 years ago

If you wish to see the drive's internal error log, use smartctl --xall (or smartctl -x).

The extended GP log will not show up with the --all (-a) flag.
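
For example (a hedged sketch, with /dev/sdX as a placeholder), the extended error log can also be pulled out on its own:

smartctl -x /dev/sdX                 # everything, including the GP logs
smartctl -l xerror /dev/sdX          # just the Extended Comprehensive SMART Error Log
smartctl -x /dev/sdX | grep -c IDNF  # rough count of IDNF entries in the output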

darrenfreeman commented 4 years ago
smartctl --xall /dev/sdj
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-8-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD40EFAX-68JH4N0
Serial Number:    WD-...
LU WWN Device Id: ...
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Apr 17 12:44:20 2020 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (21404) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 525) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x3039) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   100   253   021    -    0
  4 Start_Stop_Count        -O--CK   100   100   000    -    1
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    540
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    1
192 Power-Off_Retract_Count -O--CK   200   200   000    -    0
193 Load_Cycle_Count        -O--CK   200   200   000    -    7
194 Temperature_Celsius     -O---K   126   113   000    -    21
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O    294  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      78  Device vendor specific log
0xb9       GPL,SL  VS       4  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 7487 (device log contains only the most recent 24 errors)
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 7487 [22] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 d1 c0 bc 10 40 00  Error: IDNF at LBA = 0x1d1c0bc10 = 7814036496

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 38 00 20 00 01 a8 8f ca 78 40 00 22d+12:23:01.073  WRITE FPDMA QUEUED
  61 00 18 00 18 00 01 a8 8f ca b8 40 00 22d+12:23:01.073  WRITE FPDMA QUEUED
  61 00 10 00 10 00 00 02 32 ba 10 40 00 22d+12:23:01.073  WRITE FPDMA QUEUED
  61 00 10 00 08 00 01 d1 c0 bc 10 40 00 22d+12:23:01.073  WRITE FPDMA QUEUED
  61 00 10 00 00 00 01 d1 c0 ba 10 40 00 22d+12:23:01.073  WRITE FPDMA QUEUED

Error 7486 [21] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 a8 8f c9 78 40 00  Error: IDNF at LBA = 0x1a88fc978 = 7122962808

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 58 00 00 00 01 a8 8f c9 98 40 00 22d+12:22:53.109  WRITE FPDMA QUEUED
  61 00 18 00 20 00 01 a5 64 0d e8 40 00 22d+12:22:52.710  WRITE FPDMA QUEUED
  61 00 10 00 18 00 01 a5 64 0d d8 40 00 22d+12:22:52.383  WRITE FPDMA QUEUED
  61 00 10 00 10 00 01 a5 64 0d c8 40 00 22d+12:22:52.368  WRITE FPDMA QUEUED
  61 00 18 00 08 00 01 a8 8f c9 78 40 00 22d+12:22:52.368  WRITE FPDMA QUEUED

Error 7485 [20] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 a8 8e f9 78 40 00  Error: IDNF at LBA = 0x1a88ef978 = 7122909560

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 38 00 00 00 01 a8 8e f9 98 40 00 22d+12:22:15.406  WRITE FPDMA QUEUED
  61 00 40 00 38 00 01 a5 64 07 e8 40 00 22d+12:22:15.152  WRITE FPDMA QUEUED
  61 00 40 00 30 00 01 a5 64 07 a8 40 00 22d+12:22:14.895  WRITE FPDMA QUEUED
  61 00 20 00 28 00 01 a8 fc 7f 68 40 00 22d+12:22:14.895  WRITE FPDMA QUEUED
  61 00 28 00 20 00 01 a8 fc 7f 40 40 00 22d+12:22:14.895  WRITE FPDMA QUEUED

Error 7484 [19] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 80 96 22 e8 40 00  Error: IDNF at LBA = 0x1809622e8 = 6452290280

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 18 00 01 d1 c0 bc 10 40 00 22d+12:21:34.814  WRITE FPDMA QUEUED
  61 00 10 00 10 00 01 d1 c0 ba 10 40 00 22d+12:21:34.603  WRITE FPDMA QUEUED
  61 00 10 00 08 00 00 02 32 ba 10 40 00 22d+12:21:34.406  WRITE FPDMA QUEUED
  61 00 08 00 00 00 01 80 96 27 10 40 00 22d+12:21:34.362  WRITE FPDMA QUEUED
  61 00 08 00 20 00 01 80 96 22 e8 40 00 22d+12:21:33.737  WRITE FPDMA QUEUED

Error 7483 [18] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 a9 39 f2 e0 40 00  Error: IDNF at LBA = 0x1a939f2e0 = 7134114528

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 00 00 01 a9 3d 99 60 40 00 22d+12:21:26.189  WRITE FPDMA QUEUED
  61 00 08 00 08 00 01 a9 39 f2 e0 40 00 22d+12:21:26.189  WRITE FPDMA QUEUED
  61 00 08 00 00 00 01 a9 38 d8 10 40 00 22d+12:21:26.189  WRITE FPDMA QUEUED
  61 00 08 00 08 00 01 a9 38 81 68 40 00 22d+12:21:26.189  WRITE FPDMA QUEUED
  61 00 08 00 00 00 01 a8 d2 c5 30 40 00 22d+12:21:26.188  WRITE FPDMA QUEUED

Error 7482 [17] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 a8 8d 5f d8 40 00  Error: IDNF at LBA = 0x1a88d5fd8 = 7122804696

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 98 00 20 00 01 a8 8d 5f d8 40 00 22d+12:20:49.161  WRITE FPDMA QUEUED
  60 00 10 00 18 00 01 d1 c0 bc 10 40 00 22d+12:20:49.159  READ FPDMA QUEUED
  60 00 10 00 10 00 01 d1 c0 ba 10 40 00 22d+12:20:49.159  READ FPDMA QUEUED
  60 00 10 00 08 00 00 02 32 ba 10 40 00 22d+12:20:49.159  READ FPDMA QUEUED
  61 00 08 00 00 00 01 a8 8c 90 e8 40 00 22d+12:20:49.159  WRITE FPDMA QUEUED

Error 7481 [16] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 a8 8d 5f 98 40 00  Error: IDNF at LBA = 0x1a88d5f98 = 7122804632

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 18 00 01 a5 64 05 00 40 00 22d+12:20:46.550  WRITE FPDMA QUEUED
  61 00 18 00 10 00 01 a8 fa 57 60 40 00 22d+12:20:46.534  WRITE FPDMA QUEUED
  61 00 18 00 08 00 01 a8 8d 5f b8 40 00 22d+12:20:42.224  WRITE FPDMA QUEUED
  61 00 18 00 00 00 01 a8 8d 5f 98 40 00 22d+12:20:42.223  WRITE FPDMA QUEUED
  61 00 18 00 08 00 01 a8 8d 5f 78 40 00 22d+12:20:42.222  WRITE FPDMA QUEUED

Error 7480 [15] occurred at disk power-on lifetime: 540 hours (22 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 00 00 00 01 d1 c0 bc 10 40 00  Error: IDNF at LBA = 0x1d1c0bc10 = 7814036496

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 18 00 20 00 01 a5 64 03 58 40 00 22d+12:19:13.755  WRITE FPDMA QUEUED
  61 00 28 00 18 00 01 a8 f8 79 58 40 00 22d+12:19:13.745  WRITE FPDMA QUEUED
  61 00 20 00 00 00 01 a8 f8 7a d0 40 00 22d+12:19:13.743  WRITE FPDMA QUEUED
  61 00 10 00 10 00 00 02 32 ba 10 40 00 22d+12:19:08.985  WRITE FPDMA QUEUED
  61 00 10 00 08 00 01 d1 c0 bc 10 40 00 22d+12:19:08.985  WRITE FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    21 Celsius
Power Cycle Min/Max Temperature:     16/31 Celsius
Lifetime    Min/Max Temperature:     16/31 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/65 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (46)

Index    Estimated Time   Temperature Celsius
  47    2020-04-17 04:47    26  *******
  48    2020-04-17 04:48    26  *******
  49    2020-04-17 04:49    26  *******
  50    2020-04-17 04:50    27  ********
  51    2020-04-17 04:51    26  *******
 ...    ..(  6 skipped).    ..  *******
  58    2020-04-17 04:58    26  *******
  59    2020-04-17 04:59    25  ******
 ...    ..(  9 skipped).    ..  ******
  69    2020-04-17 05:09    25  ******
  70    2020-04-17 05:10    24  *****
 ...    ..(115 skipped).    ..  *****
 186    2020-04-17 07:06    24  *****
 187    2020-04-17 07:07    25  ******
 ...    ..( 12 skipped).    ..  ******
 200    2020-04-17 07:20    25  ******
 201    2020-04-17 07:21    26  *******
 ...    ..(  7 skipped).    ..  *******
 209    2020-04-17 07:29    26  *******
 210    2020-04-17 07:30    27  ********
 ...    ..( 77 skipped).    ..  ********
 288    2020-04-17 08:48    27  ********
 289    2020-04-17 08:49    26  *******
 ...    ..( 11 skipped).    ..  *******
 301    2020-04-17 09:01    26  *******
 302    2020-04-17 09:02    25  ******
 ...    ..( 11 skipped).    ..  ******
 314    2020-04-17 09:14    25  ******
 315    2020-04-17 09:15    24  *****
 ...    ..( 19 skipped).    ..  *****
 335    2020-04-17 09:35    24  *****
 336    2020-04-17 09:36    23  ****
 ...    ..( 19 skipped).    ..  ****
 356    2020-04-17 09:56    23  ****
 357    2020-04-17 09:57    24  *****
 ...    ..( 11 skipped).    ..  *****
 369    2020-04-17 10:09    24  *****
 370    2020-04-17 10:10    25  ******
 371    2020-04-17 10:11    24  *****
 ...    ..(  4 skipped).    ..  *****
 376    2020-04-17 10:16    24  *****
 377    2020-04-17 10:17    23  ****
 ...    ..(  9 skipped).    ..  ****
 387    2020-04-17 10:27    23  ****
 388    2020-04-17 10:28    24  *****
 ...    ..( 15 skipped).    ..  *****
 404    2020-04-17 10:44    24  *****
 405    2020-04-17 10:45    25  ******
 406    2020-04-17 10:46    24  *****
 ...    ..( 13 skipped).    ..  *****
 420    2020-04-17 11:00    24  *****
 421    2020-04-17 11:01    25  ******
 422    2020-04-17 11:02    24  *****
 ...    ..( 16 skipped).    ..  *****
 439    2020-04-17 11:19    24  *****
 440    2020-04-17 11:20    23  ****
 ...    ..(  5 skipped).    ..  ****
 446    2020-04-17 11:26    23  ****
 447    2020-04-17 11:27    22  ***
 448    2020-04-17 11:28    23  ****
 449    2020-04-17 11:29    23  ****
 450    2020-04-17 11:30    23  ****
 451    2020-04-17 11:31    22  ***
 452    2020-04-17 11:32    23  ****
 453    2020-04-17 11:33    23  ****
 454    2020-04-17 11:34    23  ****
 455    2020-04-17 11:35    24  *****
 456    2020-04-17 11:36    23  ****
 457    2020-04-17 11:37    23  ****
 458    2020-04-17 11:38    24  *****
 459    2020-04-17 11:39    24  *****
 460    2020-04-17 11:40    24  *****
 461    2020-04-17 11:41    23  ****
 462    2020-04-17 11:42    24  *****
 ...    ..( 17 skipped).    ..  *****
   2    2020-04-17 12:00    24  *****
   3    2020-04-17 12:01    23  ****
 ...    ..( 12 skipped).    ..  ****
  16    2020-04-17 12:14    23  ****
  17    2020-04-17 12:15    22  ***
  18    2020-04-17 12:16    23  ****
 ...    ..(  5 skipped).    ..  ****
  24    2020-04-17 12:22    23  ****
  25    2020-04-17 12:23    24  *****
  26    2020-04-17 12:24    24  *****
  27    2020-04-17 12:25    23  ****
  28    2020-04-17 12:26    19  -
  29    2020-04-17 12:27    19  -
  30    2020-04-17 12:28    21  **
  31    2020-04-17 12:29    21  **
  32    2020-04-17 12:30    22  ***
 ...    ..(  9 skipped).    ..  ***
  42    2020-04-17 12:40    22  ***
  43    2020-04-17 12:41    23  ****
  44    2020-04-17 12:42    23  ****
  45    2020-04-17 12:43    23  ****
  46    2020-04-17 12:44    21  **

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) supported [please try: '-l defects']

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           82  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           83  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4      1945390  Vendor specific
darrenfreeman commented 4 years ago

It's worth noting that I usually fill a brand new drive with pseudorandom data, and then verify that it reads back correctly, before zeroing it and using it. I believe I did this, and of course I didn't know to TRIM it afterwards.
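
For anyone wanting to repeat that kind of burn-in, here is a hedged sketch of one way to do it (the openssl stream trick and /dev/sdX are illustrative rather than the exact commands used here, and it destroys everything on the drive):

# Write a reproducible pseudorandom stream across the whole drive...
openssl enc -aes-256-ctr -pass pass:burnin -nosalt </dev/zero \
  | dd of=/dev/sdX bs=1M oflag=direct status=progress
# ...then regenerate the same stream, truncate it to the drive size, and compare.
SIZE=$(blockdev --getsize64 /dev/sdX)
openssl enc -aes-256-ctr -pass pass:burnin -nosalt </dev/zero \
  | head -c "$SIZE" | cmp - /dev/sdX
# Finally zero the drive before putting it into service.
dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct status=progress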

darrenfreeman commented 4 years ago

The write error count, as reported by zpool status, keeps dropping back to lower numbers. I think it's probably overflowing at 65.5K errors - is that correct?

Stoatwblr commented 4 years ago

Trimming the drive won't make any difference. If anything, a trimmed or virgin drive exhibits more errors than a zero-filled one. (The advantage of having 3 such drives is that you can verify this behaviour and know it's not "a bad drive" - and that it's not "excess vibration", as WD tried to paint it, by putting the drives on individual foam blocks outside the chassis...)

darrenfreeman commented 4 years ago

Given all the IDNF errors that it's getting right now, I probably didn't test the drive and zero it. Failing at 28% likely corresponds to the amount that I previously wrote.

darrenfreeman commented 4 years ago

I zeroed the partition, and tried doing the replace / resilver again. It's failed in the same way, at about 26%. The errors are still IDNF.

drescherjm commented 4 years ago

I would take the drive out of the array and see if it passes a 4 pass data destructive badblocks test. This should take roughly 40 to 60 hours to complete on a 4TB drive.

darrenfreeman commented 4 years ago

I filled the partition with pseudorandom data, and then tried the replace/resilver again. (In case the drive is doing something like a TRIM when you simply zero it.) But that didn't help, it's failed at 27%, spewing write errors as before.

I will now do a four pass destructive badblocks test on the full drive, no partitions.

darrenfreeman commented 4 years ago

badblocks -wsv -b 4096 -o badblocks.txt /dev/sdj

98.30% done, 65:08:33 elapsed. (0/0/0 errors)

Pass completed, 0 bad blocks found. (0/0/0 errors)

Read/write speeds were similar, starting the pass at 180 MB/s, and gradually dropping to 80 MB/s by the end of the pass.

Nabruu commented 4 years ago

How about vibrations? If the pool was idling, that could explain the success of badblocks. I've experienced this with WD Reds and non-server chassis.

darrenfreeman commented 4 years ago

The pool is in a Supermicro server, so I doubt vibration has anything to do with it. Older WD Reds have no issues.

I bought a new Seagate IronWolf ST4000VN008-2DR166 for comparison. I'm running through identical steps: first a long self-test, then I'll fill it with pseudorandom data and read it back, then I'll zero it and create a labelled GPT partition, before trying to resilver as before.
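
For anyone repeating this comparison, a hedged sketch of that qualification sequence (the device path, the GPT label, and the vdev being replaced are all placeholders):

smartctl -t long /dev/sdX              # extended self-test; roughly 9 hours on a 4 TB drive
smartctl -a /dev/sdX                   # afterwards: confirm "Completed without error"
# fill with pseudorandom data, verify, and zero (see the sketch earlier in the thread)
sgdisk --zap-all /dev/sdX              # clear any old partition table
sgdisk -n 1:0:0 -c 1:rpool5 /dev/sdX   # single whole-disk partition with GPT label "rpool5"
zpool replace rpool <old-vdev> /dev/disk/by-partlabel/rpool5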

shodanshok commented 4 years ago

The ZFS resilver process is not completely sequential, and I think the random part is what brings the SMR WD Red to its knees: the drive firmware evidently has issues writing to different CMR/SMR bands, causing tremendous latency and, finally, a dropped drive.

A purely sequential fill workload should not put this kind of stress on an SMR drive, resulting in no apparent problems and even showing quite good results (i.e., the 180-80 MB/s transfer above).
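
One way to see that difference directly, outside of ZFS, is a raw fio comparison. This is only a hedged sketch (not something from this thread); /dev/sdX is a placeholder and both runs destroy data on the drive:

# Large sequential writes: a DM-SMR drive can stream these into shingled zones.
fio --name=seqwrite --filename=/dev/sdX --rw=write --bs=1M \
    --direct=1 --ioengine=libaio --iodepth=4 --runtime=300 --time_based
# Small scattered writes: these force media-cache flushes and zone rewrites,
# roughly the pattern a resilver generates; watch the throughput collapse.
fio --name=randwrite --filename=/dev/sdX --rw=randwrite --bs=4k \
    --direct=1 --ioengine=libaio --iodepth=32 --runtime=300 --time_based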

Stoatwblr commented 4 years ago

In this instance it's a drive firmware issue and I think ZFS is doing the right thing.

I've been seeing this too on 3 different drives - it's not a vibration issue. I verified that by isolating the drives on blocks of foam.

Based on Manfred Berger's 2015 OpenZFS Paris SMR presentation and some reading of SMR documentation, I believe the IDNF is a "relatively normal" error for SMR drives that occurs as the head seeks to EOD in SMR zones (just past end of written data) when switching between SMR zones (there are a number of open zones on a DM-SMR drive) and should be masked by the drive - except that WD screwed up.

WD have just issued a new blog update: https://blog.westerndigital.com/wd-red-nas-drives/ (2020/4/22) and given a list of SMR drives, with more details coming.

I've been dealing with WD over this issue for a few weeks, and I suspect their sudden change of stance has something to do with a few things I pointed out. A number of consumer protection agency complaints have been filed around the world over the last week, since stories appeared about WD putting DM-SMR drives into the channel(*) whilst blowing off user complaints. I also passed on a UK legal opinion yesterday that in a British court they would most likely be found to have acted with misfeasance, not just negligence (IANAL, but a friend is a magistrate), along with an observation that the continued gaslighting of customers in their posts is likely to compound the penalties in judgements against them. That is especially so given that Seagate issued a statement that SMR and NAS don't mix (https://arstechnica.com/information-technology/2020/04/seagate-says-network-attached-storage-and-smr-dont-mix/), WD's own documentation says much the same thing in various locations (http://zonedstorage.io/introduction/smr/), and Manfred Berger stated it in his presentation too - https://www.youtube.com/watch?v=a2lnMxMUxyc - around the 13 minute mark.

Under the law in most countries, if you advertise something as a NAS/RAID drive and it won't work as a NAS/RAID drive, that's a very bad thing (unfit for the purpose for which it was marketed). It is especially bad if you change it, without notice, from a device that worked perfectly to something that doesn't work - or that, when it does work, may take a 90-98% performance hit compared to the previous device unless you treat it "just right" (RAIDZ3 resilvers going from 12-20 hours on 19*2TB drives out to 8-10 DAYS, and scrubs failing to complete or taking 2 weeks - all on idle vdevs). If you then issue the kinds of statements that are in the rest of that WD blog, the courts really won't like it - and when the drive is being punted as a home/SOHO device, consumer protection laws kick in fairly hard.

(*) https://arstechnica.com/gadgets/2020/04/caveat-emptor-smr-disks-are-being-submarined-into-unexpected-channels/ https://blocksandfiles.com/2020/04/14/wd-red-nas-drives-shingled-magnetic-recording/ https://blocksandfiles.com/2020/04/15/seagate-2-4-and-8tb-barracuda-and-desktop-hdd-smr/ https://blocksandfiles.com/2020/04/16/toshiba-desktop-disk-drives-undocumented-shingle-magnetic-recording/ https://blocksandfiles.com/2020/04/23/western-digital-blog-wd-red-nas-smr-drives-overuse/

darrenfreeman commented 4 years ago

Finished resilvering my brand new IronWolf ST4000VN008. No surprises whatsoever.

  pool: rpool
 state: ONLINE
  scan: resilvered 3.38T in 0 days 13:51:05 with 0 errors on Sat Apr 25 23:16:10 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            rpool0  ONLINE       0     0     0
            rpool5  ONLINE       0     0     0
            rpool2  ONLINE       0     0     0
            rpool3  ONLINE       0     0     0
            rpool4  ONLINE       0     0     0

errors: No known data errors
darrenfreeman commented 4 years ago

Scrub completed without any errors.

darrenfreeman commented 4 years ago

After the resilver and the scrub, I then took the new IronWolf offline, and did a sequential copy to the Red EFAX, using dd.

I was unable to do a zpool replace -f, due to issues #3242 and #7866. However, I created a symlink from the old device name, pointing to the new name, and then zpool online succeeded. (Next time, I would just change the partition label.)
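
For completeness, a hedged sketch of the relabel variant of that workaround (all device names are placeholders, the EFAX is assumed to already carry an equal-sized partition, and the pool runs with one drive less redundancy while the source disk is out):

# Take the known-good source drive (vdev label rpool5) out of the pool.
zpool offline rpool rpool5
# Sequentially clone its partition onto the EFAX - a purely sequential write
# that the DM-SMR firmware copes with (sdk = source disk, sdm = EFAX).
dd if=/dev/sdk1 of=/dev/sdm1 bs=1M status=progress
# Give the clone the partition label the pool expects, make sure the original
# drive is detached so only one "rpool5" label remains, then bring it online.
sgdisk -c 1:rpool5 /dev/sdm
partprobe /dev/sdm
zpool online rpool rpool5   # a short resilver covers writes made in the meantime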

During the resilver, the EFAX threw a couple of write errors, which from the kernel log, appeared to be the same IDNF error. It's worth noting that in all the previous configurations, the EFAX was on an IBM SAS2008 card, but now it is the only device on the motherboard's SATA controller, due to a shortage of cables.

Apr 29 01:07:35 host kernel: sd 1:0:0:0: [sdm] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 29 01:07:35 host kernel: sd 1:0:0:0: [sdm] tag#27 Sense Key : Illegal Request [current] 
Apr 29 01:07:35 host kernel: sd 1:0:0:0: [sdm] tag#27 Add. Sense: Logical block address out of range
Apr 29 01:07:35 host kernel: sd 1:0:0:0: [sdm] tag#27 CDB: Write(16) 8a 00 00 00 00 00 82 5e 66 28 00 00 00 10 00 00
Apr 29 01:07:35 host kernel: print_req_error: I/O error, dev sdm, sector 2187224616
Apr 29 01:11:35 host kernel: sd 1:0:0:0: [sdm] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 29 01:11:35 host kernel: sd 1:0:0:0: [sdm] tag#19 Sense Key : Illegal Request [current] 
Apr 29 01:11:35 host kernel: sd 1:0:0:0: [sdm] tag#19 Add. Sense: Logical block address out of range
Apr 29 01:11:35 host kernel: sd 1:0:0:0: [sdm] tag#19 CDB: Write(16) 8a 00 00 00 00 00 82 61 ab e8 00 00 00 08 00 00
Apr 29 01:11:35 host kernel: print_req_error: I/O error, dev sdm, sector 2187439080
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 7.26G in 0 days 00:04:24 with 0 errors on Wed Apr 29 01:18:52 2020
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            rpool0    ONLINE       0     0     0
            rpool5    ONLINE       0     2     0  
            rpool2    ONLINE       0     0     0
            rpool3    ONLINE       0     0     0
            rpool4    ONLINE       0     0     0

errors: No known data errors

Finally, I did a scrub, and the scrub has completed without any errors.

This is hardly a nice solution to getting an EFAX resilvered, as it's time-consuming, and it costs you one drive worth of redundancy. But for anyone who is desperate enough to try, I can say that it does work. (Try to limit the amount of writes done to the pool during this process, in case the final resilver fails as before.)

yohgaki commented 4 years ago

FYI

A TOSHIBA MN04 HDD (which includes an RV sensor) seems to have a similar problem.

I tried to replace a TOSHIBA MD04-series HDD with an MN04-series HDD and got many read/write errors. I bought 2 MN04 drives and got similar results for both. No SMART errors are recorded. I haven't tried to find new firmware yet.

 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: exception Emask 0x50 SAct 0x80800 SErr 0x4090800 action 0xe frozen
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: irq_stat 0x00400040, connection status changed
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: failed command: READ FPDMA QUEUED
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: cmd 60/d0:58:48:93:ff/07:00:10:00:00/40 tag 11 ncq dma 1024000 in
                                                     res 40/00:98:18:9b:ff/00:00:10:00:00/40 Emask 0x50 (ATA bus error)
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: status: { DRDY }
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: failed command: READ FPDMA QUEUED
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: cmd 60/f0:98:18:9b:ff/07:00:10:00:00/40 tag 19 ncq dma 1040384 in
                                                     res 40/00:98:18:9b:ff/00:00:10:00:00/40 Emask 0x50 (ATA bus error)
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1.00: status: { DRDY }
 5月 23 06:57:17 dev.inter.es-i.jp kernel: ata1: hard resetting link
 5月 23 06:57:20 dev.inter.es-i.jp kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
 5月 23 06:57:20 dev.inter.es-i.jp kernel: ata1.00: configured for UDMA/100
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#11 Sense Key : Illegal Request [current] 
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#11 Add. Sense: Unaligned write command
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#11 CDB: Read(16) 88 00 00 00 00 00 10 ff 93 48 00 00 07 d0 00 00
 5月 23 06:57:20 dev.inter.es-i.jp kernel: blk_update_request: I/O error, dev sda, sector 285184840 op 0x0:(READ) flags 0x700 phys_seg 17 prio class 0
 5月 23 06:57:20 dev.inter.es-i.jp kernel: zio pool=datastore vdev=/dev/disk/by-id/ata-TOSHIBA_MN04ACA400_4071K3DXF7ME-part1 error=5 type=1 offset=146013589504 size=1024000 flags=40080ca8
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current] 
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#19 Add. Sense: Unaligned write command
 5月 23 06:57:20 dev.inter.es-i.jp kernel: sd 0:0:0:0: [sda] tag#19 CDB: Read(16) 88 00 00 00 00 00 10 ff 9b 18 00 00 07 f0 00 00
 5月 23 06:57:20 dev.inter.es-i.jp kernel: blk_update_request: I/O error, dev sda, sector 285186840 op 0x0:(READ) flags 0x700 phys_seg 18 prio class 0
 5月 23 06:57:20 dev.inter.es-i.jp kernel: zio pool=datastore vdev=/dev/disk/by-id/ata-TOSHIBA_MN04ACA400_4071K3DXF7ME-part1 error=5 type=1 offset=146014613504 size=1040384 flags=40080ca8
 5月 23 06:57:20 dev.inter.es-i.jp kernel: ata1: EH complete
 5月 23 06:57:21 dev.inter.es-i.jp zed[698351]: eid=712 class=io pool_guid=0xE00223B81DC49FEF vdev_path=/dev/disk/by-id/ata-TOSHIBA_MN04ACA400_4071K3DXF7ME-part1
 5月 23 06:57:21 dev.inter.es-i.jp zed[698357]: eid=713 class=io pool_guid=0xE00223B81DC49FEF vdev_path=/dev/disk/by-id/ata-TOSHIBA_MN04ACA400_4071K3DXF7ME-part1
 5月 23 06:57:21 dev.inter.es-i.jp zed[699834]: eid=714 class=io pool_guid=0xE00223B81DC49FEF vdev_path=/dev/disk/by-id/ata-TOSHIBA_MN04ACA400_4071K3DXF7ME-part1
darrenfreeman commented 4 years ago

I've also found this issue with a WD Blue WD20SPZX, which is a 2.5" 2TB drive that I have in a laptop.

One day I attempted a transfer of ~500 GB of files to an NTFS partition under Windows, which got slower and slower until I killed the transfer. After rebooting to Ubuntu, it then also generated lengthy timeouts with zfs. This was a near new drive, so I thought I must have a faulty SATA controller. I mirrored the zpool with an external USB3 drive, and then dropped it out of the mirror.

Now that I know about this issue with WD drives, I can confirm the internal drive is behaving just like the WD40EFAX: it seems not to be faulty, but it won't resilver into the mirror. It eventually throws thousands of write errors, until ZFS faults it.

QBANIN commented 4 years ago

I have 4 WDC_WD30EFRX drives installed in an HP Microserver Gen8, connected to the motherboard SATA controller configured in AHCI mode. At first there were 2 drives in a ZFS mirror configuration and everything was fine. After a few months I added 2 more drives and upgraded the mirror to RAID10. Since then I'm getting random write errors, most likely during heavy write load. Scrub finishes without any error. No SMART errors.

[213957.995776] ata3.00: exception Emask 0x0 SAct 0xc000000 SErr 0x0 action 0x0
[213957.996438] ata3.00: irq_stat 0x40000008
[213957.996822] ata3.00: failed command: WRITE FPDMA QUEUED
[213957.997206] ata3.00: cmd 61/00:d0:f8:49:18/08:00:e3:00:00/40 tag 26 ncq dma 1048576 out
                         res 41/10:00:f8:49:18/00:00:e3:00:00/40 Emask 0x481 (invalid argument)
[213957.998508] ata3.00: status: { DRDY ERR }
[213957.998880] ata3.00: error: { IDNF }
[213958.000293] ata3.00: configured for UDMA/133
[213958.000689] sd 3:0:0:0: [sdg] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[213958.001381] sd 3:0:0:0: [sdg] tag#26 Sense Key : Illegal Request [current]
[213958.002070] sd 3:0:0:0: [sdg] tag#26 Add. Sense: Logical block address out of range
[213958.002759] sd 3:0:0:0: [sdg] tag#26 CDB: Write(16) 8a 00 00 00 00 00 e3 18 49 f8 00 00 08 00 00 00
[213958.003455] blk_update_request: I/O error, dev sdg, sector 3810019832 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[213958.004180] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKRDJR-part1 error=5 type=2 offset=1950729105408 size=1048576 flags=40080c80
[213958.005199] ata3: EH complete
[213958.016612] ata1.00: exception Emask 0x0 SAct 0x280 SErr 0x0 action 0x0
[213958.017262] ata1.00: irq_stat 0x40000008
[213958.017612] ata1.00: failed command: WRITE FPDMA QUEUED
[213958.017968] ata1.00: cmd 61/00:38:00:f4:da/08:00:4b:00:00/40 tag 7 ncq dma 1048576 out
                         res 41/10:00:00:f4:da/00:00:4b:00:00/40 Emask 0x481 (invalid argument)
[213958.019218] ata1.00: status: { DRDY ERR }
[213958.019581] ata1.00: error: { IDNF }
[213958.019944] ata4.00: exception Emask 0x0 SAct 0x180 SErr 0x0 action 0x0
[213958.020621] ata4.00: irq_stat 0x40000008
[213958.020997] ata4.00: failed command: WRITE FPDMA QUEUED
[213958.021371] ata4.00: cmd 61/00:38:f8:52:18/08:00:e3:00:00/40 tag 7 ncq dma 1048576 out
                         res 41/10:00:f8:52:18/00:00:e3:00:00/40 Emask 0x481 (invalid argument)
[213958.022700] ata4.00: status: { DRDY ERR }
[213958.023071] ata4.00: error: { IDNF }
[213958.025547] ata1.00: configured for UDMA/133
[213958.025959] ata4.00: configured for UDMA/133
[213958.025963] sd 1:0:0:0: [sde] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[213958.026338] sd 4:0:0:0: [sdh] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[213958.027017] sd 1:0:0:0: [sde] tag#7 Sense Key : Illegal Request [current]
[213958.027718] sd 4:0:0:0: [sdh] tag#7 Sense Key : Illegal Request [current]
[213958.028392] sd 1:0:0:0: [sde] tag#7 Add. Sense: Logical block address out of range
[213958.029075] sd 4:0:0:0: [sdh] tag#7 Add. Sense: Logical block address out of range
[213958.029078] sd 4:0:0:0: [sdh] tag#7 CDB: Write(16) 8a 00 00 00 00 00 e3 18 52 f8 00 00 08 00 00 00
[213958.029767] sd 1:0:0:0: [sde] tag#7 CDB: Write(16) 8a 00 00 00 00 00 4b da f4 00 00 00 08 00 00 00
[213958.030456] blk_update_request: I/O error, dev sdh, sector 3810022136 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[213958.031145] blk_update_request: I/O error, dev sde, sector 1272640512 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[213958.031840] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6EVNSY9-part1 error=5 type=2 offset=1950730285056 size=1048576 flags=40080c80
[213958.031849] ata4: EH complete
[213958.034629] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0505144-part1 error=5 type=2 offset=651590893568 size=1048576 flags=40080c80

darrenfreeman commented 4 years ago

I'm surprised to be seeing those IDNF errors alongside WD30EFRX. WD are maintaining that these are not SMR. See e.g. https://www.westerndigital.com/products/internal-drives/wd-red-hdd

Do you have any other drive model in that array?

I have three WD40EFRX Red NAS drives in mine, and they have performed without issue. All with firmware 82.00A82.

I was surprised to find that the WD40EFAX drive also had firmware 82.00A82, which leads me to wonder whether the DM-SMR functionality has been in the firmware all along, but they only just enabled it through the configuration data in those particular drives.

Can you find any difference between your old and new drives? Try hdparm -I and see whether you see entries like:

           *    Data Set Management TRIM supported (limit 10 blocks)
           *    Deterministic read ZEROs after TRIM
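
A quick way to compare an old and a new drive is to diff their full identify output (a hedged sketch; the device letters are only examples):

hdparm -I /dev/sde > old-drive.txt   # one of the older units
hdparm -I /dev/sdg > new-drive.txt   # one of the recently purchased units
diff old-drive.txt new-drive.txt     # look for TRIM / zone-related capability lines
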
QBANIN commented 4 years ago

Hi, I have only WD30EFRX drives in this RAID10 array. The 2 older drives (a few years old) are:

WDC WD30EFRX-68EUZN0
Serial Number:      WD-WMC4N0505144
Firmware Revision:  80.00A80

It looks like it's an older hardware revision, and the latest firmware for it is 80.00A80.

and the other 2 installed recently are

Model Number:       WDC WD30EFRX-68EUZN0
Serial Number:      WD-WCC4N4TKRDJR
Firmware Revision:  82.00A82

The output of hdparm -I /dev/sd[e-h] | grep TRIM is empty, so these are not SMR models.

I have more drives in different arrays and I've never had a single issue with them, but those are connected to an LSI HBA controller. One thing I noticed after posting my previous comment is that the 2 older drives had the built-in APM timer set to 300s (checked with idle3-tools and set to disabled), while the other 2 had APM disabled by default.

I don't know the origin of this issue, but it randomly affects one or all drives in this array.

shodanshok commented 4 years ago

I have 4 WDC_WD30EFRX drives installed in an HP Microserver Gen8, connected to the motherboard SATA controller configured in AHCI mode. At first there were 2 drives in a ZFS mirror configuration and everything was fine. After a few months I added 2 more drives and upgraded the mirror to RAID10. Since then I'm getting random write errors, most likely during heavy write load. Scrub finishes without any error. No SMART errors.

HP Microserver Gen8 gave me various troubles with high capacity drives until I disabled AHCI and switched to IDE/Legacy mode. Give it a try.

QBANIN commented 4 years ago

I have 4 WDC_WD30EFRX drives installed in an HP Microserver Gen8, connected to the motherboard SATA controller configured in AHCI mode. At first there were 2 drives in a ZFS mirror configuration and everything was fine. After a few months I added 2 more drives and upgraded the mirror to RAID10. Since then I'm getting random write errors, most likely during heavy write load. Scrub finishes without any error. No SMART errors.

HP Microserver Gen8 gave me various troubles with high capacity drives until I disabled AHCI and switched to IDE/Legacy mode. Give it a try.

What kind of troubles? It worked fine with 2x3TB but throws errors with 4x3TB...

I checked the differences between the old and new drives but there's nothing that matters... sde = one of the old drives, sdg = one of the new drives:

< /dev/sde:
---
> /dev/sdg:
6,7c6,7
<       Serial Number:      WD-WMC4N0505144
<       Firmware Revision:  80.00A80
---
>       Serial Number:      WD-WCC4N4TKRDJR
>       Firmware Revision:  82.00A82
95,96c95,96
<       420min for SECURITY ERASE UNIT. 420min for ENHANCED SECURITY ERASE UNIT.
< Logical Unit WWN Device Identifier: 50014ee003a891af
---
>       416min for SECURITY ERASE UNIT. 416min for ENHANCED SECURITY ERASE UNIT.
> Logical Unit WWN Device Identifier: 50014ee20d6ce16e
99c99
<       Unique ID       : 003a891af
---
>       Unique ID       : 20d6ce16e
shodanshok commented 4 years ago

What kind of troubles? It worked fine with 2x3TB but throws errors with 4x3TB...

BIOS recognition issues and slow/unpredictable performance. After switching to IDE, all issues disappeared.

darrenfreeman commented 4 years ago

See https://forums.unraid.net/topic/72428-solved-clearing-synology-drive/

It's possible your EFRX drive has corrupted sectors that can't be read. (Power loss during writing?) Maybe try zeroing the whole drive, and see if that solves it.

QBANIN commented 4 years ago

@darrenfreeman My issue is write-related, not read-related. Besides, when this happens it affects many/random drives at once. I've never experienced a power loss; my server is connected to a UPS.

darrenfreeman commented 4 years ago

Do you have ashift=12? If it's smaller, then that will incur read-modify-write.

It looks like the same error that we see on the SMR drives, but yours aren't known to be SMR.

Can you try zeroing the whole drive anyway, and see if it throws those errors?

QBANIN commented 4 years ago

@darrenfreeman Yes, all zpools have ashift=12:

sudo zdb -C | grep ashift
            ashift: 12
            ashift: 12
            ashift: 12
            ashift: 12
            ashift: 12
            ashift: 12

I can't zero any drive; they contain important data. Besides, I think it's pointless. As I mentioned before, this error appears randomly on 1-4 drives in this zpool, on random sectors, at the same moment. Sometimes it's drive 1, another time 1+3, another 1,2,3 or 1,2,4, so this can't be just a single-drive issue. Until I expanded this zpool from 2 to 4 drives, this error never happened. I think it has something to do with 4 drives working together, maybe a SATA controller or RAID10 data distribution issue...

Stoatwblr commented 4 years ago

This has all the hallmarks of a power supply problem - especially as it only happened once you added more drives.

Replace your PSU and see if the problems continue. Pay close attention to the wattage of the unit.

QBANIN commented 4 years ago

@Stoatwblr I don't think so. The total number of drives hasn't changed, just the configuration. It was

Single, Single, 2 drives in RAID1 - total 4 drives

now it's

4 drives in RAID 10

I have 4 more drives (3x SSD + 1x HDD) connected to a different SATA controller, working without any problem since the beginning.

Putting everything together: I have 8 drives: 4 connected to the motherboard SATA controller with the described issue, and another 4 connected to the LSI HBA controller working flawlessly.

Stoatwblr commented 4 years ago

Replace the PSU anyway - it is the common factor.

darrenfreeman commented 4 years ago

It could be a SATA issue.

Some SATA controllers are notorious for problems. One bad drive can throw errors on other drives, and it can throw other good drives out of your array. (I've had this happen.)

You could try moving one drive out of each mirror onto the LSI HBA, just to see if those drives stop having issues.

QBANIN commented 4 years ago

@darrenfreeman I'll do that, but I have to buy a new 8i LSI controller because the one I have in my server is 4i only, with all ports in use.

shodanshok commented 4 years ago

@QBANIN again, before buying a new HBA: try setting your SATA port to IDE mode (rather than AHCI).

QBANIN commented 4 years ago

@shodanshok Will do. Before switching to IDE I disabled Write Cache (in the BIOS) for the built-in controller. I've been running a bonnie++ HDD stress test for a few hours and so far no error has appeared. Is it possible that this issue is somehow related to buggy write-cache handling by ZFS?

BTW: I don't see any downsides to disabling the write cache. Sequential write speed to this RAID10 zpool is >200 MB/s. I can't test random write speed, but it is not as important because this is a typical storage pool for long-term data uploaded via a single FTP/SMB connection or backup software.

shodanshok commented 4 years ago

@QBANIN the disk's own writeback cache can be useful when issuing random writes without invoking fsync - which can happen quite a lot. The typical ZFS write pattern can lower the impact of disabling this cache, but I would avoid disabling it if possible.

ZFS has excellent support for disk caches - so no, it is really unlikely (if not impossible) that this is ZFS's fault. Theoretically, disabling the disk cache can avoid the problem only if the DRAM cache itself is broken, which is a very remote possibility when it happens on multiple disks at the same time. But as suggested by @Stoatwblr, it can be the power supply having problems and corrupting in-cache data (I saw something similar with an assembled NAS).

As suggested, re-enable the writeback cache and try to:

* set SATA to IDE

* change power supply (or to use another power rail at the very least)

QBANIN commented 4 years ago

@shodanshok I can't replace power supply but will test IDE legacy mode.

darrenfreeman commented 4 years ago

I was suggesting that you swap a couple of drives between the LSI HBA and the SATA controller. Same number of ports, but see if the problem follows the drive, or the controller.

QBANIN commented 4 years ago

@QBANIN the disk's own writeback cache can be useful when issuing random writes without invoking fsync - which can happen quite a lot. The typical ZFS write pattern can lower the impact of disabling this cache, but I would avoid disabling it if possible.

ZFS has excellent support for disk caches - so no, it is really unlikely (if not impossible) that this is ZFS's fault. Theoretically, disabling the disk cache can avoid the problem only if the DRAM cache itself is broken, which is a very remote possibility when it happens on multiple disks at the same time. But as suggested by @Stoatwblr, it can be the power supply having problems and corrupting in-cache data (I saw something similar with an assembled NAS).

As suggested, re-enable the writeback cache and try to:

* set SATA to IDE

* change power supply (or to use another power rail at the very least)

You were right about IDE mode. The issue is gone after switching from AHCI to IDE. :D So far 9 days without any write error.

shodanshok commented 4 years ago

@QBANIN glad to hear that ;) The AHCI mode of these small HP Microservers Gen8 seems really problematic...

QBANIN commented 4 years ago

@shodanshok That's why I'm going to switch to some 8i LSI controller anyway :)

yohgaki commented 4 years ago

Since I had occasional read errors with known-good HDDs, I replaced the power supply and my problem is gone. Even though I didn't have any issues other than ZFS read failures, it seems the cause was a faulty power supply.

QBANIN commented 3 years ago

It happened again a few days after my last comment. I switched to an LSI 9207-8i and... it happened again. It looks like this issue affects only the WD Red 3.5" (CMR) drives, anywhere from 1 of 4 up to all 4 at once (all 4 this time). The remaining two HDDs (WD Blue 2.5" and Seagate 2.5") and three SSDs work fine.

[czw sie 6 01:56:31 2020] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[czw sie 6 01:56:31 2020] sd 0:0:7:0: [sdh] tag#4645 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[czw sie 6 01:56:31 2020] sd 0:0:7:0: [sdh] tag#4646 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[czw sie 6 01:56:31 2020] sd 0:0:7:0: [sdh] tag#4645 Sense Key : Illegal Request [current] [descriptor]
[czw sie 6 01:56:31 2020] sd 0:0:7:0: [sdh] tag#4645 Add. Sense: Logical block address out of range
[czw sie 6 01:56:31 2020] sd 0:0:7:0: [sdh] tag#4645 CDB: Write(16) 8a 00 00 00 00 01 2c c1 14 28 00 00 08 00 00 00
[czw sie 6 01:56:31 2020] blk_update_request: critical target error, dev sdh, sector 5045818408 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[czw sie 6 01:56:31 2020] sd 0:0:7:0: [sdh] tag#4646 CDB: Write(16) 8a 00 00 00 00 01 2c c1 1c 28 00 00 04 00 00 00
[czw sie 6 01:56:31 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0505144-part1 error=121 type=2 offset=2583457976320 size=1048576 flags=40080c80
[czw sie 6 01:56:31 2020] blk_update_request: I/O error, dev sdh, sector 5045820456 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[czw sie 6 01:56:31 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0505144-part1 error=5 type=2 offset=2583459024896 size=524288 flags=40080c80
[czw sie 6 01:56:31 2020] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[czw sie 6 01:56:31 2020] sd 0:0:4:0: [sde] tag#4653 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[czw sie 6 01:56:31 2020] sd 0:0:4:0: [sde] tag#4653 CDB: Write(16) 8a 00 00 00 00 01 0a dc 9c 28 00 00 08 00 00 00
[czw sie 6 01:56:31 2020] blk_update_request: I/O error, dev sde, sector 4477197352 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[czw sie 6 01:56:31 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6EVNSY9-part1 error=5 type=2 offset=2292323995648 size=1048576 flags=40080c80
[czw sie 6 01:56:31 2020] sd 0:0:4:0: [sde] tag#4652 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[czw sie 6 01:56:31 2020] sd 0:0:4:0: [sde] tag#4652 Sense Key : Illegal Request [current] [descriptor]
[czw sie 6 01:56:31 2020] sd 0:0:4:0: [sde] tag#4652 Add. Sense: Logical block address out of range
[czw sie 6 01:56:31 2020] sd 0:0:4:0: [sde] tag#4652 CDB: Write(16) 8a 00 00 00 00 01 0a dc 94 28 00 00 08 00 00 00
[czw sie 6 01:56:31 2020] blk_update_request: critical target error, dev sde, sector 4477195304 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[czw sie 6 01:56:31 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6EVNSY9-part1 error=121 type=2 offset=2292322947072 size=1048576 flags=40080c80
[czw sie 6 01:56:32 2020] mpt2sas_cm0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[czw sie 6 01:56:32 2020] mpt2sas_cm0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[czw sie 6 01:56:32 2020] sd 0:0:5:0: [sdf] tag#4649 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[czw sie 6 01:56:32 2020] sd 0:0:5:0: [sdf] tag#4649 CDB: Write(16) 8a 00 00 00 00 01 0a dc 8d 28 00 00 08 00 00 00
[czw sie 6 01:56:32 2020] blk_update_request: I/O error, dev sdf, sector 4477193512 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[czw sie 6 01:56:32 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKRDJR-part1 error=5 type=2 offset=2292322029568 size=1048576 flags=40080c80
[czw sie 6 01:56:32 2020] sd 0:0:5:0: [sdf] tag#4669 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[czw sie 6 01:56:32 2020] sd 0:0:5:0: [sdf] tag#4669 CDB: Write(16) 8a 00 00 00 00 01 0a dc 95 28 00 00 08 00 00 00
[czw sie 6 01:56:32 2020] blk_update_request: I/O error, dev sdf, sector 4477195560 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[czw sie 6 01:56:32 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKRDJR-part1 error=5 type=2 offset=2292323078144 size=1048576 flags=40080c80
[czw sie 6 01:56:32 2020] sd 0:0:5:0: Power-on or device reset occurred
[czw sie 6 01:56:32 2020] mpt2sas_cm0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[czw sie 6 01:56:32 2020] mpt2sas_cm0: log_info(0x31120436): originator(PL), code(0x12), sub_code(0x0436)
[czw sie 6 01:56:32 2020] sd 0:0:6:0: [sdg] tag#4840 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[czw sie 6 01:56:32 2020] sd 0:0:6:0: [sdg] tag#4644 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[czw sie 6 01:56:32 2020] sd 0:0:6:0: [sdg] tag#4840 CDB: Write(16) 8a 00 00 00 00 01 2c c1 0d 28 00 00 06 00 00 00
[czw sie 6 01:56:32 2020] sd 0:0:6:0: [sdg] tag#4644 CDB: Write(16) 8a 00 00 00 00 01 2c c1 05 28 00 00 08 00 00 00
[czw sie 6 01:56:32 2020] blk_update_request: I/O error, dev sdg, sector 5045816616 op 0x1:(WRITE) flags 0x700 phys_seg 12 prio class 0
[czw sie 6 01:56:32 2020] blk_update_request: I/O error, dev sdg, sector 5045814568 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[czw sie 6 01:56:32 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0371995-part1 error=5 type=2 offset=2583457058816 size=786432 flags=40080c80
[czw sie 6 01:56:32 2020] zio pool=RAID1 vdev=/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0371995-part1 error=5 type=2 offset=2583456010240 size=1048576 flags=40080c80
[czw sie 6 01:56:32 2020] sd 0:0:6:0: Power-on or device reset occurred
[czw sie 6 01:56:32 2020] sd 0:0:5:0: Power-on or device reset occurred
[czw sie 6 01:56:33 2020] sd 0:0:6:0: Power-on or device reset occurred

  pool: RAID1
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 54,8M in 0 days 00:00:04 with 0 errors on Thu Aug 6 01:56:45 2020
config:

    NAME                                                    STATE     READ WRITE CKSUM
    RAID1                                                   DEGRADED     0     0     0
      mirror-0                                              DEGRADED     0     9     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4TKRDJR            DEGRADED     0    18     3  too many errors
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6EVNSY9            FAULTED      0    18     0  too many errors
      mirror-2                                              DEGRADED     0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0505144            FAULTED      0    14     0  too many errors
        ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0371995            DEGRADED     0    16    13  too many errors
    logs
      ata-Samsung_SSD_840_PRO_Series_S12PNEAD414919H-part4  ONLINE       0     0     0
    cache
      sdc5                                                  ONLINE       0     0     0

After zpool clear RAID1 (in fact it's RAID 10) everything works as before. A scrub shows no issues.
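For anyone following along, the recovery steps referred to above are roughly (using the pool name from this thread):

    # Reset the per-vdev error counters after the transient write failures
    zpool clear RAID1

    # Re-read and verify every block to confirm nothing was actually corrupted
    zpool scrub RAID1
    zpool status -v RAID1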

behlendorf commented 3 years ago

Closing. From the discussion it sounds like the issues reported here were caused by hardware.