openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS not detecting failing drive? (blk_update_request: critical target error) #7877

Closed dirkpetersen closed 5 years ago

dirkpetersen commented 5 years ago

System information

Distribution Name | Ubuntu
Distribution Version | 16.04
Linux Kernel | 4.4.0-104-generic
Architecture | x64
ZFS Version | 0.6.5.6-0ubuntu4
SPL Version | 0.6.5.6-0ubuntu4

We have seen high service times on one disk for more than a week.

(attached image: graph of disk service times)

For several days we have seen this in the logs:

/var/log/syslog.1:Sep  7 06:25:20 chromium-store3 kernel: [14413531.517195] blk_update_request: critical target error, dev sdf, sector 2277
/var/log/syslog.1:Sep  7 06:25:23 chromium-store3 kernel: [14413534.832919] sd 0:0:5:0: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
/var/log/syslog.1:Sep  7 06:25:23 chromium-store3 kernel: [14413534.832943] sd 0:0:5:0: [sdf] tag#0 Sense Key : Hardware Error [current] 
/var/log/syslog.1:Sep  7 06:25:23 chromium-store3 kernel: [14413534.832951] sd 0:0:5:0: [sdf] tag#0 Add. Sense: No defect spare location available
/var/log/syslog.1:Sep  7 06:25:23 chromium-store3 kernel: [14413534.832958] sd 0:0:5:0: [sdf] tag#0 CDB: Read(10) 28 00 00 00 08 20 00 00 e0 00
/var/log/syslog.1:Sep  7 06:25:23 chromium-store3 kernel: [14413534.832964] blk_update_request: critical target error, dev sdf, sector 2277
/var/log/syslog.1:Sep  7 06:25:27 chromium-store3 kernel: [14413538.207042] sd 0:0:5:0: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
/var/log/syslog.1:Sep  7 06:25:27 chromium-store3 kernel: [14413538.207072] sd 0:0:5:0: [sdf] tag#0 Sense Key : Hardware Error [current] 
/var/log/syslog.1:Sep  7 06:25:27 chromium-store3 kernel: [14413538.207080] sd 0:0:5:0: [sdf] tag#0 Add. Sense: No defect spare location available
/var/log/syslog.1:Sep  7 06:25:27 chromium-store3 kernel: [14413538.207088] sd 0:0:5:0: [sdf] tag#0 CDB: Read(10) 28 00 00 00 08 20 00 00 e0 00
/var/log/syslog.1:Sep  7 06:25:27 chromium-store3 kernel: [14413538.207094] blk_update_request: critical target error, dev sdf, sector 2277

However, zpool status does not indicate any degradation or other problem. Is this not considered a drive failure?

We have now failed the drive manually and the system is currently resilvering.
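(For reference, manually failing a drive over to a hot spare amounts to something like the following sketch; the device names here are taken from the zpool status output later in the thread:)

# Take the failing disk offline, then replace it with the configured hot spare;
# ZFS resilvers onto the spare automatically
zpool offline chromium_data sdf
zpool replace chromium_data sdf wwn-0x5000c50057ce71a3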

rincebrain commented 5 years ago

A) You didn't include zpool status output, or possibly zpool events (though the latter will probably not be useful if you have any sort of regular snapshotting).

B) This is a bug tracker, not a support forum; you would probably get more useful responses from IRC or the various ZFS mailing lists.

C) The version you're running is 30 months old; it would probably be informative to test a newer version (0.7.10 just came out) to see if this is still a problem.

dirkpetersen commented 5 years ago

Thanks much

A) zpool status does not show much useful info, and since we failed the drive manually it now shows the pool as degraded (pasted below). The zpool events output starts exactly at the time when we failed the drive manually.

B) I was not looking for support so much as wondering whether ZFS is sensitive enough when detecting failed drives (there are perhaps multiple opinions on this topic).

C) Yes, it is the version that ships with the Ubuntu LTS release. Unfortunately, I cannot upgrade this production system right now.

root@chromium-store3:~# zpool events 
TIME                           CLASS
Sep  7 2018 17:09:47.627344116 ereport.fs.zfs.io
Sep  7 2018 17:09:51.219363250 ereport.fs.zfs.io
Sep  7 2018 17:09:54.559381041 ereport.fs.zfs.io
Sep  7 2018 17:09:58.067399725 ereport.fs.zfs.io
Sep  7 2018 17:10:01.615418620 ereport.fs.zfs.io
Sep  7 2018 17:10:05.023436769 ereport.fs.zfs.io
Sep  7 2018 17:10:08.587455751 ereport.fs.zfs.io
Sep  7 2018 17:10:11.739472536 ereport.fs.zfs.io
Sep  7 2018 17:10:15.079490323 ereport.fs.zfs.io
Sep  7 2018 17:10:18.575508939 ereport.fs.zfs.io
Sep  7 2018 17:10:22.271528620 ereport.fs.zfs.io
Sep  7 2018 17:10:25.875547810 ereport.fs.zfs.io
Sep  7 2018 17:10:29.355566339 ereport.fs.zfs.io
Sep  7 2018 17:10:32.979585634 ereport.fs.zfs.io
Sep  7 2018 17:10:36.347603566 ereport.fs.zfs.io
Sep  7 2018 17:10:40.307624650 ereport.fs.zfs.io
Sep  7 2018 17:10:43.979644196 ereport.fs.zfs.io
Sep  7 2018 17:10:47.555663233 ereport.fs.zfs.io
Sep  7 2018 17:10:51.095682078 ereport.fs.zfs.io
Sep  7 2018 17:10:54.611700795 ereport.fs.zfs.io
Sep  7 2018 17:10:58.231720064 ereport.fs.zfs.io
Sep  7 2018 17:11:01.871739438 ereport.fs.zfs.io
Sep  7 2018 17:11:05.267757513 ereport.fs.zfs.io
Sep  7 2018 17:11:08.839776525 ereport.fs.zfs.io
Sep  7 2018 17:11:12.579796429 ereport.fs.zfs.io
Sep  7 2018 17:11:16.295816208 ereport.fs.zfs.io
Sep  7 2018 17:11:19.927835540 ereport.fs.zfs.io
Sep  7 2018 17:11:23.535854737 ereport.fs.zfs.io
Sep  7 2018 17:11:27.659876681 ereport.fs.zfs.io
Sep  7 2018 17:11:31.283895965 ereport.fs.zfs.io
Sep  7 2018 17:11:35.099916267 ereport.fs.zfs.io
Sep  7 2018 17:11:38.655935189 ereport.fs.zfs.io
Sep  7 2018 17:11:42.287954513 ereport.fs.zfs.io
Sep  7 2018 17:11:46.139975009 ereport.fs.zfs.io
Sep  7 2018 17:11:49.603993437 ereport.fs.zfs.io
Sep  7 2018 17:11:53.160012356 ereport.fs.zfs.io
Sep  7 2018 17:11:56.836031911 ereport.fs.zfs.io
Sep  7 2018 17:12:00.416050956 ereport.fs.zfs.io
Sep  7 2018 17:12:04.400072151 ereport.fs.zfs.io
Sep  7 2018 17:12:08.008091340 ereport.fs.zfs.io
Sep  7 2018 17:12:11.700110977 ereport.fs.zfs.io
Sep  7 2018 17:12:15.144129294 ereport.fs.zfs.io
Sep  7 2018 17:12:18.660147997 ereport.fs.zfs.io
Sep  7 2018 17:12:22.620169060 ereport.fs.zfs.io
Sep  7 2018 17:12:26.336188819 ereport.fs.zfs.io
Sep  7 2018 17:12:30.320210008 ereport.fs.zfs.io
Sep  7 2018 17:12:33.816228597 ereport.fs.zfs.io
Sep  7 2018 17:12:37.392247616 ereport.fs.zfs.io
Sep  7 2018 17:12:41.408268969 ereport.fs.zfs.io
Sep  7 2018 17:12:44.924287665 ereport.fs.zfs.io
Sep  7 2018 17:12:48.496306658 ereport.fs.zfs.io
Sep  7 2018 17:12:52.284326798 ereport.fs.zfs.io
Sep  7 2018 17:12:56.020346662 ereport.fs.zfs.io
Sep  7 2018 17:12:59.940367503 ereport.fs.zfs.io
Sep  7 2018 17:13:03.508386471 ereport.fs.zfs.io
Sep  7 2018 17:13:07.116405651 ereport.fs.zfs.io
Sep  7 2018 17:13:10.804425257 ereport.fs.zfs.io
Sep  7 2018 17:13:14.724446093 ereport.fs.zfs.io
Sep  7 2018 17:13:18.268464931 ereport.fs.zfs.io
Sep  7 2018 17:13:21.868484068 ereport.fs.zfs.io
Sep  7 2018 17:13:25.588503840 ereport.fs.zfs.io
Sep  7 2018 17:13:29.252523315 ereport.fs.zfs.io
Sep  7 2018 17:13:32.800542171 ereport.fs.zfs.io
Sep  7 2018 17:13:36.404561327 ereport.fs.zfs.io
Sep  7 2018 17:13:40.304582051 ereport.fs.zfs.io
Sep  7 2018 17:13:44.096602204 ereport.fs.zfs.io
Sep  7 2018 17:13:47.528620442 ereport.fs.zfs.io
Sep  7 2018 17:13:51.112639485 ereport.fs.zfs.io
Sep  7 2018 17:13:54.480657382 ereport.fs.zfs.io
Sep  7 2018 17:13:57.856675320 ereport.fs.zfs.io
Sep  7 2018 17:14:01.216693173 ereport.fs.zfs.io
Sep  7 2018 17:14:04.444710324 ereport.fs.zfs.io
Sep  7 2018 17:14:07.708727667 ereport.fs.zfs.io
Sep  7 2018 17:14:11.216746304 ereport.fs.zfs.io
Sep  7 2018 17:14:14.572764132 ereport.fs.zfs.io
Sep  7 2018 17:14:17.932781983 ereport.fs.zfs.io
Sep  7 2018 17:14:21.312799938 ereport.fs.zfs.io
Sep  7 2018 17:14:24.628817555 ereport.fs.zfs.io
Sep  7 2018 17:14:27.876834810 ereport.fs.zfs.io
Sep  7 2018 17:14:31.092851891 ereport.fs.zfs.io
Sep  7 2018 17:14:34.392869421 ereport.fs.zfs.io
Sep  7 2018 17:14:37.696886968 ereport.fs.zfs.io
Sep  7 2018 17:14:41.084904963 ereport.fs.zfs.io
Sep  7 2018 17:14:44.516923190 ereport.fs.zfs.io
Sep  7 2018 17:14:47.852940909 ereport.fs.zfs.io
Sep  7 2018 17:14:51.072958008 ereport.fs.zfs.io
Sep  7 2018 17:14:54.392975643 ereport.fs.zfs.io
Sep  7 2018 17:14:57.576992547 ereport.fs.zfs.io
Sep  7 2018 17:15:00.969010561 ereport.fs.zfs.io
Sep  7 2018 17:15:04.457029082 ereport.fs.zfs.io
Sep  7 2018 17:15:07.853047118 ereport.fs.zfs.io
Sep  7 2018 17:15:11.233065065 ereport.fs.zfs.io
Sep  7 2018 17:15:14.633083116 ereport.fs.zfs.io
Sep  7 2018 17:15:17.997100977 ereport.fs.zfs.io
Sep  7 2018 17:15:21.461119369 ereport.fs.zfs.io
Sep  7 2018 17:15:24.965137971 ereport.fs.zfs.io
Sep  7 2018 17:15:28.377156086 ereport.fs.zfs.io
Sep  7 2018 17:15:31.889174729 ereport.fs.zfs.io
Sep  7 2018 17:15:35.229192461 ereport.fs.zfs.io
Sep  7 2018 17:15:38.601210361 ereport.fs.zfs.io
Sep  7 2018 17:15:42.109228983 ereport.fs.zfs.io
Sep  7 2018 17:15:45.433246627 ereport.fs.zfs.io
Sep  7 2018 17:15:48.773264355 ereport.fs.zfs.io
Sep  7 2018 17:15:52.513284206 ereport.fs.zfs.io
Sep  7 2018 17:15:56.137303441 ereport.fs.zfs.io
Sep  7 2018 17:15:59.453321040 ereport.fs.zfs.io
Sep  7 2018 17:16:02.837339001 ereport.fs.zfs.io
Sep  7 2018 17:16:06.085356237 ereport.fs.zfs.io
Sep  7 2018 17:16:09.317373389 ereport.fs.zfs.io
Sep  7 2018 17:16:12.717391432 ereport.fs.zfs.io
Sep  7 2018 17:16:16.065409197 ereport.fs.zfs.io
Sep  7 2018 17:16:19.329426518 ereport.fs.zfs.io
Sep  7 2018 17:16:22.605443901 ereport.fs.zfs.io
Sep  7 2018 17:16:25.905461410 ereport.fs.zfs.io
Sep  7 2018 17:16:29.345479664 ereport.fs.zfs.io
Sep  7 2018 17:16:32.585496855 ereport.fs.zfs.io
Sep  7 2018 17:16:36.017515063 ereport.fs.zfs.io
Sep  7 2018 17:16:39.341532700 ereport.fs.zfs.io
Sep  7 2018 17:16:42.841551269 ereport.fs.zfs.io
Sep  7 2018 17:16:46.433570323 ereport.fs.zfs.io
Sep  7 2018 17:16:49.713587723 ereport.fs.zfs.io
Sep  7 2018 17:16:52.929604784 ereport.fs.zfs.io
Sep  7 2018 17:16:56.345622903 ereport.fs.zfs.io
Sep  7 2018 17:16:59.669640535 ereport.fs.zfs.io
Sep  7 2018 17:17:03.145658973 ereport.fs.zfs.io
Sep  7 2018 17:17:06.561677092 ereport.fs.zfs.io
Sep  7 2018 17:17:09.893694767 ereport.fs.zfs.io
Sep  7 2018 17:17:13.449713627 ereport.fs.zfs.io
Sep  7 2018 17:17:17.105733014 ereport.fs.zfs.io
Sep  7 2018 17:17:21.365755607 ereport.fs.zfs.io
Sep  7 2018 17:17:27.229786704 ereport.fs.zfs.io
Sep  7 2018 17:17:32.285813515 ereport.fs.zfs.io
Sep  7 2018 17:17:37.241839796 ereport.fs.zfs.io
Sep  7 2018 17:17:41.209860836 ereport.fs.zfs.io
Sep  7 2018 17:17:45.949885970 ereport.fs.zfs.io
Sep  7 2018 17:17:49.765906201 ereport.fs.zfs.io
Sep  7 2018 17:17:53.613926602 ereport.fs.zfs.io
Sep  7 2018 17:17:57.261945942 ereport.fs.zfs.io
Sep  7 2018 17:18:01.045966004 ereport.fs.zfs.io
Sep  7 2018 17:18:05.909991789 ereport.fs.zfs.io
Sep  7 2018 17:18:09.726012016 ereport.fs.zfs.io
Sep  7 2018 17:18:13.398031483 ereport.fs.zfs.io
Sep  7 2018 17:18:17.218051731 ereport.fs.zfs.io
Sep  7 2018 17:18:20.938071447 ereport.fs.zfs.io
Sep  7 2018 17:18:24.566090678 ereport.fs.zfs.io
Sep  7 2018 17:18:28.314110544 ereport.fs.zfs.io
Sep  7 2018 17:18:32.242131363 ereport.fs.zfs.io
Sep  7 2018 17:18:36.154152095 ereport.fs.zfs.io
Sep  7 2018 17:18:40.070172847 ereport.fs.zfs.io
Sep  7 2018 17:18:44.866198264 ereport.fs.zfs.io
Sep  7 2018 17:18:48.482217426 ereport.fs.zfs.io
Sep  7 2018 17:18:52.198237117 ereport.fs.zfs.io
Sep  7 2018 17:18:56.022257378 ereport.fs.zfs.io
Sep  7 2018 17:18:59.746277113 ereport.fs.zfs.io
Sep  7 2018 17:19:03.870298961 ereport.fs.zfs.io
Sep  7 2018 17:19:08.834325262 ereport.fs.zfs.io
Sep  7 2018 17:19:12.986347260 ereport.fs.zfs.io
Sep  7 2018 17:19:18.182374786 ereport.fs.zfs.io
Sep  7 2018 17:19:23.066400660 ereport.fs.zfs.io
Sep  7 2018 17:19:26.890420918 ereport.fs.zfs.io
Sep  7 2018 17:19:31.162443547 ereport.fs.zfs.io
Sep  7 2018 17:19:36.410471346 ereport.fs.zfs.io
Sep  7 2018 17:19:41.526498444 ereport.fs.zfs.io
Sep  7 2018 17:19:46.498524777 ereport.fs.zfs.io
Sep  7 2018 17:19:51.750552592 ereport.fs.zfs.io
Sep  7 2018 17:19:56.998580387 ereport.fs.zfs.io
Sep  7 2018 17:20:02.026607012 ereport.fs.zfs.io
Sep  7 2018 17:20:05.770626841 ereport.fs.zfs.io
Sep  7 2018 17:20:09.478646474 ereport.fs.zfs.io
Sep  7 2018 17:20:13.534667953 ereport.fs.zfs.io
Sep  7 2018 17:20:18.418693814 ereport.fs.zfs.io
Sep  7 2018 17:20:22.162713636 ereport.fs.zfs.io
Sep  7 2018 17:20:25.970733797 ereport.fs.zfs.io
Sep  7 2018 17:20:29.442752180 ereport.fs.zfs.io
Sep  7 2018 17:20:33.202772087 ereport.fs.zfs.io
Sep  7 2018 17:20:36.782791039 ereport.fs.zfs.io
Sep  7 2018 17:20:40.670811624 ereport.fs.zfs.io
Sep  7 2018 17:20:44.314830912 ereport.fs.zfs.io
Sep  7 2018 17:20:47.890849843 ereport.fs.zfs.io
Sep  7 2018 17:20:51.422868540 ereport.fs.zfs.io
Sep  7 2018 17:20:54.970887321 ereport.fs.zfs.io
Sep  7 2018 17:20:58.510906060 ereport.fs.zfs.io
Sep  7 2018 17:21:02.034924710 ereport.fs.zfs.io
Sep  7 2018 17:21:05.638943788 ereport.fs.zfs.io
Sep  7 2018 17:21:09.358963475 ereport.fs.zfs.io
Sep  7 2018 17:21:13.122983394 ereport.fs.zfs.io
Sep  7 2018 17:21:16.587001727 ereport.fs.zfs.config.sync
Sep  7 2018 17:42:11.713624795 resource.fs.zfs.statechange
Sep  7 2018 17:42:11.721624839 ereport.fs.zfs.vdev.spare
Sep  7 2018 17:42:14.073637519 ereport.fs.zfs.resilver.start
Sep  7 2018 17:42:17.929658306 ereport.fs.zfs.config.sync
Sep  8 2018 15:43:15.219211939 ereport.fs.zfs.checksum

root@chromium-store3:~# zpool status
  pool: chromium_data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Sep  7 17:42:14 2018
    38.5T scanned out of 53.3T at 403M/s, 10h43m to go
    1.16T resilvered, 72.15% done
config:

    NAME                      STATE     READ WRITE CKSUM
    chromium_data             DEGRADED     0     0     0
      raidz2-0                DEGRADED     0     0     0
        sdc                   ONLINE       0     0     0
        sdd                   ONLINE       0     0     0
        sde                   ONLINE       0     0     0
        spare-3               OFFLINE      0     0     0
          sdf                 OFFLINE  9.94K     0     0
          sdaj                ONLINE       0     0     0  (resilvering)
        sdg                   ONLINE       0     0     0
        sdh                   ONLINE       0     0     0
        sdi                   ONLINE       0     0     1  (resilvering)
        sdj                   ONLINE       0     0     0
        sdk                   ONLINE       0     0     0
        sdl                   ONLINE       0     0     0
        sdm                   ONLINE       0     0     0
      raidz2-1                ONLINE       0     0     0
        sdn                   ONLINE       0     0     0
        sdo                   ONLINE       0     0     0
        sdp                   ONLINE       0     0     0
        sdq                   ONLINE       0     0     0
        sdr                   ONLINE       0     0     0
        sds                   ONLINE       0     0     0
        sdt                   ONLINE       0     0     0
        sdu                   ONLINE       0     0     0
        sdv                   ONLINE       0     0     0
        sdw                   ONLINE       0     0     0
        sdx                   ONLINE       0     0     0
      raidz2-2                ONLINE       0     0     0
        sdy                   ONLINE       0     0     0
        sdz                   ONLINE       0     0     0
        sdaa                  ONLINE       0     0     0
        sdab                  ONLINE       0     0     0
        sdac                  ONLINE       0     0     0
        sdad                  ONLINE       0     0     0
        sdae                  ONLINE       0     0     0
        sdaf                  ONLINE       0     0     0
        sdag                  ONLINE       0     0     0
        sdah                  ONLINE       0     0     0
        sdai                  ONLINE       3     0     0
    logs
      mirror-3                ONLINE       0     0     0
        sdak                  ONLINE       0     0     0
        sdal                  ONLINE       0     0     0
    cache
      sda                     ONLINE       0     0     0
    spares
      wwn-0x5000c50057ce71a3  INUSE     currently in use

errors: No known data errors
rincebrain commented 5 years ago

A) Depending on the nature of the error, that could have been correct. You could have changed failmode=wait to failmode=continue and seen if it reported errors and kicked the drive versus waiting to see if the drive got over it. (That particular drive error looks a lot like it's just complaining that it couldn't reallocate the problematic sector because it's out of spares, but it shouldn't try to reallocate unless it successfully reads the sector or you overwrote it, and it might succeed on read/write (if not realloc) on retry.)
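For reference, a minimal sketch of checking and switching that property at runtime (pool name taken from the zpool status output above; see zpool(8) for the exact semantics of each mode):

# Show the current failure-mode policy for the pool
zpool get failmode chromium_data
# Switch from the default (wait) to continue, so errors are returned to
# applications rather than blocking until the pool recovers
zpool set failmode=continue chromium_data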

B) If ZFS is still responding on failmode=wait even when zpool events is noticing IO errors, it probably means that a subsequent retry succeeded. (Or there's a bug in its handling, but I don't see any obvious fixes for that in 0.6.5.X.) A couple of the zpool events -v messages would probably be sufficient to figure out whether it thought these were fatal or just something to retry.
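For example, a minimal form of that check (zpool events accepts an optional pool name; the verbose payload carries fields such as vdev_path and zio_err):

# Dump the full payload of every event, including which vdev it was and the
# I/O error code, to see whether the failures were fatal or merely retried
zpool events -v chromium_data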

(The historical case I've seen a bunch is having pool IO block because it was waiting too long for a drive that didn't error, just...took ~forever, so the fact that it's still responding makes me think it just decided the errors weren't fatal or succeeded on a second attempt every time.)

bunder2015 commented 5 years ago

For the sake of completeness (and mostly because I'm curious), could we see smartctl -A /dev/sdf, and which model of drive is it?

h1z1 commented 5 years ago

It's been reported before, #6885

dirkpetersen commented 5 years ago

Thanks for the suggestions. For completeness, here is the lshw and smartctl output. It's a 2 TB Seagate drive in one of eight 36-drive Supermicro boxes that have been in service under heavy load for about four years, serving an HPC scratch file system; the system is typically close to 80%, sometimes 90%, full.

Not sure if #6885 is the same issue, as it seems no drive errors were reported by the kernel in that one.

*-disk:31
       description: SCSI Disk
       product: ST2000NM0023
       vendor: SEAGATE
       physical id: 0.5.0
       bus info: scsi@0:0.5.0
       logical name: /dev/sdf
       version: 0002
       serial: Z1X194SC0000C4143BY9
       size: 1863GiB (2TB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=162580a0-9578-7c44-881a-8f5356dd0d05 logicalsectorsize=512 sectorsize=512

root@chromium-store3:~# smartctl -A /dev/sdf
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-104-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     37 C
Drive Trip Temperature:        68 C

Manufactured in week 43 of year 2013
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  801
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  2447
Elements in grown defect list: 16343

Vendor (Seagate) cache information
  Blocks sent to initiator = 1749331245
  Blocks received from initiator = 165667002
  Blocks read from cache and sent to initiator = 412177445
  Number of read and write commands whose size <= segment size = 619316483
  Number of read and write commands whose size > segment size = 45250

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 39427.67
  number of minutes until next internal SMART test = 25
richardelling commented 5 years ago

It seems the grown defect list is full. It is a good idea to replace the disk. In general, you can track the rate of increase of the grown defect list as part of preventative maintenance; a sketch of such a check follows. Also, you should keep an eye on the performance of disks, because they can become very slow as the grown defect list fills.
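A minimal sketch of such a check, assuming a SAS drive whose smartctl -A output includes the "Elements in grown defect list" line as above; the device name, state-file path, and logging target are illustrative:

#!/bin/sh
# Record the current grown-defect-list size and log a warning if it has grown
# since the last run (e.g. run this periodically from cron).
DEV=/dev/sdf
STATE=/var/tmp/defects_$(basename "$DEV")
NEW=$(smartctl -A "$DEV" | awk -F': *' '/Elements in grown defect list/ {print $2}')
OLD=$(cat "$STATE" 2>/dev/null || echo 0)
echo "$NEW" > "$STATE"
if [ -n "$NEW" ] && [ "$NEW" -gt "$OLD" ]; then
    logger -t defect-watch "WARNING: $DEV grown defect list increased from $OLD to $NEW"
fi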

Lastly, this is a bug tracker and your problem is not a bug. Folks who run datacenters directly monitor disk wear and this is not a file system function. You’ll have better luck with the email list.

h1z1 commented 5 years ago

@dirkpetersen There were no errors, but the performance drop of the drive should have triggered a fault in ZFS. You can script the output of zpool iostat/zpool status; one of the bundled zpool.d helpers reports SMART values. It may require a newer version:

# zpool status -c runs the named zpool.d helper against each vdev;
# the variables must be exported for zpool to see them
export ZPOOL_SCRIPTS_PATH=/fas250/git/0.7.5/cmd/zpool/zpool.d
export ZPOOL_SCRIPTS_AS_ROOT=1
zpool status -c smartx
dirkpetersen commented 5 years ago

Awesome, thanks all. I learned a lot.