openzfs / zfs


Random checksum errors on raidz1 and mirror pools -- It's not a hardware fault #5018

Closed: JuliaVixen closed this issue 5 years ago

JuliaVixen commented 7 years ago

I was backing up a pool (with zfs send|recv), and it happened again. Two checksum errors on a raidz1 pool, with zero checksum errors on any device. Against my own paranoid judgement, I imported this pool read-write, rather than read-only like I usually do, so I can't reboot, swap the drives around, reimport it, and see if it faults on the same file at the same offset... but I did do that last time, twice, and the fault was always in the exact same spot. I'm using ECC memory, and I tested it for several hours just a few days ago. There are no error messages in Linux dmesg, no SMART reports; everything claims to be functioning without error. (And these are not the crappy Seagate drives.)

So... How did this checksum error happen?

About a month ago, I copied most of the files off this pool with a regular cp -avi, rather than a zfs send|recv, and there was no checksum error at that time. The disks have been mostly powered off since then, and I only plugged them back in today to make sure I really got all the data off them.

So, what do I need to do with zdb to debug this, and is anyone else noticing the same thing?
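
(Roughly the kind of zdb poking I have in mind, for the record -- the object number, offset, and size below are placeholders, not values from my actual pool:)

# Get the object number of the affected file ("12345" below is a placeholder):
ls -i /n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img

# Dump its dnode and block pointers, including the per-block checksums:
zdb -ddddd n 12345

# Then read one copy of a suspect block straight off a leaf vdev
# (vdev:offset:size taken from the block pointer printed above):
zdb -R n 0:400000:20000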

I have about 50T of corrupt zpools now, which I need to "restore the entire pool from backup", because of checksum errors on the top-level vdev when the actual devices themselves are fine and there's at least an extra disk of parity! How does this happen?

I think this has only started happening this month, so maybe there was a Github commit in July that did something?

localhost ~ # zpool status -v n
  pool: n
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                   STATE     READ WRITE CKSUM
    n                                      ONLINE       0     0     1
      raidz1-0                             ONLINE       0     0     2
        ata-WDC_WD80EFZX-68UW8N0_VKGNH8BX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHJK9ZX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHNJWBX  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        n@Aug_23_2016:/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img
localhost ~ # zpool history n
History for 'n':
2016-05-19.08:41:14 zpool create -O atime=off -o ashift=12 -o feature@lz4_compress=enabled -o feature@embedded_data=enabled n -f raidz1 /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKGNH8BX /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHJK9ZX /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHNJWBX
2016-05-19.13:51:10 zfs recv -evF n
2016-05-19.23:59:16 zfs recv -evF n
2016-05-20.00:12:53 zfs recv -evF n
2016-05-20.03:22:21 zpool export n
2016-06-10.06:27:22 zpool import n
2016-06-10.06:41:15 zfs create n/Seagate_750G_Photos_Sep_2007
2016-06-12.08:10:03 zfs create n/Seagate_750G_copied_Mar_2009
2016-06-13.04:13:54 zfs create n/backedup_SSD
2016-06-14.01:24:08 zfs create n/NEFs
2016-06-24.03:54:36 zpool export n
2016-07-17.01:16:48 zpool import n
2016-07-17.05:00:20 zpool export n
2016-08-03.10:37:48 zpool import n
2016-08-03.10:38:29 zfs destroy n/backedup_SSD
2016-08-03.10:40:03 zfs snapshot n/Seagate_750G_Photos_Sep_2007@send
2016-08-03.10:40:20 zfs snapshot n/Seagate_750G_copied_Mar_2009@send
2016-08-03.10:41:04 zpool export n
2016-08-11.06:48:17 zpool import n
2016-08-11.20:00:57 zpool export n
2016-08-24.05:14:46 zpool import n
2016-08-24.05:15:45 zfs snapshot -r n@Aug_23_2016
localhost ~ # zfs get all n  
NAME  PROPERTY              VALUE                  SOURCE
n     type                  filesystem             -
n     creation              Thu May 19  8:41 2016  -
n     used                  14.0T                  -
n     available             82.1G                  -
n     referenced            5.63T                  -
n     compressratio         1.00x                  -
n     mounted               yes                    -
n     quota                 none                   default
n     reservation           none                   default
n     recordsize            128K                   default
n     mountpoint            /n                     default
n     sharenfs              off                    default
n     checksum              on                     default
n     compression           off                    default
n     atime                 off                    local
n     devices               on                     default
n     exec                  on                     default
n     setuid                on                     default
n     readonly              off                    default
n     zoned                 off                    default
n     snapdir               hidden                 default
n     aclinherit            restricted             default
n     canmount              on                     default
n     xattr                 on                     default
n     copies                1                      default
n     version               5                      -
n     utf8only              off                    -
n     normalization         none                   -
n     casesensitivity       sensitive              -
n     vscan                 off                    default
n     nbmand                off                    default
n     sharesmb              off                    default
n     refquota              none                   default
n     refreservation        none                   default
n     primarycache          all                    default
n     secondarycache        all                    default
n     usedbysnapshots       0                      -
n     usedbydataset         5.63T                  -
n     usedbychildren        8.32T                  -
n     usedbyrefreservation  0                      -
n     logbias               latency                default
n     dedup                 off                    default
n     mlslabel              none                   default
n     sync                  standard               default
n     dnodesize             legacy                 default
n     refcompressratio      1.00x                  -
n     written               0                      -
n     logicalused           13.9T                  -
n     logicalreferenced     5.64T                  -
n     filesystem_limit      none                   default
n     snapshot_limit        none                   default
n     filesystem_count      none                   default
n     snapshot_count        none                   default
n     snapdev               hidden                 default
n     acltype               off                    default
n     context               none                   default
n     fscontext             none                   default
n     defcontext            none                   default
n     rootcontext           none                   default
n     relatime              off                    default
n     redundant_metadata    all                    default
n     overlay               off                    default
localhost ~ # zpool get all n
NAME  PROPERTY                    VALUE                       SOURCE
n     size                        21.8T                       -
n     capacity                    96%                         -
n     altroot                     -                           default
n     health                      ONLINE                      -
n     guid                        13306438682543689121        default
n     version                     -                           default
n     bootfs                      -                           default
n     delegation                  on                          default
n     autoreplace                 off                         default
n     cachefile                   -                           default
n     failmode                    wait                        default
n     listsnapshots               off                         default
n     autoexpand                  off                         default
n     dedupditto                  0                           default
n     dedupratio                  1.00x                       -
n     free                        819G                        -
n     allocated                   20.9T                       -
n     readonly                    off                         -
n     ashift                      12                          local
n     comment                     -                           default
n     expandsize                  -                           -
n     freeing                     0                           default
n     fragmentation               56%                         -
n     leaked                      0                           default
n     feature@async_destroy       enabled                     local
n     feature@empty_bpobj         active                      local
n     feature@lz4_compress        active                      local
n     feature@spacemap_histogram  active                      local
n     feature@enabled_txg         active                      local
n     feature@hole_birth          active                      local
n     feature@extensible_dataset  enabled                     local
n     feature@embedded_data       active                      local
n     feature@bookmarks           enabled                     local
n     feature@filesystem_limits   enabled                     local
n     feature@large_blocks        enabled                     local
n     feature@large_dnode         disabled                    local

Here's all the SMART stuff for these drives...

localhost ~ # smartctl -x /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKGNH8BX
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.6-gentoo-debug2] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKGNH8BX
LU WWN Device Id: 5 000cca 254c950be
Firmware Version: 83.H0A83
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug 24 08:30:43 2016 Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     164 (intermediate level without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  101) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (1133) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   131   131   054    -    116
  3 Spin_Up_Time            POS---   147   147   024    -    447 (Average 446)
  4 Start_Stop_Count        -O--C-   100   100   000    -    15
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    457
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    15
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    66
193 Load_Cycle_Count        -O--C-   100   100   000    -    66
194 Temperature_Celsius     -O----   125   125   000    -    48 (Min/Max 25/55)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ NON-DATA log
0x15       GPL,SL  R/W      1  SATA Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    48 Celsius
Power Cycle Min/Max Temperature:     29/49 Celsius
Lifetime    Min/Max Temperature:     25/55 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (8)

Index    Estimated Time   Temperature Celsius
   9    2016-08-24 06:23    49  ******************************
 ...    ..( 12 skipped).    ..  ******************************
  22    2016-08-24 06:36    49  ******************************
  23    2016-08-24 06:37    48  *****************************
 ...    ..( 73 skipped).    ..  *****************************
  97    2016-08-24 07:51    48  *****************************
  98    2016-08-24 07:52    47  ****************************
 ...    ..(  3 skipped).    ..  ****************************
 102    2016-08-24 07:56    47  ****************************
 103    2016-08-24 07:57    46  ***************************
 ...    ..(  6 skipped).    ..  ***************************
 110    2016-08-24 08:04    46  ***************************
 111    2016-08-24 08:05    47  ****************************
 ...    ..( 14 skipped).    ..  ****************************
 126    2016-08-24 08:20    47  ****************************
 127    2016-08-24 08:21    48  *****************************
 ...    ..(  7 skipped).    ..  *****************************
   7    2016-08-24 08:29    48  *****************************
   8    2016-08-24 08:30    49  ******************************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4              15  ---  Lifetime Power-On Resets
0x01  0x018  6     16078448412  ---  Logical Sectors Written
0x01  0x020  6        76148289  ---  Number of Write Commands
0x01  0x028  6     11633444836  ---  Logical Sectors Read
0x01  0x030  6        61332439  ---  Number of Read Commands
0x01  0x038  6      1647367400  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4             258  ---  Spindle Motor Power-on Hours
0x03  0x010  4             258  ---  Head Flying Hours
0x03  0x018  4              66  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4          945927  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              48  ---  Current Temperature
0x05  0x010  1              45  N--  Average Short Term Temperature
0x05  0x018  1              41  N--  Average Long Term Temperature
0x05  0x020  1              55  ---  Highest Temperature
0x05  0x028  1              25  ---  Lowest Temperature
0x05  0x030  1              52  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1              41  N--  Highest Average Long Term Temperature
0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              31  ---  Number of Hardware Resets
0x06  0x010  4               1  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           33  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           30  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

localhost ~ # smartctl -x /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHJK9ZX
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.6-gentoo-debug2] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKHJK9ZX
LU WWN Device Id: 5 000cca 254d59e77
Firmware Version: 83.H0A83
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Aug 24 08:30:57 2016 Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     164 (intermediate level without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  101) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (1225) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   132   132   054    -    112
  3 Spin_Up_Time            POS---   144   144   024    -    458 (Average 456)
  4 Start_Stop_Count        -O--C-   100   100   000    -    16
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    453
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    16
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    64
193 Load_Cycle_Count        -O--C-   100   100   000    -    64
194 Temperature_Celsius     -O----   133   133   000    -    45 (Min/Max 25/55)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ NON-DATA log
0x15       GPL,SL  R/W      1  SATA Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    45 Celsius
Power Cycle Min/Max Temperature:     29/46 Celsius
Lifetime    Min/Max Temperature:     25/55 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (9)

Index    Estimated Time   Temperature Celsius
  10    2016-08-24 06:23    46  ***************************
 ...    ..(  7 skipped).    ..  ***************************
  18    2016-08-24 06:31    46  ***************************
  19    2016-08-24 06:32    45  **************************
 ...    ..( 17 skipped).    ..  **************************
  37    2016-08-24 06:50    45  **************************
  38    2016-08-24 06:51    44  *************************
 ...    ..( 30 skipped).    ..  *************************
  69    2016-08-24 07:22    44  *************************
  70    2016-08-24 07:23    45  **************************
  71    2016-08-24 07:24    45  **************************
  72    2016-08-24 07:25    45  **************************
  73    2016-08-24 07:26    44  *************************
  74    2016-08-24 07:27    45  **************************
 ...    ..( 20 skipped).    ..  **************************
  95    2016-08-24 07:48    45  **************************
  96    2016-08-24 07:49    44  *************************
 ...    ..(  4 skipped).    ..  *************************
 101    2016-08-24 07:54    44  *************************
 102    2016-08-24 07:55    43  ************************
 ...    ..( 14 skipped).    ..  ************************
 117    2016-08-24 08:10    43  ************************
 118    2016-08-24 08:11    44  *************************
 ...    ..( 10 skipped).    ..  *************************
   1    2016-08-24 08:22    44  *************************
   2    2016-08-24 08:23    45  **************************
 ...    ..(  5 skipped).    ..  **************************
   8    2016-08-24 08:29    45  **************************
   9    2016-08-24 08:30    46  ***************************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4              16  ---  Lifetime Power-On Resets
0x01  0x018  6     16078777451  ---  Logical Sectors Written
0x01  0x020  6        76231268  ---  Number of Write Commands
0x01  0x028  6     11931313928  ---  Logical Sectors Read
0x01  0x030  6        60830864  ---  Number of Read Commands
0x01  0x038  6      1632365950  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4             258  ---  Spindle Motor Power-on Hours
0x03  0x010  4             258  ---  Head Flying Hours
0x03  0x018  4              64  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4          939802  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              45  ---  Current Temperature
0x05  0x010  1              42  N--  Average Short Term Temperature
0x05  0x018  1              42  N--  Average Long Term Temperature
0x05  0x020  1              55  ---  Highest Temperature
0x05  0x028  1              25  ---  Lowest Temperature
0x05  0x030  1              49  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1              42  N--  Highest Average Long Term Temperature
0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              12  ---  Number of Hardware Resets
0x06  0x010  4               7  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           12  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           13  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

localhost ~ # smartctl -x /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHNJWBX
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.6-gentoo-debug2] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKHNJWBX
LU WWN Device Id: 5 000cca 254d76e4d
Firmware Version: 83.H0A83
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug 24 08:31:15 2016 Local time zone must be set--see zic m
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     164 (intermediate level without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unknown

=== START OF READ SMART DATA SECTION ===
SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
 ERR=0x00, SC=0x00, LL=0x00, LM=0x00, LH=0x00, DEV=0x00, STS=0x50
SMART Status not supported: Invalid ATA output register values
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  101) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (1292) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   131   131   054    -    114
  3 Spin_Up_Time            POS---   100   100   024    -    0
  4 Start_Stop_Count        -O--C-   100   100   000    -    8
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    416
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    8
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    44
193 Load_Cycle_Count        -O--C-   100   100   000    -    44
194 Temperature_Celsius     -O----   122   122   000    -    49 (Min/Max 25/55)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ NON-DATA log
0x15       GPL,SL  R/W      1  SATA Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    49 Celsius
Power Cycle Min/Max Temperature:     27/51 Celsius
Lifetime    Min/Max Temperature:     25/55 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (111)

Index    Estimated Time   Temperature Celsius
 112    2016-08-24 06:24    50  *******************************
 113    2016-08-24 06:25    50  *******************************
 114    2016-08-24 06:26    49  ******************************
 115    2016-08-24 06:27    49  ******************************
 116    2016-08-24 06:28    48  *****************************
 117    2016-08-24 06:29    48  *****************************
 118    2016-08-24 06:30    48  *****************************
 119    2016-08-24 06:31    47  ****************************
 120    2016-08-24 06:32    47  ****************************
 121    2016-08-24 06:33    47  ****************************
 122    2016-08-24 06:34    46  ***************************
 ...    ..(  4 skipped).    ..  ***************************
 127    2016-08-24 06:39    46  ***************************
   0    2016-08-24 06:40    45  **************************
 ...    ..(  4 skipped).    ..  **************************
   5    2016-08-24 06:45    45  **************************
   6    2016-08-24 06:46    44  *************************
 ...    ..(  5 skipped).    ..  *************************
  12    2016-08-24 06:52    44  *************************
  13    2016-08-24 06:53    45  **************************
 ...    ..(  8 skipped).    ..  **************************
  22    2016-08-24 07:02    45  **************************
  23    2016-08-24 07:03    46  ***************************
 ...    ..(  9 skipped).    ..  ***************************
  33    2016-08-24 07:13    46  ***************************
  34    2016-08-24 07:14    47  ****************************
 ...    ..( 10 skipped).    ..  ****************************
  45    2016-08-24 07:25    47  ****************************
  46    2016-08-24 07:26    48  *****************************
 ...    ..( 28 skipped).    ..  *****************************
  75    2016-08-24 07:55    48  *****************************
  76    2016-08-24 07:56    47  ****************************
 ...    ..( 13 skipped).    ..  ****************************
  90    2016-08-24 08:10    47  ****************************
  91    2016-08-24 08:11    48  *****************************
 ...    ..( 10 skipped).    ..  *****************************
 102    2016-08-24 08:22    48  *****************************
 103    2016-08-24 08:23    49  ******************************
 ...    ..(  6 skipped).    ..  ******************************
 110    2016-08-24 08:30    49  ******************************
 111    2016-08-24 08:31    51  ********************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4               8  ---  Lifetime Power-On Resets
0x01  0x018  6     16076759236  ---  Logical Sectors Written
0x01  0x020  6        76063310  ---  Number of Write Commands
0x01  0x028  6      6196383585  ---  Logical Sectors Read
0x01  0x030  6        28576331  ---  Number of Read Commands
0x01  0x038  6      1499518400  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4             234  ---  Spindle Motor Power-on Hours
0x03  0x010  4             234  ---  Head Flying Hours
0x03  0x018  4              44  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4          939662  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              49  ---  Current Temperature
0x05  0x010  1              43  N--  Average Short Term Temperature
0x05  0x018  1               -  N--  Average Long Term Temperature
0x05  0x020  1              55  ---  Highest Temperature
0x05  0x028  1              25  ---  Lowest Temperature
0x05  0x030  1              52  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1               -  N--  Highest Average Long Term Temperature
0x05  0x048  1               -  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4               1  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

This is what the pool status looked like a few hours ago... It was created on the previous stable release version of ZFS, and I never upgraded it.

  pool: n
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: none requested
config:

    NAME                                   STATE     READ WRITE CKSUM
    n                                      ONLINE       0     0     0
      raidz1-0                             ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKGNH8BX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHJK9ZX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHNJWBX  ONLINE       0     0     0

errors: No known data errors

I have another pool which did this too. It had two checksum errors on the top-level mirror, and zero checksum (or any other kind of) errors on the individual drives. I ran a zpool clear in the hope that it would forget about this checksum error until the next time a checksum really failed (to see if it was just a transient (cosmic ray) error or something). It didn't forget about the I/O error on the file, but it now just says 0 errors... probably until I scrub it again... anyway... No hardware faults reported or detected.

This checksum error also occurred when I was running the latest "master" version of ZFS and SPL from Github, as of about the end of July, beginning of Aug.

localhost ~ # zpool status -v o
  pool: o
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 172h43m with 1 errors on Thu Aug 11 09:28:55 2016
config:

    NAME                                   STATE     READ WRITE CKSUM
    o                                      ONLINE       0     0     0
      mirror-0                             ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHMZSWX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHATXMX  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        o@2016_Jun_28:/Maxtor_3H500F0_H80KP5MH_blocks_505765120+7.img

localhost ~ # zpool history o
History for 'o':
2016-06-24.04:01:08 zpool create -O atime=off -o ashift=12 -o feature@lz4_compress=enabled -o feature@embedded_data=enabled o -f mirror /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHMZSWX /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHATXMX
2016-06-24.04:01:37 zfs set compression=on o
2016-06-24.04:01:52 zfs set exec=off o
2016-06-24.04:02:00 zfs set devices=off o
2016-06-24.04:02:08 zfs set setuid=off o
2016-06-24.04:03:10 zpool add o cache /dev/disk/by-id/ata-SAMSUNG_MZHPU512HCGL-00004_S1NDNYAFC00958
2016-06-25.19:07:48 zpool import -a
2016-06-26.00:02:30 zfs set compression=lz4 o
2016-06-28.23:43:45 zfs set dedup=sha256 o
2016-06-28.23:44:57 zfs snapshot o@2016_Jun_28
2016-06-29.20:39:22 zpool import -a
2016-06-29.20:40:36 zpool remove o /dev/disk/by-id/ata-SAMSUNG_MZHPU512HCGL-00004_S1NDNYAFC00958
2016-06-29.20:40:38 zpool export o
2016-08-04.03:08:17 zpool import o
2016-08-04.04:45:32 zpool clear o
2016-08-04.04:45:50 zpool scrub o
2016-08-10.22:24:43 zpool import -c /etc/zfs/zpool.cache -N -a
2016-08-11.20:00:52 zpool export o
rincebrain commented 7 years ago

So, if we take as true that importing it read-only will reproducibly report a top-level checksum error without reporting an error on any particular disk, that would imply that the data is, in fact, "corrupt".

I'm slightly concerned that both your example pools are with the same type of disk, but let's put that aside for now.

What distro are you on, and what version of ZoL are you running now (cat /sys/module/zfs/version)? (Ideally, if you were installing it through a package manager and/or have syslog, what versions were you running in the past, and when did you update? The goal here would be to figure out what version you were running when you wrote the affected file on each of those pools, and to see if there's any known explanation for munging data and generating a checksum error that has been fixed since.)

The problem with your theory about it only starting in July, presuming the data on-disk is actually "corrupt" and not an artifact of some recent code, is that your log on pool "o" says one or more blocks in the snapshot from June 28th are munged, which implies any such mangling code dates back at least that far.

...I'm slightly concerned that your corruption is happening only on what appear to be large disk image files. Are these pools primarily used for storing large disk images, or is most of their content something other than disk images, which would make this an interesting data point?

dedup is on on one and off on another, so that's probably not it. mirror on one and raidz1 on the other, so it's probably not parity calculations. I suppose it could be some edge case for the optimized fletcher4 routines, but that could only matter if the mangling in the June 28th snapshot predated the dedup=sha256 setting, since there's no optimized sha256 committed that I can immediately see...plus, I'd expect bugs in that to have caused extreme screaming much sooner.

Just for completeness, though, what's /proc/cpuinfo say, and what's /sys/module/zcommon/parameters/zfs_fletcher_4_impl say?
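
For completeness, this is roughly how I'd take the SIMD routines out of the equation by pinning the checksum code back to the original scalar path -- assuming your build is new enough to have the selectable fletcher4 implementation at all:

# See which fletcher4 implementations the module knows about and which one is active:
cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl

# Force the plain scalar implementation (only on builds that ship the selector):
echo scalar > /sys/module/zcommon/parameters/zfs_fletcher_4_impl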

JuliaVixen commented 7 years ago

When a pool is imported read-only, the checksum error only shows up at the top level, and when the same pool is imported read-write, the checksum error will show up on the actual drive... and the top level of the pool.

(If imported read-only, it will also "forget" about the checksum error when exported, so you can try it again and see whether it still happens under different circumstances.)

In the example above, I had the "n" pool imported read-write at the time, but there's no checksum error on any of the disks. I don't know how it's possible to have a checksum error on the whole raidz1, but not on any individual devices.
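
For reference, the test loop I've been repeating looks roughly like this (the file path is just one of the affected files as an example):

# Re-import the pool read-only so nothing on disk can change:
zpool export n
zpool import -o readonly=on n

# Read the suspect file end to end, then see where the checksum error gets charged:
dd if=/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img of=/dev/null bs=1M
zpool status -v n

# Exporting a read-only import drops the error counters again,
# so the whole thing can be repeated with the drives moved to different ports:
zpool export n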

I've observed that the Seagate ST8000AS0002 and ST8000DM002 drives have a rather high rate of silent data corruption. I switched to WD and Hitachi drives a few months ago, and haven't detected any problems with any of those drives yet. These two example pools are using drives which were possibly manufactured on the same assembly line on the same day as each other. So, yeah, they could all have identical defects... though wouldn't there at least be a checksum error or something on the devices? I have another pool with six disks, three pairs of different hard drive models, configured into a raidz1 pool. One of the Seagate drives started corrupting data, and somehow the entire pool has two checksum errors on the top-level vdev now... but only one drive went bad. I wrote about it in issue #4983

The problem with your theory about it only starting in July, presuming the data on-disk is actually "corrupt" and not an artifact of some recent code, is that your log on pool "o" says one or more blocks in the snapshot from June 28th are munged, which implies any such mangling code dates back at least that far.

Oh, corrupt files are always reported under the name of the oldest snapshot they appear in. It has nothing to do with when the corruption actually happened; it's just ZFS's bookkeeping. If you try reading the file from every snapshot, and from the current live filesystem, zpool status will tell you that you have dozens of corrupt files, which are really all just the same file appearing in a dozen snapshots. I have a pool of photos I've been moving around since 2011, and one of the recent copies of it developed an uncorrectable checksum error in a file from 2012, which was in a snapshot from 2013, on a set of hard drives manufactured in 2015, which had the error in 2016.
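
You can watch this bookkeeping happen: read the same file through the live filesystem and through a snapshot's .zfs directory (the paths below just use this pool's affected file as an example), and each path you touch then shows up as its own "corrupt file" entry:

# Same blocks, reachable through the live dataset and through the snapshot:
md5sum /n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img
md5sum /n/.zfs/snapshot/Aug_23_2016/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img

# zpool status -v now lists one entry per path read, even though it's
# one damaged block on disk:
zpool status -v n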

For the past 20 years or so, every time I'd get a new hard drive, I would keep the old one(s) as a sort-of backup. I'm cleaning out my garage and storage unit, and I have, like, a hundred hard drives! So, I've been imaging the drives to make sure I really, really do have all of the data off them, then wiping them and giving them away. Most of them are less than 250G, so I've just been dumping raw images, and I'll actually spend the time to look through the data later. I can compress 30 old drives onto one new drive, and then I no longer have boxes and boxes full of hard drives in my garage.
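
The imaging itself is nothing fancy; per drive it's basically something like this (the device name is an example, and the checksum is just so the image can be re-verified later):

# Raw-image an old drive into a file on the pool and record a checksum of the
# image stream at the same time:
dd if=/dev/disk/by-id/ata-ST31000340AS_9QJ0J2GY bs=1M \
    | tee /n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img | sha1sum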

Actually, dedup isn't "actually" on. I think I was going to try it on some disk images, but I never actually wrote the data. The filesystem named "dedup1" was dedupped once upon a time, but when I zfs send|recv'd it a while ago it came out un-dedupped; I just never changed the name of the filesystem. (The data on these pools is not dedupped.)
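
A quick way to double-check that, for anyone curious (pool names as above):

# dedup=sha256 is set on "o", but nothing was written while it was on;
# a dedupratio of 1.00x means the dedup table is effectively empty:
zfs get -r dedup o
zpool get dedupratio o n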

localhost ~ # cat /sys/module/zfs/version
0.6.5-1

localhost ~ # uname -a
Linux localhost 4.4.6-gentoo-debug2 #1 SMP Sat Aug 13 07:21:18 Local time zone must be set--see zic  x86_64 Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz GenuineIntel GNU/Linux

Let's see, on 2016-04-25 I was using 3.10.7-gentoo-r1 with ZFS 0.6.5.4-r1-gentoo from the regular Gentoo package manager.

localhost ~ # modinfo  /lib/modules/3.10.7-gentoo-r1/extra/zfs/zfs.ko 
version:        0.6.5.4-r1-gentoo
[...]
srcversion:     4251E810337436FD7B850DA

Then apparently the next day on 2016-04-26 I upgraded to 4.4.6-gentoo, and probably using the same version of the Gentoo package.

On 2016-07-01 I upgraded ZFS again to the current GIT version, at the time...

filename:       /lib/modules/4.4.6-gentoo/extra/zfs/zfs.ko
version:        0.6.5-329_g5c27b29
srcversion:     CC978DF57728461C914D24D

On 2016-08-13 I upgraded to the current GIT version again, and also built the kernel with more debugging turned on. (And ZFS and ZPL with debugging turned on.)

filename:       /lib/modules/4.4.6-gentoo-debug/extra/zfs/zfs.ko
version:        0.6.5-1
srcversion:     1B0E25441FFC82D8549AB1B

I rebuilt ZFS again on 2016-08-22, but mostly just to add that two-line patch from #4998.

Here's stuff from my emerge log...

1461536603: Started emerge on: Apr 24, 2016 22:23:22
1461536603:  *** emerge --update zfs
[...]
1461551620: Started emerge on: Apr 25, 2016 02:33:40
1461551620:  *** emerge  zfs
1461551623:  >>> emerge (1 of 3) sys-kernel/spl-0.6.5.4-r1 to /
1461551625:  === (1 of 3) Cleaning (sys-kernel/spl-0.6.5.4-r1::/usr/portage/sys-kernel/spl/spl-0.6.5.4-r1.ebuild)
1461551625:  === (1 of 3) Compiling/Merging (sys-kernel/spl-0.6.5.4-r1::/usr/portage/sys-kernel/spl/spl-0.6.5.4-r1.ebuild)
1461551645:  === (1 of 3) Merging (sys-kernel/spl-0.6.5.4-r1::/usr/portage/sys-kernel/spl/spl-0.6.5.4-r1.ebuild)
1461551646:  >>> AUTOCLEAN: sys-kernel/spl:0
1461551646:  === Unmerging... (sys-kernel/spl-0.6.2-r1)
1461551646:  >>> unmerge success: sys-kernel/spl-0.6.2-r1
1461551650:  === (1 of 3) Post-Build Cleaning (sys-kernel/spl-0.6.5.4-r1::/usr/portage/sys-kernel/spl/spl-0.6.5.4-r1.ebuild)
1461551650:  ::: completed emerge (1 of 3) sys-kernel/spl-0.6.5.4-r1 to /
1461551650:  >>> emerge (2 of 3) sys-fs/zfs-kmod-0.6.5.4-r1 to /
1461551650:  === (2 of 3) Cleaning (sys-fs/zfs-kmod-0.6.5.4-r1::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-0.6.5.4-r1.ebuild)
1461551650:  === (2 of 3) Compiling/Merging (sys-fs/zfs-kmod-0.6.5.4-r1::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-0.6.5.4-r1.ebuild)
1461551719:  === (2 of 3) Merging (sys-fs/zfs-kmod-0.6.5.4-r1::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-0.6.5.4-r1.ebuild)
1461551720:  >>> AUTOCLEAN: sys-fs/zfs-kmod:0
1461551720:  === Unmerging... (sys-fs/zfs-kmod-0.6.2-r2)
1461551720:  >>> unmerge success: sys-fs/zfs-kmod-0.6.2-r2
1461551725:  === (2 of 3) Post-Build Cleaning (sys-fs/zfs-kmod-0.6.5.4-r1::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-0.6.5.4-r1.ebuild)
1461551725:  ::: completed emerge (2 of 3) sys-fs/zfs-kmod-0.6.5.4-r1 to /
1461551725:  >>> emerge (3 of 3) sys-fs/zfs-0.6.5.4-r2 to /
1461551725:  === (3 of 3) Cleaning (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461551725:  === (3 of 3) Compiling/Merging (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461551754:  === (3 of 3) Merging (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461551755:  >>> AUTOCLEAN: sys-fs/zfs:0
1461551755:  === Unmerging... (sys-fs/zfs-0.6.2-r2)
1461551755:  >>> unmerge success: sys-fs/zfs-0.6.2-r2
1461551757:  === (3 of 3) Post-Build Cleaning (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461551757:  ::: completed emerge (3 of 3) sys-fs/zfs-0.6.5.4-r2 to /
1461551757:  *** Finished. Cleaning up...
1461551757:  *** exiting successfully.
1461551757:  *** terminating.
1461554684: Started emerge on: Apr 25, 2016 03:24:43
1461554684:  *** emerge  zfs
1461554686:  >>> emerge (1 of 1) sys-fs/zfs-0.6.5.4-r2 to /
1461554686:  === (1 of 1) Cleaning (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461554686:  === (1 of 1) Compiling/Merging (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461554715:  === (1 of 1) Merging (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461554715:  >>> AUTOCLEAN: sys-fs/zfs:0
1461554715:  === Unmerging... (sys-fs/zfs-0.6.5.4-r2)
1461554716:  >>> unmerge success: sys-fs/zfs-0.6.5.4-r2
1461554717:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-0.6.5.4-r2::/usr/portage/sys-fs/zfs/zfs-0.6.5.4-r2.ebuild)
1461554717:  ::: completed emerge (1 of 1) sys-fs/zfs-0.6.5.4-r2 to /
1461554717:  *** Finished. Cleaning up...
1461554717:  *** exiting successfully.
1461554717:  *** terminating.
[...]
1463613698: Started emerge on: May 18, 2016 23:21:38
1463613698:  *** emerge  =zfs-9999
1463613702:  >>> emerge (1 of 3) sys-kernel/spl-9999 to /
1463613702:  === (1 of 3) Cleaning (sys-kernel/spl-9999::/usr/portage/sys-kernel/spl/spl-9999.ebuild)
1463613702:  === (1 of 3) Compiling/Merging (sys-kernel/spl-9999::/usr/portage/sys-kernel/spl/spl-9999.ebuild)
1463613735:  === (1 of 3) Merging (sys-kernel/spl-9999::/usr/portage/sys-kernel/spl/spl-9999.ebuild)
1463613735:  >>> AUTOCLEAN: sys-kernel/spl:0
1463613735:  === Unmerging... (sys-kernel/spl-0.6.5.4-r1)
1463613736:  >>> unmerge success: sys-kernel/spl-0.6.5.4-r1
1463613737:  === (1 of 3) Post-Build Cleaning (sys-kernel/spl-9999::/usr/portage/sys-kernel/spl/spl-9999.ebuild)
1463613737:  ::: completed emerge (1 of 3) sys-kernel/spl-9999 to /
1463613737:  >>> emerge (2 of 3) sys-fs/zfs-kmod-9999 to /
1463613737:  === (2 of 3) Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1463613737:  === (2 of 3) Compiling/Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1463613843:  === (2 of 3) Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1463613844:  >>> AUTOCLEAN: sys-fs/zfs-kmod:0
1463613844:  === Unmerging... (sys-fs/zfs-kmod-0.6.5.4-r1)
1463613844:  >>> unmerge success: sys-fs/zfs-kmod-0.6.5.4-r1
1463613845:  === (2 of 3) Post-Build Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1463613845:  ::: completed emerge (2 of 3) sys-fs/zfs-kmod-9999 to /
1463613845:  >>> emerge (3 of 3) sys-fs/zfs-9999 to /
1463613845:  === (3 of 3) Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1463613845:  === (3 of 3) Compiling/Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1463613887:  === (3 of 3) Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1463613887:  >>> AUTOCLEAN: sys-fs/zfs:0
1463613887:  === Unmerging... (sys-fs/zfs-0.6.5.4-r2)
1463613888:  >>> unmerge success: sys-fs/zfs-0.6.5.4-r2
1463613889:  === (3 of 3) Post-Build Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1463613889:  ::: completed emerge (3 of 3) sys-fs/zfs-9999 to /
1463613889:  *** Finished. Cleaning up...
1463613889:  *** exiting successfully.
1463613889:  *** terminating.
[...]
1467362575: Started emerge on: Jul 01, 2016 08:42:55
1467362575:  *** emerge  =zfs-kmod-9999
1467362578:  >>> emerge (1 of 1) sys-fs/zfs-kmod-9999 to /
1467362578:  === (1 of 1) Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467362578:  === (1 of 1) Compiling/Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467362672:  === (1 of 1) Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467362673:  >>> AUTOCLEAN: sys-fs/zfs-kmod:0
1467362673:  === Unmerging... (sys-fs/zfs-kmod-9999)
1467362673:  >>> unmerge success: sys-fs/zfs-kmod-9999
1467362674:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467362674:  ::: completed emerge (1 of 1) sys-fs/zfs-kmod-9999 to /
1467362674:  *** Finished. Cleaning up...
1467362675:  *** exiting successfully.
1467362675:  *** terminating.
1467362787: Started emerge on: Jul 01, 2016 08:46:26
1467362787:  *** emerge  =zfs-9999
1467362790:  >>> emerge (1 of 1) sys-fs/zfs-9999 to /
1467362790:  === (1 of 1) Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467362790:  === (1 of 1) Compiling/Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467362837:  === (1 of 1) Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467362838:  >>> AUTOCLEAN: sys-fs/zfs:0
1467362838:  === Unmerging... (sys-fs/zfs-9999)
1467362839:  >>> unmerge success: sys-fs/zfs-9999
1467362841:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467362841:  ::: completed emerge (1 of 1) sys-fs/zfs-9999 to /
1467362841:  *** Finished. Cleaning up...
1467362841:  *** exiting successfully.
1467362841:  *** terminating.
[...]
1467491436: Started emerge on: Jul 02, 2016 20:30:35
1467491436:  *** emerge  =zfs-kmod-9999
1467491438:  >>> emerge (1 of 1) sys-fs/zfs-kmod-9999 to /
1467491438:  === (1 of 1) Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467491438:  === (1 of 1) Compiling/Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467491530:  === (1 of 1) Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467491530:  >>> AUTOCLEAN: sys-fs/zfs-kmod:0
1467491530:  === Unmerging... (sys-fs/zfs-kmod-9999)
1467491531:  >>> unmerge success: sys-fs/zfs-kmod-9999
1467491532:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1467491532:  ::: completed emerge (1 of 1) sys-fs/zfs-kmod-9999 to /
1467491532:  *** Finished. Cleaning up...
1467491533:  *** exiting successfully.
1467491533:  *** terminating.
1467491576: Started emerge on: Jul 02, 2016 20:32:55
1467491576:  *** emerge  =zfs-9999
1467491579:  >>> emerge (1 of 1) sys-fs/zfs-9999 to /
1467491579:  === (1 of 1) Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467491579:  === (1 of 1) Compiling/Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467491623:  === (1 of 1) Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467491624:  >>> AUTOCLEAN: sys-fs/zfs:0
1467491624:  === Unmerging... (sys-fs/zfs-9999)
1467491624:  >>> unmerge success: sys-fs/zfs-9999
1467491626:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1467491626:  ::: completed emerge (1 of 1) sys-fs/zfs-9999 to /
1467491626:  *** Finished. Cleaning up...
1467491626:  *** exiting successfully.
1467491626:  *** terminating.
[...]
1468013181: Started emerge on: Jul 08, 2016 21:26:21
1468013181:  *** emerge  =zfs-9999
1468013184:  >>> emerge (1 of 1) sys-fs/zfs-9999 to /
1468013184:  === (1 of 1) Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1468013184:  === (1 of 1) Compiling/Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1468013229:  === (1 of 1) Merging (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1468013230:  >>> AUTOCLEAN: sys-fs/zfs:0
1468013230:  === Unmerging... (sys-fs/zfs-9999)
1468013230:  >>> unmerge success: sys-fs/zfs-9999
1468013232:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-9999::/usr/portage/sys-fs/zfs/zfs-9999.ebuild)
1468013232:  ::: completed emerge (1 of 1) sys-fs/zfs-9999 to /
1468013232:  *** Finished. Cleaning up...
1468013232:  *** exiting successfully.
1468013232:  *** terminating.
1468013239: Started emerge on: Jul 08, 2016 21:27:19
1468013239:  *** emerge  =zfs-kmod-9999
1468013242:  >>> emerge (1 of 1) sys-fs/zfs-kmod-9999 to /
1468013242:  === (1 of 1) Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1468013242:  === (1 of 1) Compiling/Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1468013334:  === (1 of 1) Merging (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1468013335:  >>> AUTOCLEAN: sys-fs/zfs-kmod:0
1468013335:  === Unmerging... (sys-fs/zfs-kmod-9999)
1468013335:  >>> unmerge success: sys-fs/zfs-kmod-9999
1468013336:  === (1 of 1) Post-Build Cleaning (sys-fs/zfs-kmod-9999::/usr/portage/sys-fs/zfs-kmod/zfs-kmod-9999.ebuild)
1468013336:  ::: completed emerge (1 of 1) sys-fs/zfs-kmod-9999 to /
1468013336:  *** Finished. Cleaning up...
1468013336:  *** exiting successfully.
1468013336:  *** terminating.

After mid-July, I started just downloading the zfs-master.zip file off GitHub and building that directly (so I could make sure it had all the debugging turned on). It seems I have one dated 2016-07-09 and another dated 2016-08-13. I thought I had a snapshot between those, but I'm not finding any.
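
For the record, those builds were just the stock autotools flow on the unpacked zip, something like this (with the matching spl tree built the same way first for those versions):

unzip zfs-master.zip && cd zfs-master
sh autogen.sh
./configure --enable-debug    # this is what produces the "(DEBUG mode)" tag in dmesg
make -j"$(nproc)"
make install && depmod -a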

Oh yeah, syslog...

Apr 25 04:21:19 localhost kernel: ZFS: Loaded module v0.6.5.4-r1-gentoo, ZFS pool version 5000, ZFS filesystem version 5
Apr 25 04:22:51 localhost kernel: ZFS: Loaded module v0.6.5.4-r1-gentoo, ZFS pool version 5000, ZFS filesystem version 5
May  2 08:41:43 localhost kernel: ZFS: Loaded module v0.6.5.4-r1-gentoo, ZFS pool version 5000, ZFS filesystem version 5
May 18 23:27:53 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
May 19 00:59:53 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
May 20 07:42:06 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
May 21 04:16:08 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
May 25 09:09:24 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jun  7 07:31:14 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jun  9 10:57:29 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jun 24 09:23:49 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jun 25 19:01:44 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jun 29 20:36:09 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jun 30 05:20:03 localhost kernel: ZFS: Loaded module v0.6.5-281_gbc2d809, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 18:08:58 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 18:26:21 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 19:50:50 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 20:00:00 localhost kernel: ZFS: Unloaded module v0.6.5-329_g5c27b29
Jul  2 20:02:08 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 20:16:28 localhost kernel: ZFS: Unloaded module v0.6.5-329_g5c27b29
Jul  2 20:21:26 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 20:27:00 localhost kernel: ZFS: Unloaded module v0.6.5-329_g5c27b29
Jul  2 20:33:17 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  2 20:57:18 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  7 05:47:39 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  8 04:50:41 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  8 09:45:00 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  8 18:09:38 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  8 18:15:28 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  8 21:31:28 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  8 21:46:12 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Jul  9 21:18:17 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Jul  9 22:06:36 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Jul 19 21:30:09 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Jul 27 10:06:45 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Jul 29 08:42:10 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug  1 09:42:22 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug  1 11:11:14 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug  1 22:35:59 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug  3 10:51:08 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 10 22:24:40 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 13 08:02:52 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 14 10:12:13 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 14 11:19:08 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 15 22:07:58 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 16 00:25:24 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 17 00:11:22 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 17 00:19:57 localhost kernel: ZFS: Unloaded module v0.6.5-1 (DEBUG mode)
Aug 17 00:22:08 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 17 00:31:31 localhost kernel: ZFS: Unloaded module v0.6.5-1 (DEBUG mode)
Aug 17 01:53:49 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 17 03:40:59 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 17 05:32:04 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 17 05:32:17 localhost kernel: ZFS: Unloaded module v0.6.5-1 (DEBUG mode)
Aug 17 05:34:23 localhost kernel: ZFS: Loaded module v0.6.5-329_g5c27b29, ZFS pool version 5000, ZFS filesystem version 5
Aug 17 05:48:42 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 21 04:17:02 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 22 21:21:12 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
Aug 22 21:23:55 localhost kernel: ZFS: Unloaded module v0.6.5-1 (DEBUG mode)
Aug 22 21:25:13 localhost kernel: ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
localhost ~ # cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl
[fastest] scalar sse2 ssse3 avx2 
localhost ~ # cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping    : 3
microcode   : 0x9
cpu MHz     : 3101.000
cache size  : 8192 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt
bugs        :
bogomips    : 6200.42
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping    : 3
microcode   : 0x9
cpu MHz     : 3101.000
cache size  : 8192 KB
physical id : 0
siblings    : 4
core id     : 1
cpu cores   : 4
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt
bugs        :
bogomips    : 6200.42
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping    : 3
microcode   : 0x9
cpu MHz     : 3101.000
cache size  : 8192 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 4
apicid      : 4
initial apicid  : 4
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt
bugs        :
bogomips    : 6200.42
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
stepping    : 3
microcode   : 0x9
cpu MHz     : 3101.000
cache size  : 8192 KB
physical id : 0
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 6
initial apicid  : 6
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt
bugs        :
bogomips    : 6200.42
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
JuliaVixen commented 7 years ago

This just happened again!

I upgraded my kernel to 4.6.7 and SPL/ZFS to 0.7.0-rc1 (latest commits 178acea and 4fd75d3), and I figured I'd run a scrub just to see if everything was working OK... So, I have this pool. It's eight 8TB drives sitting in a QNAP thingy, with each drive set up as an iSCSI target. I've created a raidz2, so it's six drives of data and two drives of parity... and therefore, in theory, I could lose two entire drives and still be OK, right?
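
For reference, this is roughly how the pool is laid out; it's a reconstruction of the create commands, not the exact history, with the device names as they appear under /dev/disk/by-id:

zpool create Q raidz2 \
    /dev/disk/by-id/scsi-36e843b61f80bb1cd2a24d4fced91e4d1 \
    /dev/disk/by-id/scsi-36e843b69e8cda2bd75ffd4ff7d8197d1 \
    /dev/disk/by-id/scsi-36e843b6c54a5ddcdb7e4d4004d9b0dda \
    /dev/disk/by-id/scsi-36e843b61213de57de400d4b39d8d93db \
    /dev/disk/by-id/scsi-36e843b67f2fd084d0086d45fcda93fd6 \
    /dev/disk/by-id/scsi-36e843b69b0cd364d05a7d48a2da78ed3 \
    /dev/disk/by-id/scsi-36e843b67af44147d9d5cd4044db4aede \
    /dev/disk/by-id/scsi-36e843b6125bcb85d9083d4762daf14d7
zpool add Q cache /dev/disk/by-id/ata-SAMSUNG_MZHPU512HCGL-00004_S1NDNYAFC00958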

localhost ~ # zpool status -v Q
  pool: Q
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Sep  8 21:06:09 2016
    3.49T scanned out of 44.5T at 99.3M/s, 120h20m to go
    64K repaired, 7.84% done
config:

    NAME                                             STATE     READ WRITE CKSUM
    Q                                                ONLINE       0     0     3
      raidz2-0                                       ONLINE       0     0     9
        scsi-36e843b61f80bb1cd2a24d4fced91e4d1       ONLINE       0     0     2  (repairing)
        scsi-36e843b69e8cda2bd75ffd4ff7d8197d1       ONLINE       0     0     0
        scsi-36e843b6c54a5ddcdb7e4d4004d9b0dda       ONLINE       0     0     1  (repairing)
        scsi-36e843b61213de57de400d4b39d8d93db       ONLINE       0     0     0
        scsi-36e843b67f2fd084d0086d45fcda93fd6       ONLINE       0     0     0
        scsi-36e843b69b0cd364d05a7d48a2da78ed3       ONLINE       0     0     0
        scsi-36e843b67af44147d9d5cd4044db4aede       ONLINE       0     0     0
        scsi-36e843b6125bcb85d9083d4762daf14d7       ONLINE       0     0     0
    cache
      ata-SAMSUNG_MZHPU512HCGL-00004_S1NDNYAFC00958  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Q/MK@Aug_23_2016:/stuff.dat

So, how is this possible?

JuliaVixen commented 7 years ago

I should also note that I compiled the kernel and zfs modules with gcc 4.7.3-r1 (Gentoo package).

I had to reboot for other reasons in the middle of this scrub, and now the pool says it has zero checksum errors anywhere, but it still reports "Permanent errors have been detected in the following files". The scrub is continuing right now.

I have seen no hardware errors anywhere, no MCEs, not even a worrying statistic in SMART. I've been checking some of the I2C stuff, and that's all normal too.

localhost ~ # edac-util -sv
edac-util: EDAC drivers are loaded. 1 MC detected:
  mc0:IE31200

localhost ~ # edac-util    
edac-util: No errors to report.
localhost ~ # sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +66.0 C  (high = +80.0 C, crit = +100.0 C)
Core 0:         +62.0 C  (high = +80.0 C, crit = +100.0 C)
Core 1:         +66.0 C  (high = +80.0 C, crit = +100.0 C)
Core 2:         +65.0 C  (high = +80.0 C, crit = +100.0 C)
Core 3:         +62.0 C  (high = +80.0 C, crit = +100.0 C)
JuliaVixen commented 7 years ago

This just happened again! Now all of my pools have uncorrectable errors!

Summary:

I moved the hard drives over to a completely different computer running FreeBSD 11.0-RC3, and it's still reporting a checksum error on the entire raidz, but nothing on any of the drives. I haven't really written much to the pool since July 16, 2016, which was the last time I scrubbed it (without error).

The error doesn't seem to be with the hardware, or the OS, reading the disks. Whatever the problem is, it's permanently written onto the drives.

At this point, I'm speculating that a block, or something, got written to the wrong DVA. Since the pool was last scrubbed, I've created some snapshots and destroyed a filesystem, and that's mostly all. I haven't written any data to the pool because it's almost totally full.

So... possibly zfs create, zfs destroy, or zfs snapshot may be randomly nuking a sector somewhere in this pool? I still don't understand why no single device is reporting any checksum errors -- the errors only show up at the raidz vdev at the top of the pool.
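
One place to start digging is to line the reported file up with its block pointers. Here's a sketch, assuming the file reported below and GNU stat on the Linux side (the ZFS object id is the same as the file's inode number):

# object id of the affected file
obj=$(stat -c %i /l/photos/2015/2015_09_04/_75A1807.NEF)
# dump that object's block pointers: DVAs, birth txgs, and the stored checksums
zdb -ddddd l/photos "$obj"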

Exposition:

Since I had the pool imported read-only, I could re-test it under different conditions. I booted into:

- Linux 3.10.7-gentoo-r1  ZFS 0.6.5.4-r1-gentoo
- Linux 4.4.6-gentoo      ZFS 0.6.5-329_g5c27b29
- Linux 4.6.7             ZFS 0.7.0-rc1

Each time, I'd try to read the same file, and each time this was the result.

localhost ~ # md5sum /l/photos/2015/2015_09_04/_75A1807.NEF 
md5sum: /l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

localhost ~ # zpool status -v
  pool: l
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                   STATE     READ WRITE CKSUM
    l                                      ONLINE       0     0     1
      raidz1-0                             ONLINE       0     0     2
        ata-WDC_WD80EFZX-68UW8N0_VKGU6V2X  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKH408MX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKH52T6X  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHLNHZX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHNJ93X  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /l/photos/2015/2015_09_04/_75A1807.NEF

OK, so sometimes ZFS won't actually report checksum errors on individual drives when the pool is imported read-only, but I don't want to make any permanent changes yet, before I've finished testing.

Anyway, there's nothing in dmesg, smartctl reports all drives are in perfect health, edac says no errors, etc. etc.
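
(The sort of checks I mean, assuming smartmontools is installed and the drives show up as /dev/sd*:)

# SMART health verdict plus each drive's own error log
for d in /dev/sd[a-z]; do
    echo "== $d"
    smartctl -H -l error "$d"
done
# any machine-check events the kernel has logged
dmesg | grep -i -e "machine check" -e mce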

I just got a new fileserver, so I plugged all of the WD80EFZX drives from this pool into it (ECC memory, hardware passed burn-in tests), booted the FreeBSD 11.0-RC3 live DVD, and imported this pool read-only...

root@:~ # uname -a
FreeBSD  11.0-RC3 FreeBSD 11.0-RC3 #0 r305786: Wed Sep 14 02:19:25 UTC 2016     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

root@:~ # zpool import -o altroot=/tmp/l -o readonly=on l

root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     0
      raidz1-0                                                      ONLINE       0     0     0
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: No known data errors

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF
md5: /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     1
      raidz1-0                                                      ONLINE       0     0     2
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF

Well, there's the checksum error(s) again....

So, I export and re-import read-write, because maybe this time it'll figure out which device has the error and automatically repair it... Because that's what ZFS does, right?

root@:~ # zpool export l
root@:~ # zpool import -o altroot=/tmp/l  l
root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     0
      raidz1-0                                                      ONLINE       0     0     0
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: No known data errors

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF
md5: /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     0
      raidz1-0                                                      ONLINE       0     0     0
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: No known data errors

Huh, must be some kind of delayed update until a TXG gets written, or something....

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF
md5: /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1806.NEF
MD5 (/tmp/l/l/photos/2015/2015_09_04/_75A1806.NEF) = bd04e007d90923db15a1b8d37511a41a

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1808.NEF
MD5 (/tmp/l/l/photos/2015/2015_09_04/_75A1808.NEF) = 518643f4912ea19634e482dd8327c32a

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF
md5: /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     2
      raidz1-0                                                      ONLINE       0     0     4
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF

Ok, there it goes...

Curiously, each time I attempt to read the file, two more checksum errors appear in the stats...

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF
md5: /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     3
      raidz1-0                                                      ONLINE       0     0     6
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF

root@:~ # md5 /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF
md5: /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF: Input/output error

root@:~ # zpool status -v
  pool: l
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 35h10m with 0 errors on Sat Jul 16 09:23:49 2016
config:

    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     4
      raidz1-0                                                      ONLINE       0     0     8
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF

[md5 etc. etc.]
    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     5
      raidz1-0                                                      ONLINE       0     0    10
[md5 etc. etc.]
    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     6
      raidz1-0                                                      ONLINE       0     0    12
[md5 etc. etc.]
    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     7
      raidz1-0                                                      ONLINE       0     0    14
[md5 etc. etc.]
    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     8
      raidz1-0                                                      ONLINE       0     0    16
[md5 etc. etc.]
    NAME                                                            STATE     READ WRITE CKSUM
    l                                                               ONLINE       0     0     9
      raidz1-0                                                      ONLINE       0     0    18
        da7p1                                                       ONLINE       0     0     0
        diskid/DISK-VKH408MX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKH52T6X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHLNHZX%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0
        diskid/DISK-VKHNJ93X%20%20%20%20%20%20%20%20%20%20%20%20p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tmp/l/l/photos/2015/2015_09_04/_75A1807.NEF

Those numbers only increment when I try to read the file. The box has been sitting idle for an hour, and nothing has changed.
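
On the Linux builds (not the FreeBSD live CD), the individual checksum ereports can be watched as they're generated, which at least shows which vdev and offset each one is charged to:

# follow ZFS events in verbose mode; each failed read should produce
# ereport.fs.zfs.checksum entries while the md5 runs
zpool events -f -v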

ANYWAY, so NONE of the disks themselves have checksum errors. What is going on?

I tried to see if there were any clues with zdb -c -d..., but it seems like the -c option to check the checksums on blocks doesn't actually do anything. I have a backup copy of the file (and a backup of the entire pool), so I guess I could compare the checksums in the block pointers, but I'm not sure exactly how to recalculate the checksums of the blocks on this "corrupt" pool to see what's become corrupt.
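
If I understand the zdb man page right, once the block pointers are dumped, an individual block can be read back by DVA with zdb -R and compared against the same record in the backup copy; the vdev:offset:size triple and the :d (decompress) flag below are placeholders, not values I've verified:

# read one block straight off the pool by its DVA, copied from the zdb -ddddd
# output (the numbers here are made up), and ask zdb to try decompressing it
zdb -R l 0:400000:20000:d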

This pool was totally ok back on July 16. These are the only things I've done since then:

2016-07-14.22:13:09 zpool scrub l
2016-07-16.09:37:04 zpool export l
2016-07-29.04:49:14 zpool import l
2016-07-29.04:49:44 zfs snapshot l@2016_Jul_28
2016-07-29.04:50:18 zfs destroy l/test
2016-07-29.04:50:32 zfs snapshot l/CDs@2016_Jul_28
2016-07-29.04:50:42 zfs snapshot l/Fanime@2016_Jul_28
2016-07-29.04:50:51 zfs snapshot l/MKVs@2016_Jul_28
2016-07-29.04:51:04 zfs snapshot l/new@2016_Jul_28
2016-07-29.04:52:08 zpool export l
2016-07-31.01:17:07 zpool import l
2016-07-31.02:38:27 zfs create l/Jul_29_2016
2016-07-31.02:38:40 zfs set copies=2 l/Jul_29_2016
2016-08-01.00:20:03 zfs create l/Lightroom
2016-08-01.05:22:14 zfs destroy -r l/Fanime
2016-08-01.09:21:34 zfs snapshot l/Lightroom@2016_Jul_30
2016-08-01.09:21:51 zfs snapshot l/Jul_29_2016@2016_Jul_30
2016-08-02.01:58:56 zfs destroy -r l/Lightroom
2016-08-02.08:11:14 zfs snapshot l/Jul_29_2016@2016_Aug_02
2016-08-02.09:11:22 zfs destroy l/Jul_29_2016@2016_Jul_30
2016-08-02.09:43:23 zfs destroy l/Jul_29_2016@2016_Aug_02
2016-08-03.00:10:47 zpool export l
2016-09-12.18:28:33 zpool import l
2016-09-12.18:29:07 zfs snap -r l@2016_Sep_12
2016-09-12.18:29:29 zpool export l
2016-09-14.22:40:02 zpool import -o altroot=/tmp/l l

I've only done zfs send from the pool since July 4, 2016, so there have been no zfs recv writes to this pool.

root@:~ # zfs get all l
NAME  PROPERTY              VALUE                  SOURCE
l     type                  filesystem             -
l     creation              Mon Apr 25  7:00 2016  -
l     used                  27.8T                  -
l     available             300G                   -
l     referenced            213G                   -
l     compressratio         1.01x                  -
l     mounted               yes                    -
l     quota                 none                   default
l     reservation           none                   default
l     recordsize            128K                   default
l     mountpoint            /tmp/l/l               default
l     sharenfs              off                    default
l     checksum              on                     default
l     compression           on                     local
l     atime                 off                    local
l     devices               off                    local
l     exec                  on                     default
l     setuid                off                    local
l     readonly              off                    default
l     jailed                off                    default
l     snapdir               hidden                 default
l     aclmode               discard                default
l     aclinherit            restricted             default
l     canmount              on                     default
l     xattr                 off                    temporary
l     copies                1                      default
l     version               5                      -
l     utf8only              off                    -
l     normalization         none                   -
l     casesensitivity       sensitive              -
l     vscan                 off                    default
l     nbmand                off                    default
l     sharesmb              off                    default
l     refquota              none                   default
l     refreservation        none                   default
l     primarycache          all                    default
l     secondarycache        all                    default
l     usedbysnapshots       179K                   -
l     usedbydataset         213G                   -
l     usedbychildren        27.6T                  -
l     usedbyrefreservation  0                      -
l     logbias               latency                default
l     dedup                 off                    default
l     mlslabel                                     -
l     sync                  standard               default
l     refcompressratio      1.11x                  -
l     written               0                      -
l     logicalused           28.2T                  -
l     logicalreferenced     236G                   -
l     volmode               default                default
l     filesystem_limit      none                   default
l     snapshot_limit        none                   default
l     filesystem_count      none                   default
l     snapshot_count        none                   default
l     redundant_metadata    all                    default

root@:~ # zfs get all l/photos
NAME      PROPERTY              VALUE                                                       SOURCE
l/photos  type                  filesystem                                                  -
l/photos  creation              Tue Apr 26 19:43 2016                                       -
l/photos  used                  15.5T                                                       -
l/photos  available             300G                                                        -
l/photos  referenced            14.9T                                                       -
l/photos  compressratio         1.01x                                                       -
l/photos  mounted               yes                                                         -
l/photos  quota                 none                                                        default
l/photos  reservation           none                                                        default
l/photos  recordsize            128K                                                        default
l/photos  mountpoint            /tmp/l/l/photos                                             default
l/photos  sharenfs              fsid=25,rw=172.16.111.0/24,sec=sys,insecure,insecure_locks  received
l/photos  checksum              on                                                          default
l/photos  compression           on                                                          inherited from l
l/photos  atime                 off                                                         inherited from l
l/photos  devices               off                                                         inherited from l
l/photos  exec                  on                                                          default
l/photos  setuid                off                                                         inherited from l
l/photos  readonly              off                                                         default
l/photos  jailed                off                                                         default
l/photos  snapdir               hidden                                                      default
l/photos  aclmode               discard                                                     default
l/photos  aclinherit            restricted                                                  default
l/photos  canmount              on                                                          default
l/photos  xattr                 off                                                         temporary
l/photos  copies                1                                                           default
l/photos  version               1                                                           -
l/photos  utf8only              off                                                         -
l/photos  normalization         none                                                        -
l/photos  casesensitivity       sensitive                                                   -
l/photos  vscan                 off                                                         default
l/photos  nbmand                off                                                         default
l/photos  sharesmb              off                                                         default
l/photos  refquota              none                                                        default
l/photos  refreservation        none                                                        default
l/photos  primarycache          all                                                         default
l/photos  secondarycache        all                                                         default
l/photos  usedbysnapshots       677G                                                        -
l/photos  usedbydataset         14.9T                                                       -
l/photos  usedbychildren        0                                                           -
l/photos  usedbyrefreservation  0                                                           -
l/photos  logbias               latency                                                     default
l/photos  dedup                 off                                                         default
l/photos  mlslabel                                                                          -
l/photos  sync                  standard                                                    default
l/photos  refcompressratio      1.01x                                                       -
l/photos  written               12.8K                                                       -
l/photos  logicalused           15.7T                                                       -
l/photos  logicalreferenced     15.1T                                                       -
l/photos  volmode               default                                                     default
l/photos  filesystem_limit      none                                                        default
l/photos  snapshot_limit        none                                                        default
l/photos  filesystem_count      none                                                        default
l/photos  snapshot_count        none                                                        default
l/photos  redundant_metadata    all                                                         default

root@:~ # zpool get all l
NAME  PROPERTY                       VALUE                          SOURCE
l     size                           36.2T                          -
l     capacity                       95%                            -
l     altroot                        /tmp/l                         local
l     health                         ONLINE                         -
l     guid                           4946876290228094116            default
l     version                        -                              default
l     bootfs                         -                              default
l     delegation                     on                             default
l     autoreplace                    off                            default
l     cachefile                      none                           local
l     failmode                       wait                           default
l     listsnapshots                  off                            default
l     autoexpand                     off                            default
l     dedupditto                     0                              default
l     dedupratio                     1.00x                          -
l     free                           1.50T                          -
l     allocated                      34.8T                          -
l     readonly                       off                            -
l     comment                        -                              default
l     expandsize                     -                              -
l     freeing                        0                              default
l     fragmentation                  41%                            -
l     leaked                         0                              default
l     feature@async_destroy          enabled                        local
l     feature@empty_bpobj            active                         local
l     feature@lz4_compress           active                         local
l     feature@multi_vdev_crash_dump  disabled                       local
l     feature@spacemap_histogram     active                         local
l     feature@enabled_txg            active                         local
l     feature@hole_birth             active                         local
l     feature@extensible_dataset     active                         local
l     feature@embedded_data          active                         local
l     feature@bookmarks              enabled                        local
l     feature@filesystem_limits      enabled                        local
l     feature@large_blocks           active                         local
l     feature@sha512                 disabled                       local
l     feature@skein                  disabled                       local

root@:~ # dmesg
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-RC3 #0 r305786: Wed Sep 14 02:19:25 UTC 2016
    root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
[etc. etc.]

FreeBSD isn't reporting any error messages either.
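For completeness, a couple of other places that could be checked for device-level errors on the FreeBSD side (the device names below are placeholders, not the actual disks in this pool):

# list the attached disks as CAM sees them
camcontrol devlist

# per-disk SMART health, e.g. for the first disk (substitute the real device)
smartctl -a /dev/ada0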

Here are the zpool properties as seen from Linux:

localhost ~ # zpool get all l
NAME  PROPERTY                    VALUE                       SOURCE
l     size                        36.2T                       -
l     capacity                    95%                         -
l     altroot                     -                           default
l     health                      ONLINE                      -
l     guid                        4946876290228094116         default
l     version                     -                           default
l     bootfs                      -                           default
l     delegation                  on                          default
l     autoreplace                 off                         default
l     cachefile                   -                           default
l     failmode                    wait                        default
l     listsnapshots               off                         default
l     autoexpand                  off                         default
l     dedupditto                  0                           default
l     dedupratio                  1.00x                       -
l     free                        1.50T                       -
l     allocated                   34.8T                       -
l     readonly                    on                          -
l     ashift                      12                          local
l     comment                     -                           default
l     expandsize                  -                           -
l     freeing                     0                           default
l     fragmentation               0%                          -
l     leaked                      0                           default
l     feature@async_destroy       enabled                     local
l     feature@empty_bpobj         active                      local
l     feature@lz4_compress        active                      local
l     feature@spacemap_histogram  active                      local
l     feature@enabled_txg         active                      local
l     feature@hole_birth          active                      local
l     feature@extensible_dataset  active                      local
l     feature@embedded_data       active                      local
l     feature@bookmarks           enabled                     local
l     feature@filesystem_limits   enabled                     local
l     feature@large_blocks        active                      local
l     feature@large_dnode         disabled                    local

Here's a dump of this inode; the -c option doesn't seem to change anything.
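(Side note: my understanding is that -c only takes effect during zdb's pool-wide block traversal (-b), not a per-object -d dump, which would explain why it has no visible effect here. If the goal is to actually re-verify checksums with zdb, something like the following might work instead; just a sketch, and -e may be needed since the cachefile is unset:)

# traverse every block in the pool and verify all checksums (read-only, but slow)
zdb -bcc l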

root@:~ # zdb -cc -dddddddbbbbbbbbbb l/photos 6410428
Dataset l/photos [ZPL], ID 218, cr_txg 69828, 14.9T, 2355900 objects, rootbp DVA[0]=<0:84000850000:2000> DVA[1]=<0:1280002f4000:2000> [L0 DMU objset] fletcher4 uncompressed LE contiguous unique double size=800L/800P birth=12444845L/12444845P fill=2355900 cksum=bc2cc1b89:d5b8038d29a:98df8f42123a3:51f4b5475c850c9

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   6410428    3    16K   128K  33.4M  33.5M  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                        264   bonus  ZFS znode
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 267
    path    /2015/2015_09_04/_75A1807.NEF
    uid     0
    gid     0
    atime   Fri Sep  4 03:27:40 2015
    mtime   Fri Sep  4 03:27:40 2015
    ctime   Wed Sep  9 21:27:44 2015
    crtime  Wed Sep  9 06:19:41 2015
    gen 1757602
    mode    100555
    size    35044923
    parent  6420259
    links   1
    pflags  40000000104
    xattr   0
    rdev    0x0000000000000000
Indirect blocks:
               0 L2   DVA[0]=<0:6630b682000:2000> DVA[1]=<0:b40320a4000:2000> [L2 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1000P birth=12305637L/12305637P fill=268 cksum=1fbd4b558f:79f035e3e33d:ea92df379c6d9a:2d4505cb254065ac
               0  L1  DVA[0]=<0:65891b66000:4000> DVA[1]=<0:b403209a000:4000> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/2000P birth=12305637L/12305637P fill=128 cksum=24063f503c6:c0ec5ee0b8cab:22e0bbb12f5088b1:87a4a9d173575d61
               0   L0 DVA[0]=<0:66308cbc000:22000> [L0 ZFS plain file] fletcher4 lz4 LE contiguous unique single size=20000L/1a000P birth=12305637L/12305637P fill=1 cksum=33b499376f88:a3f7058f58c59e9:5afba2fd30757f06:3bc612cb35cc162a
           20000   L0 DVA[0]=<0:66308d06000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3824e4fbf730:ea25f515f3ebdc2:2fdac10274e1105:cf2d88aa62c5ed6a
           40000   L0 DVA[0]=<0:66308cde000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=42c69803a949:10bcbdc92dd64fab:17537547304a704:7e4994f4ccd6ceca
           60000   L0 DVA[0]=<0:66308d2e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=432e0b619912:10d1c2a4c8ba04cf:d4bd5b4fc155aa03:57ec654d305579d5
           80000   L0 DVA[0]=<0:66308d56000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=43b7e3133f7b:10e3fd4fbcdcdb23:afbe71badb52104d:e57ca66e404404e8
           a0000   L0 DVA[0]=<0:66308d7e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=43b3428ce067:10e7e9cd8c57129f:b943fee8284087d8:97c7315ff0e4bc7c
           c0000   L0 DVA[0]=<0:66308da6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=43f8b471c5d6:10f695dd18a706c8:e53468105b098210:49a4df639d309587
           e0000   L0 DVA[0]=<0:66308dce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=43805f39b9dc:10ebbff4f6343384:4386b2b9616e0174:b8f9fdb2a6408346
          100000   L0 DVA[0]=<0:66308df6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d043810a047:f7f0a39692ddad8:d0aed23d7ef2b920:db36a71aa4826799
          120000   L0 DVA[0]=<0:66308e1e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d6315a9c315:f65b224dfdeb814:d8be43829d86336b:e84586cf0d80a0db
          140000   L0 DVA[0]=<0:66308e46000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3c130020a20c:f12684bfda0f6c9:8934c629ddb6e00f:968c63069a6f71ea
          160000   L0 DVA[0]=<0:66308e6e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3bd40c70501a:eee2b014cd9500d:1d095e0176e967cb:a955e497b4f9dcad
          180000   L0 DVA[0]=<0:66308e96000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3c61e667a11a:f18294ee7caf682:ce57356273f56b9:c454f6a2c60bcf1f
          1a0000   L0 DVA[0]=<0:66308ebe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3c6e13b140f9:f13f7824028ab8b:c57a9eee33bc98:8ba502e3d1f5e82b
          1c0000   L0 DVA[0]=<0:66308ee6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3c53a5b1f4a0:f090e0d740e804f:c6582e2c7804d024:d44761d737bbefe6
          1e0000   L0 DVA[0]=<0:66308f0e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3c0996c84ce6:f0555ae966306c4:5c15920b930d58d7:b92691a562133b23
          200000   L0 DVA[0]=<0:66308f36000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3cc6c0ff6a54:f33daab1647e907:86a5b06d68041c71:c213ccc1c97c03f
          220000   L0 DVA[0]=<0:66308f5e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d68559907d5:f4f24d5cebdcd46:6ccbbcd197e45569:6093e08366f52d32
          240000   L0 DVA[0]=<0:66308f86000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d3455568cbd:f582d7af6f65321:22bf1a44d1ccaff8:dd7d3f5377a101dc
          260000   L0 DVA[0]=<0:66308fae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d1c80330b6b:f4e234ef51324e5:79e6460e99f3797e:559dba0ba5d729a0
          280000   L0 DVA[0]=<0:66308fd6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d4d415999b2:f5a61e8249914b0:3134a4d5169d7df0:5c60227bd32b8c70
          2a0000   L0 DVA[0]=<0:66308ffe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d5d27bb11b7:f5842041013f09c:f07ca23fca821d5f:781464d2ccc13789
          2c0000   L0 DVA[0]=<0:66309026000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ca868e4de37:f25cb0b6d938a31:e7e68853c43ab9ac:18782abe661d6ebf
          2e0000   L0 DVA[0]=<0:6630904e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d3a3dd2f309:f4a86301cade357:564e3e218fbe4198:88c443353b1ad33e
          300000   L0 DVA[0]=<0:66309076000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3d1f7affc45d:f5cd140f1086c6b:c0c07051f927f299:34dfb8205b69f689
          320000   L0 DVA[0]=<0:6630909e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3c1422cc371a:f192e4750c1e0d5:eaf0dffb6ef4a832:92e437d53bbd846f
          340000   L0 DVA[0]=<0:663090c6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3afa1391199c:ed194d22d80f243:e780aa299e707560:ef4918299562a9e4
          360000   L0 DVA[0]=<0:663090ee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3b67b83a60b1:ec761bd0fa97fe1:82ebfe3d8690e536:7a6697c340f23632
          380000   L0 DVA[0]=<0:66309116000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fa02bfea1f1:fee87dd4efbd5c5:c321ec4ad2daf1d0:820e4db4eec9a04
          3a0000   L0 DVA[0]=<0:6630913e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f861616bcb5:fe4fcc0a21e81a9:779fa9b19a87e874:c26ee93bdbe92836
          3c0000   L0 DVA[0]=<0:66309166000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4059e71b3076:1017de6af06ef91b:3e8ae9642875e4df:15eb5e4b23ddd20a
          3e0000   L0 DVA[0]=<0:6630918e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=400a9aaf8707:ff65b6ebd242936:70469cef10663e32:cd0d48b1e79dc224
          400000   L0 DVA[0]=<0:663091b6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=407dac92fd1d:101d4bde407a049c:2be6251e40cb954a:dade19579b74da2d
          420000   L0 DVA[0]=<0:66309206000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=411fdb038441:104d5981dd085dd3:893da0bf0bef64e9:a1155a672041c464
          440000   L0 DVA[0]=<0:663091de000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=418a8d80476b:10686b8f848d7edd:698d47595bca60db:657cb90b4aa27bea
          460000   L0 DVA[0]=<0:6630922e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41b96fbff074:106e33b3bd3ea4f8:d5d104bbd8ed39c4:32615339e4f63e7f
          480000   L0 DVA[0]=<0:66309256000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41e5afc570eb:10847d1d7f9fdc0e:7e50b242892ee5a0:c10bf96b0fbf807a
          4a0000   L0 DVA[0]=<0:663092a6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4152814a738f:105cf0c30ff87eb9:f874e3c463f0282b:260997ef511d09d8
          4c0000   L0 DVA[0]=<0:6630927e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=409941a32f1c:1028a6b5b290c916:a7a9abcd19d1e1a1:9c514425b18c2dda
          4e0000   L0 DVA[0]=<0:663092ce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40763902bfb9:1020f6ddb5930dfb:7a648b5b40a322fa:5f9c08bd9ab5e45d
          500000   L0 DVA[0]=<0:663092f6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=406d9857573f:101699cbed334e0a:abf834f14b1ae0c6:4ff74db9bc799cbe
          520000   L0 DVA[0]=<0:6630931e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=400674c3d39f:ffdf7b9cdbb6e4f:96164391f5fcd2f0:844c7313dc5e9641
          540000   L0 DVA[0]=<0:66309346000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40093903c3d6:100ade3e55243d8e:70f0451ca9d47333:6a60da99cc9770ca
          560000   L0 DVA[0]=<0:6630936e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fe65583ab51:1000cb2484400024:3d61f2bc6c4ff8c1:2f1c720785cf53c
          580000   L0 DVA[0]=<0:66309396000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fbdab0c095e:fe555c6611e5ad6:7b20cbba51f7c2df:6483f8b7b5cf3528
          5a0000   L0 DVA[0]=<0:663093be000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fe65cc9c9d2:ff598c992b4e663:28ae6e0e44c7c400:fec7ef7d1a102ba0
          5c0000   L0 DVA[0]=<0:663093e6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f97763c8300:ff177d5492a8d05:766a666e11c0e181:57b1927a72ff148f
          5e0000   L0 DVA[0]=<0:6630940e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ffb16f91a96:ff79b768d41efc2:d49e6b088be6e789:56b160f29a1a69
          600000   L0 DVA[0]=<0:66309436000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ff9896cc834:100a4644ab37396f:fa994eb7137a5bc2:341320590519d5f0
          620000   L0 DVA[0]=<0:6630945e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f67e02b2023:fe4d63e1b4b1aa3:452c9a1ff6b42ca7:5b7eec2e8d60ba48
          640000   L0 DVA[0]=<0:66309486000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f708f12e1fc:fe63a4d4352cadb:bee5cb75b9c05525:3d509decaa1fb797
          660000   L0 DVA[0]=<0:663094d6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fa6c160128a:fee3e2a7a4daa1b:f1caccccd99e4c07:a4e76ed6ed908ba3
          680000   L0 DVA[0]=<0:663094ae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f3842d043fe:fc5bcc2c2d700a7:4625863a173c7277:8dfcc3fb77acf538
          6a0000   L0 DVA[0]=<0:66309526000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f563a888f04:fcf0496c67c9bc4:b217645daa7bd66d:f48132ce876d3733
          6c0000   L0 DVA[0]=<0:663094fe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fa206390827:fe79d9448accfe2:6263708f52408ec7:ee366c1f71c268c3
          6e0000   L0 DVA[0]=<0:6630954e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f26099a5ebf:fc486648ccaf313:77c9d0fc40745e95:42c5bb8c5936a7a2
          700000   L0 DVA[0]=<0:66309576000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3efb5d586591:fbd92985b3e2a8b:1d90082b424dca04:716daa79a5f668a9
          720000   L0 DVA[0]=<0:6630959e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f162a57d006:fc208c32b7db794:687671fa47cd1843:123191c354ff1685
          740000   L0 DVA[0]=<0:663095c6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f3f549dd6f6:fd7aa5898bcf5cb:9e6abfa5f80a2492:b0073fce5118b54e
          760000   L0 DVA[0]=<0:663095ee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f7d6653bd15:fd9af9ef21c368d:34a3a04f2971890f:e35c5d84a69b9b85
          780000   L0 DVA[0]=<0:66309616000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f5c27ca0a07:fd81fe9cc2e3fae:cf920b57597d6b7c:b0bcddfc5758b678
          7a0000   L0 DVA[0]=<0:6630963e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f50694e680b:fd13a56a6842dde:8215e118bd544f4:873d3a4d41ae471c
          7c0000   L0 DVA[0]=<0:66309666000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fa646431235:fed5fc096123aca:d9673fef4c6ff49d:64755320b66af381
          7e0000   L0 DVA[0]=<0:6630968e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ff15ad05ec5:100aaab43fd229be:29a234d3ba87fd30:ac4b9fa82d1d257f
          800000   L0 DVA[0]=<0:663096b6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f8c7791e5b4:fdd9bc4152a0681:68ccd83cd66f6950:6db2c6e316b7e01e
          820000   L0 DVA[0]=<0:663096de000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f85ab39d2ca:fe2a20476e5087d:ad6c0f0d99b960c4:5b67b16f06a22c2e
          840000   L0 DVA[0]=<0:66309706000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f3411296f4d:fdc3988af22e49c:524bd8b8147a546b:37b1df2600ce7ff3
          860000   L0 DVA[0]=<0:6630972e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f7a1931c469:fe0f28452052f2c:31578bb42bffce26:dff9c16cabee89c6
          880000   L0 DVA[0]=<0:66309756000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f16310b8b10:fcc2a00e02076f9:b2c2c85eb9e071b1:c4d6b50d29c1a734
          8a0000   L0 DVA[0]=<0:6630977e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f211b28be3d:fbbff831d6e6195:c875399425aef52a:83fa63844829060e
          8c0000   L0 DVA[0]=<0:663097a6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f478e2f322b:fd16e82ab203975:aaad8d537b4cbf8a:c42edbcd52f9b334
          8e0000   L0 DVA[0]=<0:663097ce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f5a2ef7d709:fda43cff146df76:11c4433042fb0e4b:a90b7a54c0ae7f8f
          900000   L0 DVA[0]=<0:663097f6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f19443a80cd:fc11962a2c3a706:472c4997eea41eed:a8699abbafcf8502
          920000   L0 DVA[0]=<0:6630981e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f6747c54313:fdae3340adf0f4b:942b4b8d43801d0c:930fc29bc15160fe
          940000   L0 DVA[0]=<0:66309846000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ef332ae31ee:fb36ef153a60ad8:d02f3435e34b3f54:3de8c244a9371b4
          960000   L0 DVA[0]=<0:6630986e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f104d3e7692:fba05646f128c83:19d28b569181bcd9:6da1c8adb9ae94bc
          980000   L0 DVA[0]=<0:66309896000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f4b1a5f6dc2:fdb04229679e29d:427d05a65058e145:32653c89b0ee6c2b
          9a0000   L0 DVA[0]=<0:663098be000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ee583ef134f:fc1fff77127dda3:1fbcffb54bab2a3f:9b1504cda70ea4b
          9c0000   L0 DVA[0]=<0:663098e6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f0ebe098bc0:fcf18e6ab54b604:e27422f82603136a:9e80bec536608608
          9e0000   L0 DVA[0]=<0:6630990e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3eed3b20acbd:fb90d92f43f7a01:5f85931ee05f9728:10feb048bb67244e
          a00000   L0 DVA[0]=<0:66309936000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ec062eb4281:faea8c1681c44ed:529aada3123e600c:b00305af5f64d26
          a20000   L0 DVA[0]=<0:6630995e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3e45f89a14a1:f9358c1930e52be:4df1bb95c058246f:2eb9e25b7d01c7df
          a40000   L0 DVA[0]=<0:66309986000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3e9743b05a69:fadf81ec56e7c43:fb113d2ca31a5a31:eac263186d35a6f2
          a60000   L0 DVA[0]=<0:66309a4e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ed56a738346:fb861e670245674:4dd9ef99db4f92b:d53092e5625e83dc
          a80000   L0 DVA[0]=<0:663099ae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3e6a105dc9b5:f9c9eed58e93f0d:92ee823d7b8bc5ba:167baffe0bbb8c53
          aa0000   L0 DVA[0]=<0:66309a26000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3eadf5347819:fb6d567c86c045e:71c40358de4ae5bf:add3e39aad5ee994
          ac0000   L0 DVA[0]=<0:663099d6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ebb097a4a17:fad42bc6a0d327d:f4d468fadaa1d40f:f96c950722f19541
          ae0000   L0 DVA[0]=<0:663099fe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3e96886e4af0:fbb9fedd5939b7a:3d68e5f520a6c66b:efc4f32177f432e4
          b00000   L0 DVA[0]=<0:66309a76000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3e844859ea5e:face3cf1ee08006:32d273984e5f4bcd:c1726eeaf6dee43f
          b20000   L0 DVA[0]=<0:66309a9e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ebf04f84dba:fb21030cee2cadd:63bf9d5cf4be6a8:9e01cde548390734
          b40000   L0 DVA[0]=<0:66309ac6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3e83afed99bf:f98aa7b9efa67d2:b87b03f27710e791:e4344d269bfb9fee
          b60000   L0 DVA[0]=<0:66309aee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ed7d861934e:fc23e2f7f6df07b:9d9264836c708f32:77c491602dee75a7
          b80000   L0 DVA[0]=<0:66309b16000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ed2b2340f91:fb8d4480cd4926d:3ec00fcf37e56cb:aec0f6bdbe214cea
          ba0000   L0 DVA[0]=<0:66309b3e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ef878a76084:fc2666f29135e84:2fc913feee7641ea:5ac9cc5d06cbc34b
          bc0000   L0 DVA[0]=<0:66309b66000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f13c76a8a0c:fb69791a8fc964f:b7adb09fff8b0c1:3cc6dd2d19a4e24e
          be0000   L0 DVA[0]=<0:66309b8e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3edb563ab381:fb9a10357cb3b3a:d5b4599d4fbf1299:f9ef9eebebf8a72c
          c00000   L0 DVA[0]=<0:66309bb6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f156b1c5370:fc168daa6c92c69:57ab5edf4173ecb1:70aaf0c51f1bc933
          c20000   L0 DVA[0]=<0:66309bde000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f2ee678f72e:fcd08a2f2b502f1:c31667bb28c4caed:56d168b46a8501d6
          c40000   L0 DVA[0]=<0:66309c06000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f24fd13545a:fd47641bfef2c38:a6a9893ced9da437:c1331dc2ddd48459
          c60000   L0 DVA[0]=<0:66309c2e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f2901d6c35f:fc93b41bbd2e839:9be7e902740cfbb0:8a15339708c8a923
          c80000   L0 DVA[0]=<0:66309c56000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3ee55d1277b2:fbafdcae32d37a4:646949809ceea97e:f96e7c9b76120187
          ca0000   L0 DVA[0]=<0:66309c7e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f1e2cb0ab9f:fcc15f54cd6cc89:ff7af72cef3cfdfc:c146af1be73e36ee
          cc0000   L0 DVA[0]=<0:66309ca6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f257c882c06:fca974c739a925f:bbba5cd96aee540e:a6fad687ccebc559
          ce0000   L0 DVA[0]=<0:66309cce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f7213518a64:fe04c3d2f797e04:dbf5f9e352b1c751:2b9c380333788c85
          d00000   L0 DVA[0]=<0:66309cf6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f222b5a0549:fca12cb6502de14:26d7771bf8499de:f16fd8aaf69922ee
          d20000   L0 DVA[0]=<0:66309d1e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f94bcd4d370:fdf6c087fe30b8d:9e340897707d8ffe:36c7af8c1aaae160
          d40000   L0 DVA[0]=<0:66309d46000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fafe06e41f4:fed0eb63ed33253:848549fe40022ac2:12c084a88d823660
          d60000   L0 DVA[0]=<0:66309d6e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fa1171e65fb:fea6390b34b1501:4d6f9b98651fb5f8:b28e9a23abe4cd37
          d80000   L0 DVA[0]=<0:66309d96000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f628464f2cd:fde4f64ade395a0:d31a84d61ee4de04:15fea4852ab3b249
          da0000   L0 DVA[0]=<0:66309dbe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4015867a4290:101a5d629fdc15ca:63a28f920a49aa63:6f2dee805d575dcc
          dc0000   L0 DVA[0]=<0:66309de6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fa0617f3314:fe2137959b268ee:4de28d03b813eba3:bef87aadc1201596
          de0000   L0 DVA[0]=<0:66309e0e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f764cc18b96:fe1ba0f8e322c96:a0d3073709fdb91f:656da5f1ee1fe22c
          e00000   L0 DVA[0]=<0:66309e36000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f6ead84c3dc:fd9a5e749e07d0e:c64bf14d2d0e5bc8:f00c5ea8dd473a99
          e20000   L0 DVA[0]=<0:66309e5e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4002ac225b1b:ffe544706ecd829:a5b271eb9940bdf5:781235589ecf081a
          e40000   L0 DVA[0]=<0:66309e86000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40420176b1f1:100e499733cd0175:d20561ac49c2ca24:96d9571198fd9dfa
          e60000   L0 DVA[0]=<0:66309eae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fefcc2c5ec6:1007d1877e633d0d:e55a98d173b3d959:b4ccda2b98901610
          e80000   L0 DVA[0]=<0:66309ed6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4005bda20977:10077fbbba16fd13:6256546ed7cf127a:7f8889f453473b1a
          ea0000   L0 DVA[0]=<0:66309efe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3febf052542b:fed3cde17f7cef5:df92dd62737ba98b:21d72dc9b9025f5e
          ec0000   L0 DVA[0]=<0:66309f26000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4022786eb4c4:100b8a4ae0449124:fd065883941e3b57:28d1213eb9424571
          ee0000   L0 DVA[0]=<0:66309f4e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3feefc2f37f6:ffa0bae286c49eb:b6a5b40401a7e50d:b63e3cc9fa52a7ad
          f00000   L0 DVA[0]=<0:66309f76000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=402a00678a37:100feaa2a1e5b690:f1158688f33bc60e:e550077ed1eab435
          f20000   L0 DVA[0]=<0:66309f9e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fcd52db31ae:fec81b097159ea2:349cbe5a7206cb17:6a4cd2c7454d0a8c
          f40000   L0 DVA[0]=<0:66309fc6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fe3f23b6053:ff204a3635e335c:519c6dc5606d6a3c:91329173379114ac
          f60000   L0 DVA[0]=<0:66309fee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40acef2ded16:1028c4f37ccd625f:23f0d307ddf9ecb0:f19095a68755c88a
          f80000   L0 DVA[0]=<0:6630a016000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=406bc7330a44:1011eb7ed056667d:163567aab5d5ac48:9cecd4a62bb9a1b8
          fa0000   L0 DVA[0]=<0:6630a03e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4077d7477bd0:1024414cf958b31c:9e61f475df73f124:3a3178dde0d5386
          fc0000   L0 DVA[0]=<0:6630a066000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40c7e973468a:10280eef8bba8a9f:7501475bcd441554:eb13a33d5021585c
          fe0000   L0 DVA[0]=<0:6630a08e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40b0fd0205dc:102efaaa7d453623:5802394dd36f20e:a4f959b6c49d9b66
         1000000  L1  DVA[0]=<0:65891b6a000:4000> DVA[1]=<0:b403209e000:4000> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/2000P birth=12305637L/12305637P fill=128 cksum=21740feb756:b3e45a541cd14:20a09b20af400123:1c505a3fb09214fc
         1000000   L0 DVA[0]=<0:6630a0b6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40bad8bd8492:10364dc6bf43ae7c:752bbec5ee605877:b9140556fe5bf321
         1020000   L0 DVA[0]=<0:6630a0de000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=414854544bce:1050e3cdac9f542c:a62bc5a53a5bc27c:595458b7bc05e5be
         1040000   L0 DVA[0]=<0:6630a106000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=409892d29488:10210ec02675724d:3a4c5b10452e77b4:a9da83b0afc1a18a
         1060000   L0 DVA[0]=<0:6630a12e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4129a5511e5c:1051bc1ce87640a4:f2c526fb0d0ff76c:faeb7f9c7016d157
         1080000   L0 DVA[0]=<0:6630a156000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40b578212699:1024485db759323c:11ce7ea41e64c36a:4513720ef5ad33a7
         10a0000   L0 DVA[0]=<0:6630a17e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40c051981bc2:10361674a078ed19:7f97f2bd48fb1432:bb080b2f1043459f
         10c0000   L0 DVA[0]=<0:6630a1a6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4087cbbd3cb9:103305c6bd8ab9f6:ef7fad5944d3746b:5719776ac4bdaa80
         10e0000   L0 DVA[0]=<0:6630a1ce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40681e3ec264:101d10e674d619cc:d295839a485458de:2161dc76f85e852c
         1100000   L0 DVA[0]=<0:6630a1f6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=409c1fd16397:102ce0b4fe8b3474:c6279c9d47118cf0:860b1aadd45a35a2
         1120000   L0 DVA[0]=<0:6630a21e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4068b1c0cbf6:101ca3278cc51d5f:4cdfc7777dd970d6:3fbd1e3fcd3a2b0f
         1140000   L0 DVA[0]=<0:6630a246000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4062a0588f3b:1015af05dd5296ef:edddab0ced2bd819:9dbcf2d56a68dc99
         1160000   L0 DVA[0]=<0:6630a26e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40265f22eec9:10078b6057b2ac24:9c1a27badb0c5d80:9ac163d36c3aed53
         1180000   L0 DVA[0]=<0:6630a296000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40ce2c38f562:103531d9a4ea98ad:2d44fda4de9d0fe9:62011642e3bffbb9
         11a0000   L0 DVA[0]=<0:6630a2be000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40b4f8715b93:10335c995d06fccc:6409417b07042345:cdc164c5cb8f4511
         11c0000   L0 DVA[0]=<0:6630a2e6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40f89ca622bc:103c9747946fcb49:29225d1142ca3ee8:bf8414d2098d62c1
         11e0000   L0 DVA[0]=<0:6630a30e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4148d4f16878:1045e96f0d9ae2bd:a48c841b2cc37c86:32b404bbf2e6f304
         1200000   L0 DVA[0]=<0:6630a336000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40f6fcd2cd6c:10325fa0853b6280:9994f8297a8de175:a19d5384e1b4b056
         1220000   L0 DVA[0]=<0:6630a35e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=410db4308223:1046e0dd68cd3d73:22c005b30a4c4a39:714cb4e054e960f6
         1240000   L0 DVA[0]=<0:6630a386000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40af7652bf0e:10309b8bf11e03c4:5a6d6ba73380e9f5:10f1fdbe1bebf020
         1260000   L0 DVA[0]=<0:6630a3ae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=411897319ab4:10458e584658e3ae:25e0eb3632d76204:5b3ce7f343d9963a
         1280000   L0 DVA[0]=<0:6630a3d6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40f23b64be8d:1038c52bf261c3e0:a9c68706bc8864b:2e20f50633a1cef0
         12a0000   L0 DVA[0]=<0:6630a3fe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=414e14e031d7:10514de16bcc1995:a37e5c9ee998ae0c:74b01d139cc45a98
         12c0000   L0 DVA[0]=<0:6630a426000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d93bff6b6c:1045cce1fed1a370:7376a70d3217fd12:f225a60f71c2c1b2
         12e0000   L0 DVA[0]=<0:6630a44e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40f000de811f:1048e968fc4aae3a:dd15f30d2948963e:dc4e882cf96c8636
         1300000   L0 DVA[0]=<0:6630a476000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40b33f218cc2:1026081da06fe243:59e6ed87f16d0f6:4cb07db20c81ccc
         1320000   L0 DVA[0]=<0:6630a49e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40b896ab2da3:10317c8447299f29:2debce9b7eadfaf6:a163d8151f8537df
         1340000   L0 DVA[0]=<0:6630a4c6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=410dac67c37f:10344242d48a060c:3d969a2a7a4b0a89:bd7f6e308db4e789
         1360000   L0 DVA[0]=<0:6630a4ee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d96aa4ae70:1030a09956362ff0:60a7e66fc17ef328:e9367a1dabbb7a9d
         1380000   L0 DVA[0]=<0:6630a516000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=409342c075b4:101bf887ba01ae33:a9a1fbe75478fa6:8c7bfe8458ccf2b3
         13a0000   L0 DVA[0]=<0:6630a53e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d4bfcd4375:1033c17c14accab5:875924d60b68940b:e211bd74c86a5020
         13c0000   L0 DVA[0]=<0:6630a566000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d62871d35b:10362c96a03b0e26:6c0e8bec348bf77a:a5abcbb63106d05a
         13e0000   L0 DVA[0]=<0:6630a58e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40a18192709c:102399f99811277b:1ca530e2ba6731b6:bbcb2e57450c874b
         1400000   L0 DVA[0]=<0:6630a5b6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40e4b1246148:1028a777945be91d:622d88c9e967870d:4114f9810697d340
         1420000   L0 DVA[0]=<0:6630a5de000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=410d7bce1894:1044a8c4daacc993:83b0ff17d75c224b:55cf188128b51254
         1440000   L0 DVA[0]=<0:6630a606000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40dc41abb88f:103c5a6702de53a2:c00013bf1d6286f5:b5544586277a5dbf
         1460000   L0 DVA[0]=<0:6630a62e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413483a8b5f9:104a35f6934ea912:63095345a2359179:f1c0d3f4d10583d
         1480000   L0 DVA[0]=<0:6630a656000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413e121fee42:105a13a9054ab083:67cc1f4bd1f80655:f829a80d018a715f
         14a0000   L0 DVA[0]=<0:6630a67e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413d84396268:104b5911a303f49d:b914e7f387b41403:8dad935df84b1764
         14c0000   L0 DVA[0]=<0:6630a6a6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4186d35040ab:10605a5a99a0bb2e:cde0d9c9eb794fb3:86b2789bf8f33ec2
         14e0000   L0 DVA[0]=<0:6630a6ce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=417a33bacdc9:105455fbfd364598:6f8e269528641df5:b7f9e0151b0bee96
         1500000   L0 DVA[0]=<0:6630a6f6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=416737a99317:1057748e93a568e1:a5484cf0b634a237:f20e9606878c5921
         1520000   L0 DVA[0]=<0:6630a71e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=420520358821:107f1cb8d12813ce:7ef7581c22e5cdf:31d2116afa7ec0d7
         1540000   L0 DVA[0]=<0:6630a746000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41eab938ff59:1076980fb7d34a0a:6a7c959787cc96fa:dc1b1f6e8b3334e3
         1560000   L0 DVA[0]=<0:6630a76e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=418f3d4393cc:1060b96b2d2a4d5b:95d5ec8d4792818c:37540e3a5beb58d1
         1580000   L0 DVA[0]=<0:6630a796000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=414ac4858fa5:103fd06283d7ffd8:93e35bd1a75e4c28:a52ac5bedfc78b7
         15a0000   L0 DVA[0]=<0:6630a7be000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41aee607621f:1078106183221ede:ce1679759e7e523f:77891eb2ccc5d74e
         15c0000   L0 DVA[0]=<0:6630a7e6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=42126df68ce8:108369fdec9b7f1d:b42b3d7039a5c3e6:40e527b86b63814
         15e0000   L0 DVA[0]=<0:6630a80e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41e135f47a31:107d33a90c7d98bf:de5895afe7c7f24f:3279651762cdd648
         1600000   L0 DVA[0]=<0:6630a836000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=42041773e3e8:108178624483284c:1d5513270c99a785:c125d8c6060e84e1
         1620000   L0 DVA[0]=<0:6630a85e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4196dc3e5746:106d177723a63b18:841cba512524c905:1376b3247999abe2
         1640000   L0 DVA[0]=<0:6630a886000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41f12c28f3c6:1085567da2018e54:60696bbe961a7a60:4b2f98199f8e2d
         1660000   L0 DVA[0]=<0:6630a8ae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41957f612575:105cc4445293bc05:6d1f4ca3a6562e23:c0723ba93d79cadc
         1680000   L0 DVA[0]=<0:6630a8d6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41db33bd33f4:10794a29d4d82d41:83968a08c3e65446:cf817e17d5164161
         16a0000   L0 DVA[0]=<0:6630a8fe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4214bfe901c0:107ea2407bed6d1f:c07c6fb41860dbfd:5e98e7491cb467ce
         16c0000   L0 DVA[0]=<0:6630a926000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=418fec480522:10674f8a28ad5299:a78bfb519b64e021:f5bfd3ef21389de8
         16e0000   L0 DVA[0]=<0:6630a94e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4204d4e1d0b2:107b009b42e861bd:d4d7f44a0dbe5075:fec70e065de860d6
         1700000   L0 DVA[0]=<0:6630a976000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41e082ac23e7:107cc5849ed9a550:a8cedb894be8733b:9ceac4c28633c8ce
         1720000   L0 DVA[0]=<0:6630a99e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41bfdbbd6d6b:1066b2bef6d809d1:882344b450e92a28:ef33e7ff34713d27
         1740000   L0 DVA[0]=<0:6630a9c6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=419c51fd1da3:106866ec3c802532:7ce3fddd3a21c007:60f76b53dc9af874
         1760000   L0 DVA[0]=<0:6630a9ee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4181829c5a8d:1065f8c638d21310:6681610627a250d:946cabf8111fad6b
         1780000   L0 DVA[0]=<0:6630aa16000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4178a29541a5:1063724f25888df7:91881d2b85608be1:e3a8b32297653228
         17a0000   L0 DVA[0]=<0:6630aa3e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413893531210:10545d5d104bf216:13601cfdafbda1e7:3ab6c0e01bb017a8
         17c0000   L0 DVA[0]=<0:6630aa66000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4128e747f51e:104a6d64718c78e9:fcf23bdc676087ab:a5889d61bb497bc6
         17e0000   L0 DVA[0]=<0:6630aa8e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d51098f559:102ed469355b13ab:ccb12acfe400062e:d8f39deb6b47f1ad
         1800000   L0 DVA[0]=<0:6630aab6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41211864c27f:10431ecca9363d9b:ab8a8864f2d8cbeb:43cbf8a783b23d07
         1820000   L0 DVA[0]=<0:6630aade000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=405014db8276:100f9675b661346e:f7b8ad4bcd9d2322:8f226630db995f82
         1840000   L0 DVA[0]=<0:6630ab06000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=406fa45689a2:1018bdaa20cea7b9:1ca8af10d2fe3640:52fe436ef1872273
         1860000   L0 DVA[0]=<0:6630ab2e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=408e60095360:101f4b42f5c8effb:ef29140b13bf84c1:f2a8690253372bb9
         1880000   L0 DVA[0]=<0:6630ab56000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=407547fc6cde:101e6058f5f0f32a:51613b7db211bfa4:8ffbe0e8b4cfe32
         18a0000   L0 DVA[0]=<0:6630ab7e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=409fde7675e3:102e38ee081f471f:e2b904a8a5999105:278a9b77150c1bd0
         18c0000   L0 DVA[0]=<0:6630aba6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=403321609ac0:10099199c1f21e0c:20dac6dd2bc4441c:a241bf4e22905af6
         18e0000   L0 DVA[0]=<0:6630abce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4076b0288ce5:1015a669a78c791c:3c8e90d994dc638a:61cc1823b836d70
         1900000   L0 DVA[0]=<0:6630abf6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40b58c910e89:10350388d47dcade:a8b3e3e3e068db05:ef94a125410a7650
         1920000   L0 DVA[0]=<0:6630ac1e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=402568f4dcc2:1005b7f6c3fc7381:60698ffde2e2d18:d04d760278bff4a1
         1940000   L0 DVA[0]=<0:6630ac46000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40835e5ab51e:101fa8f25d31b822:4fad6592c481c53:a3c594861427825c
         1960000   L0 DVA[0]=<0:6630ac6e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=409d96e2a43c:1026224f5525fa2f:1a20c18fe0cf8c7f:95f6ed4c328600b2
         1980000   L0 DVA[0]=<0:6630ac96000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40bb10ad6879:1033cf784ddc9cef:5cb3abcfaafd56e1:94784896fd053a5e
         19a0000   L0 DVA[0]=<0:6630acbe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4074bbc975f8:1020f1f42c83f57e:2def630874262b4e:2bf917e401d2d0ab
         19c0000   L0 DVA[0]=<0:6630ace6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=407df934958b:1022c9890b711e3a:f35461858033a8dc:73fe0ff5767d8328
         19e0000   L0 DVA[0]=<0:6630ad0e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=408ee614693e:1021a642dce674e0:d276bddd5a74522f:3aa8c54f1e4c755d
         1a00000   L0 DVA[0]=<0:6630ad36000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40bcbd4c25d3:10274e6d2f05be69:ba714ce212406315:d877a0d8418dc6da
         1a20000   L0 DVA[0]=<0:6630ad5e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41583e29256a:10580c9bb8bd4eee:6b5c848cb8f5b4b8:ab4d220e69db382b
         1a40000   L0 DVA[0]=<0:6630ad86000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=416ba2ba2e8a:1057e2c1751ac05c:cde7003a47606644:1303679b3b02d857
         1a60000   L0 DVA[0]=<0:6630adae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40fa4d33c43a:10412a99cfcb1bd8:f66bbb03ae68ec43:3b39ae21898a8e97
         1a80000   L0 DVA[0]=<0:6630add6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=411c839168bb:104d8b1ab739dbc9:952112ba11c32cee:40847dfa842f894a
         1aa0000   L0 DVA[0]=<0:6630adfe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4157ab5c1476:10568c9ea88f4889:fcc7b4952048bdb6:26c16a971a59b49b
         1ac0000   L0 DVA[0]=<0:6630ae26000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=412dd77eeae0:103d49f8f82e92c6:a602dce4f07cefb4:311e16b6db6c18b5
         1ae0000   L0 DVA[0]=<0:6630ae4e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4105d3e7bbcf:105113d0db15f911:dc71b0e43612f075:64aea8d8821479b7
         1b00000   L0 DVA[0]=<0:6630ae76000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4120fc44efe6:1058110918c91620:28c9671d051bd690:c7d477bb5f06a7c3
         1b20000   L0 DVA[0]=<0:6630ae9e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4122ce40f33e:105283b9332b75ab:e04a82247df47a4a:db645ba5e5199f5a
         1b40000   L0 DVA[0]=<0:6630aec6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d56f914a34:103cbb3d66ab01dc:f71a4722fd37efd1:62efdf108c4a139b
         1b60000   L0 DVA[0]=<0:6630aeee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4133294ac367:1051d0433c8555d7:ccb77191b139be11:7aa4c71e896c27c8
         1b80000   L0 DVA[0]=<0:6630af16000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=415624016c02:104e5f862323ce85:e42368373ee00ee5:f3779c7693055e9c
         1ba0000   L0 DVA[0]=<0:6630af3e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4135e8808b73:1049a502d52983c9:df6d793337f6c4f3:659ff37f908140f2
         1bc0000   L0 DVA[0]=<0:6630af66000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40fe0510b0d1:1049d5d234b6360d:d4928929919190b0:5bfdea93cfebddd6
         1be0000   L0 DVA[0]=<0:6630af8e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4125549d9f39:103541ca6c6635ca:da149269c0af2d05:b63d0aadae16379
         1c00000   L0 DVA[0]=<0:6630afb6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=410996dad93a:104bb345970336f9:d04b907678e8de75:a6fa9d47cc36c210
         1c20000   L0 DVA[0]=<0:6630afde000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4129bf8e5d87:10487a994815b3ab:bdd13de7cb5ca09e:3bfae70afb773d2c
         1c40000   L0 DVA[0]=<0:6630b006000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41676c60c63a:105a82a5e410b590:1123a869b6ea9461:74cb5f0dfe96975b
         1c60000   L0 DVA[0]=<0:6630b02e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41387c97cced:1059936fea229195:20a761dd180c4d6:dbd879a5b4df01e1
         1c80000   L0 DVA[0]=<0:6630b056000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=410b1ab81564:1048cebebe4fbc03:e684ec5e7761d979:3f7e79492af3444d
         1ca0000   L0 DVA[0]=<0:6630b07e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41068766dc36:104b6e754efd0234:9479b9dac7bef4f4:3866c712edc2af4a
         1cc0000   L0 DVA[0]=<0:6630b0a6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413b67039eb4:104c75250591acb6:b4c890bd3bb3958c:7fd6e4379886ec58
         1ce0000   L0 DVA[0]=<0:6630b0ce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40f356bf92cb:10335233abc3f8e4:2af33f8912c423fe:8e7d4b561f4e7675
         1d00000   L0 DVA[0]=<0:6630b0f6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40da7bfce663:103c32b361221d49:227bd46139d99214:dea72e9f34ce0fb3
         1d20000   L0 DVA[0]=<0:6630b11e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4121d5aa576d:1044bf8687299716:b252c907c57f8930:83b66cd472d21d4d
         1d40000   L0 DVA[0]=<0:6630b146000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41578de89552:105c02e7d48e82f8:d0120d480dabe09f:6d441ef1580c5138
         1d60000   L0 DVA[0]=<0:6630b16e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=414f1f409701:1048e462559c171c:53a2a6ee5dbd329a:98d6494dda21ef35
         1d80000   L0 DVA[0]=<0:6630b196000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4190c12b382e:1060585e4959b179:2099e7f1b349338a:53dadc82227d7064
         1da0000   L0 DVA[0]=<0:6630b1be000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=418fdc06e330:10620007ba42bed4:d80d6b95632b1393:4732d9aa1ed558bb
         1dc0000   L0 DVA[0]=<0:6630b1e6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41d7233ef544:106b212392fd641b:c1c9d3dcf548ffe:8bfddfe91e205a2f
         1de0000   L0 DVA[0]=<0:6630b20e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=419c428e057e:106930803315c3bc:4a4ad8eb07276661:2a187b029d31bd7f
         1e00000   L0 DVA[0]=<0:6630b236000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4224c97ebf22:1091dd199b03a973:fe7b983de369935b:4a3d3af6a654631e
         1e20000   L0 DVA[0]=<0:6630b25e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=415a87e1ed79:105b1c4c0f228a5d:4ce92746eed3c5ec:9c9cacf6ec94517c
         1e40000   L0 DVA[0]=<0:6630b286000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41ee0433d817:10796c62da3cf58e:976494b40942ca57:93cae699c44aaf9d
         1e60000   L0 DVA[0]=<0:6630b2ae000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4152a06a684f:1064da90fc4c392d:fe334c7b9e8ebd6f:f02e1865b3875fd8
         1e80000   L0 DVA[0]=<0:6630b2d6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=411e905bf8b9:103eac7672a3a185:819bf3710f0ab6f0:645d595f8adaabc1
         1ea0000   L0 DVA[0]=<0:6630b2fe000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4161eeae6791:1054ade073cc5bce:5b013be6d64921be:ad93a0f958dac97e
         1ec0000   L0 DVA[0]=<0:6630b326000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4146018f2119:105375aecfb48c13:a2f7bbf00469db3:acf8854d8bb7054c
         1ee0000   L0 DVA[0]=<0:6630b34e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413119ff989b:104ac686d9e40efe:dab8f01f57294e8a:8319d75739a0dfb0
         1f00000   L0 DVA[0]=<0:6630b376000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41a3af066139:106a923dc0eb59a0:8badff90165fbfe4:b59d736f947daf6c
         1f20000   L0 DVA[0]=<0:6630b39e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=413216b9c830:104f2ee6554c6078:325cbaa1a3c04a1a:335de3ed1874dd26
         1f40000   L0 DVA[0]=<0:6630b3c6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=417c1f61dfaa:105d190a83edd856:50a2e5aa9bfe89c3:f7ec3aeb139e1750
         1f60000   L0 DVA[0]=<0:6630b3ee000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40ca753ef8b1:103757581b258140:2f69aafa6ed853b3:824b5630c54d3c49
         1f80000   L0 DVA[0]=<0:6630b416000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40d912a18922:103a5c141a2884f4:2b735e53b29fa65d:f05476931911628b
         1fa0000   L0 DVA[0]=<0:6630b43e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40ff1b4071ed:1039795a1e080364:adde9418c74efe46:64b6142ae2f362b7
         1fc0000   L0 DVA[0]=<0:6630b466000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40da3a26b4d0:103d90d0c316411b:be119b2e49a3527a:fd7f98da9963c16a
         1fe0000   L0 DVA[0]=<0:6630b48e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=407a2df93df9:1024cb28631ef8ac:909ae2b4a04e84e:a35b207512b1d3aa
         2000000  L1  DVA[0]=<0:6630b680000:2000> DVA[1]=<0:b40320a2000:2000> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1000P birth=12305637L/12305637P fill=12 cksum=48b8e6e881:10a2716321758:1e8b15c139c567b:583bf803dfe0702d
         2000000   L0 DVA[0]=<0:6630b4b6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fedf41740aa:1006102ac0684a41:e2908b7396d6d9a1:186704c353106be
         2020000   L0 DVA[0]=<0:6630b4de000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3fdc6e52726c:fefd3bfd14cd0bf:36675f951a94cac6:6fafacd423230d52
         2040000   L0 DVA[0]=<0:6630b506000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=3f9a3c7e1d07:fdf6e9ce6ac707f:b936a5d37f779041:b79e8943ba7ca7d7
         2060000   L0 DVA[0]=<0:6630b52e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=406dfa5301e5:10178164a5a53de6:c19f0926aa43c850:1c13a71f8d8cba2c
         2080000   L0 DVA[0]=<0:6630b556000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=406d8f4ef1a8:1018af0f83de041f:99d1e802e7628d17:dbcf287baafd8ac3
         20a0000   L0 DVA[0]=<0:6630b57e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40e3eb81ef04:1039668d6235bcc4:8add7657158e89a7:cb448f54e0367c5f
         20c0000   L0 DVA[0]=<0:6630b5a6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40f0464b3522:103a2b09ae0b07f1:952b779c8760060f:9f9377879faad430
         20e0000   L0 DVA[0]=<0:6630b5ce000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=40cebcecda4e:1030f9de80916198:cf9fb32994278e74:e0d2fe1fa16a6991
         2100000   L0 DVA[0]=<0:6630b5f6000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=410ca8754cc2:103f05fb72d13eb1:b557b3973cfdde3f:96ad6ae35f10e444
         2120000   L0 DVA[0]=<0:6630b61e000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=4133045462bc:10494a2b33e0e790:8fa1d7622428356e:bffab0bb0d97861e
         2140000   L0 DVA[0]=<0:6630b646000:28000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=12305637L/12305637P fill=1 cksum=41bbaf515418:1065d22ea9c74a39:b9e0ff2bd87b8272:e6aeaf7b3d8bce3f
         2160000   L0 DVA[0]=<0:6630b66e000:12000> [L0 ZFS plain file] fletcher4 lz4 LE contiguous unique single size=20000L/d000P birth=12305637L/12305637P fill=1 cksum=18e0647db0ec:2b36a664174a882:4c203739aaf44b5:778cf186121e1cf3

        segment [0000000000000000, 0000000002180000) size 33.5M
JuliaVixen commented 7 years ago

Ok, so I went to the store and bought five brand-new 10 TB hard drives, so I could have some space to completely rebuild my zpools... Then I bought an entirely new computer, because I keep getting random checksum errors, and maybe, just maybe, it's a problem with a SATA controller, or the PCIe bus, or something.

It's a Supermicro X8DTN+, with 128G of ECC memory, two Intel Xeon E5645 CPUs, and an LSI SAS2008 Fusion-MPT SAS/SATA controller.

I created the pool on my old computer and started a zfs send|recv, which transferred, I think, at least half a terabyte before mysteriously stopping without an error. So I restarted it and got those filesystems copied over. Then I moved the five 10 TB hard drives over to my new computer and zfs send|recv'd another 30 TB of data.

So that all worked, and everything seemed fine: no errors anywhere, from anything, about anything. Then last night I ran the first zfs scrub on the pool, and now I see this:

  pool: T
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Tue Sep 20 03:34:57 2016
    21.4T scanned out of 40.7T at 527M/s, 10h38m to go
    0 repaired, 52.68% done
config:

    NAME                                   STATE     READ WRITE CKSUM
    T                                      ONLINE       0     0     1
      raidz1-0                             ONLINE       0     0     2
        ata-ST10000VN0004-1ZD101_ZA206BG3  ONLINE       0     0     0
        ata-ST10000VN0004-1ZD101_ZA207RAJ  ONLINE       0     0     0
        ata-ST10000VN0004-1ZD101_ZA207RQE  ONLINE       0     0     0
        ata-ST10000VN0004-1ZD101_ZA208464  ONLINE       0     0     0
        ata-ST10000VN0004-1ZD101_ZA20858Z  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        T/other_l/photos@snapshotjustincase:/2006-Dec-16/_dsc9900.nef

That filesystem, T/other_l/photos, was one of the ones I just zfs send|recv'd on my brand-new computer.

Seriously, I've replaced ALL of my hardware at this point; this has to be a software bug!

Smartctl, edac, dmesg, and everything else say that my brand-new hardware is working perfectly. I built the Linux kernel with a bunch of extra hardware-checking and self-checking debug options, and there has not been a single error message about anything.

This was how I created the pool:

zpool create -O atime=off -O compression=lz4 -O exec=off -O devices=off \
 -O recordsize=1M -O setuid=off -O checksum=sha256 \
 -o ashift=12 -o feature@lz4_compress=enabled -o feature@embedded_data=enabled \
 -f T \
 raidz1 \
/dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA206BG3 \
/dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA207RAJ \
/dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA207RQE \
/dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA208464 \
/dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA20858Z

I was transferring the filesystems like this:

zfs send -eLv -R l@2016_Sep_12 | zfs recv -evsF T
[Got interrupted]
zfs send -eLv -I l/photos@Nov17_2013 l/photos@2016_Sep_12 | zfs recv -evsF T/l
[Got interrupted]
zfs recv -A T/l/photos
zfs send -eLv -I l/photos@Feb12_15 l/photos@2016_Sep_12 | zfs recv -evsF T/l

Some history:

2016-09-14.02:21:42 zfs recv -evsF T/l
2016-09-14.07:39:12 zfs recv -evsF T/l
2016-09-14.21:38:27 zpool export T
2016-09-15.01:43:46 zfs recv -A T/l/photos
2016-09-15.05:49:19 zfs recv -evsF T/l
2016-09-15.13:43:48 zfs create T/other_l
2016-09-15.13:44:07 zfs rename T/l/photos T/other_l/photos

The pool is currently 98% full if that matters.

rincebrain commented 7 years ago

You might have better luck debugging this in a more immediate response environment, like IRC.

Also, I would have strongly suggested using a version of ZoL with the ignore_hole_birth tunable before doing a zfs send from a pool with hole_birth enabled/active. (See things like #4809 for problems which can arise, though yours doesn't specifically look like one of those.)
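
(For what it's worth, on builds recent enough to ship that tunable it's an ordinary module parameter, so something like this on the sending host should flip it; I'm assuming the standard /sys module-parameter path here, not anything specific to your setup:)

# set on the sending host before starting the zfs send
echo 1 > /sys/module/zfs/parameters/ignore_hole_birth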

Aside from that, I'm presuming you're running the same Linux distro on both machines? My two remarks would be to try clearing and scrubbing again on the pool on the new system (to see if it was a one-time error created by having the pool attached to the old machine), and to try running a different distro of Linux on the new machine (to be sure it's not some strange artifact of that distro).
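
(Concretely, something along these lines, with the pool name taken from your status output above:)

zpool clear T
zpool scrub T
zpool status -v T    # once the scrub finishes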

I don't think anyone's arguing about whether or not it's a software flaw, just trying to figure out why this is happening to you in particular and, as far as anyone has seen, nobody else.

How large is just T/other_l/photos@snapshotjustincase? (Just wondering b/c it might be a useful experiment to see whether just receiving the send stream of that snapshot is sufficient to reproduce this bug on any pool, which would also make it much easier to debug, presuming you would be comfortable trusting someone debugging it to have a copy of that dataset at that snapshot temporarily).

JuliaVixen commented 7 years ago

I already plugged the disks into a different machine running FreeBSD 11.0 RC3. The checksum error occurs there too. It's something actually written on the disk, not something happening at read time.

So, this situation is getting desperate for me, and I don't want to keep buying more hardware. I now have datasets of which I have ZERO copies without checksum errors. Two out of three of my backups of ALL the rest of my data have checksum errors on the raidz vdev, rendering a single, seemingly randomly chosen file inaccessible in each. I generally try to keep AT LEAST three backups of all my data, and I'm now down to only one good set left, plus a mountain of corrupt pools which I don't dare wipe until I can get at least one more good backup copy.

So, I have Pool_A, Pool_B, and Pool_C, which are all the same data. Pool_B and Pool_C are the backup copies I store, offline, in a physically separate location from Pool_A.

Pool_A suddenly develops a checksum error in File_X. I check Pool_B; there's a checksum error in File_Y. Ok, well, Pool_C is still ok for now, but I don't want to write it back onto the drives from the other pools in case of some kind of catastrophic disaster. The other pools are 99.44% ok, and between the two of them I still have at least one copy of all my files, but I can't do a zfs send from any of the corrupt pools, because, zfs...

And now I have a Pool_C with a checksum error in File_Z, so I'm going to have to create an entirely new pool from scratch and just rsync the files over the old-fashioned way, which is going to be a pain in the ass since I have a bunch of snapshots I'm keeping around for some reason. (Mostly the reason is that I haven't invested the time into checking whether I actually need anything in the snapshots.)

So, I just went to the store and purchased another US$1200 of hard drives, and I'm attempting to create a new pool on them with every possible feature turned off... When I pass -o feature@hole_birth=disabled, I keep getting error messages like this:

cannot create 'new_test_pool': property 'feature@hole_birth' can only be set to 'enabled'
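
(As far as I can tell, features can only ever be set to 'enabled' on the command line; the only way I've found to keep hole_birth and friends off is to create the pool with -d, which starts with every feature disabled, and then enable just the ones I want. A rough sketch, with placeholder disk paths:)

zpool create -d -o ashift=12 \
 -o feature@lz4_compress=enabled \
 -o feature@embedded_data=enabled \
 new_test_pool raidz1 \
 /dev/disk/by-id/DISK_A /dev/disk/by-id/DISK_B /dev/disk/by-id/DISK_C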

Anyway, it's the same Linux distro, kernel version, and actual kernel binary, since I just copied the whole system drive over to the new computer. ZFS is 0.7.0-rc1 from Sep 8, 2016.

Once a pool has one of these checksum errors, I can only zfs send data until it hits that error, at which point it aborts with an error message. I'd really like to just be able to extract the "file" where the corruption supposedly is and check whether the contents of the file itself are ok or corrupt. (I have backups and MD5 hashes of everything, so it's easy for me to tell.) zdb doesn't seem to have any way to do this, other than reading the raw blocks one at a time and gluing them all back together.

rincebrain commented 7 years ago

Okay, so you have three pools, pool_{A,B,C}. A has corruption in X, B has corruption in file Y, C has corruption in file Z, right?

What're the respective layouts of all three pools? At least two of them (if not all three) seem to be single raid-z1 vdevs.

My suggestion to you, for the moment, would be to put together a concise document of what the three pools look like (zpool get all, zpool status -v), what the {motherboard,processor,distro,SATA controller, ZoL version} are of the machines you have. Keep at least one of the three pools offline like you have.

This is obviously a serious issue, but for some reason, nobody else has yet encountered it. So something, hardware or software, is behaving very unexpectedly, but in a way nobody else has hit.

What I would really, really like, would be if you could take one of the three pools, nuke it, and on some machine running not Linux, please create the pool anew and rsync data over from another of them, all the while with those disks never being used from that same Linux install. (I understand that once it happens, the data is mangled on-disk and will produce the same behavior on any OS with ZFS support, but I want to understand whether this is a ZoL-specific problem with the initial data corruption or something more confusing.) Please do not use zfs send|recv, even initially, because I really want to know if this can happen when only the data is in common between them, not any ZFS metadata.
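
(Roughly this on the non-Linux box, with placeholder pool, device, and mount names; the point is that only plain files ever move between the pools, never a send stream:)

zpool create -o ashift=12 pool_b_new raidz1 da1 da2 da3
rsync -aHv /pool_a/ /pool_b_new/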

Yes, as far as I know, ZFS doesn't make it trivial for you to work around it responding with EIO if a checksum failure happens on data IO. You could, as I think you alluded to, follow the instructions over at http://cuddletech.com/?p=407 with some glue code to parse zdb output, but that'd get messy somewhat fast.
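
(Roughly, that glue code would walk the object's block pointer list and then pull each DVA with zdb -R, something like the sketch below. The dataset name and object number are placeholders, the DVA is one from the dump earlier in this issue, and on a raidz vdev the raw dump still contains the parity layout, so treat this purely as a starting point; which stream the raw bytes land on has also varied between zdb versions.)

# list every block pointer (DVAs, sizes, checksums) for the damaged object
zdb -ddddd n/some_dataset 1234

# then, for each DVA in that list, dump the raw block
zdb -R n 0:6630ab7e000:20000:r > block_0000000.bin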

This is quite strange, because it happening three out of three times suggests it's not an uncommon issue, but you don't have hardware in common at this point.

Oh, and what might also be extremely useful is the zdb -bbbb output for each of the mangled files, so we can try to understand what they might have in common.

JuliaVixen commented 7 years ago

So, I've been making backups... Since I was backing up from A to B, and then from B to C, and then from C to D, I decided to check the MD5 hashes I'd generated back in June against the files, just in case anything had become corrupt without being caught by ZFS. Everything checked out OK, except one file...

localhost qnap # md5sum -c ../qnap.md5 &> ../qnap_check.log

localhost qnap # grep -v OK  ../qnap_check.log 
md5sum: ./STUFF.ISO: Input/output error
./STUFF.ISO: FAILED open or read

localhost qnap # ls -l "./STUFF.ISO"
-r--r--r-- 1 root root 36929863680 Apr  5  2015 ./STUFF.ISO

localhost qnap # md5sum "./STUFF.ISO"
md5sum: ./STUFF.ISO: Input/output error

localhost qnap # md5sum "./STUFF.ISO"
md5sum: ./STUFF.ISO: Input/output error

localhost qnap # dd of=/dev/null if="./STUFF.ISO"
dd: reading './STUFF.ISO': Input/output error
53581824+0 records in
53581824+0 records out
27433893888 bytes (27 GB) copied, 1221.83 s, 22.5 MB/s

So, that's kinda weird, because I can keep attempting to read this file, over and over again, but zpool status reports nothing wrong with the pool.

localhost DVD # zpool status -v DVD
  pool: DVD
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: none requested
config:

    NAME                                STATE     READ WRITE CKSUM
    DVD                                 ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401EL3  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401FAW  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401FP0  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401PH7  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z84059L3  ONLINE       0     0     0

errors: No known data errors

This is usually the situation in which I would see a checksum error on the top level of the raidz... except, this time, this pool isn't a raidz, it's just a bunch of disks... and there's no checksum error anywhere...

This is the entire history of the pool; it's only two days old, and it's mounted read-write.

localhost DVD # zpool history DVD
History for 'DVD':
2016-09-27.05:11:49 zpool create -O atime=off -O compression=lz4 -O exec=off -O devices=off -O recordsize=1M -O setuid=off -O checksum=sha256 -o ashift=12 -o feature@large_blocks=enabled -o feature@lz4_compress=enabled -o feature@embedded_data=enabled -d DVD /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401EL3 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401FAW /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401FP0 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401PH7 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z84059L3
2016-09-28.10:01:45 zfs recv -evF DVD
# zpool get all DVD
NAME  PROPERTY                    VALUE                       SOURCE
DVD   size                        36.2T                       -
DVD   capacity                    95%                         -
DVD   altroot                     -                           default
DVD   health                      ONLINE                      -
DVD   guid                        8669138326454543505         -
DVD   version                     -                           default
DVD   bootfs                      -                           default
DVD   delegation                  on                          default
DVD   autoreplace                 off                         default
DVD   cachefile                   -                           default
DVD   failmode                    wait                        default
DVD   listsnapshots               off                         default
DVD   autoexpand                  off                         default
DVD   dedupditto                  0                           default
DVD   dedupratio                  1.00x                       -
DVD   free                        1.47T                       -
DVD   allocated                   34.8T                       -
DVD   readonly                    off                         -
DVD   ashift                      12                          local
DVD   comment                     -                           default
DVD   expandsize                  -                           -
DVD   freeing                     0                           -
DVD   fragmentation               -                           -
DVD   leaked                      0                           -
DVD   feature@async_destroy       disabled                    local
DVD   feature@empty_bpobj         disabled                    local
DVD   feature@lz4_compress        active                      local
DVD   feature@spacemap_histogram  disabled                    local
DVD   feature@enabled_txg         disabled                    local
DVD   feature@hole_birth          disabled                    local
DVD   feature@extensible_dataset  active                      local
DVD   feature@embedded_data       active                      local
DVD   feature@bookmarks           disabled                    local
DVD   feature@filesystem_limits   disabled                    local
DVD   feature@large_blocks        active                      local
DVD   feature@large_dnode         disabled                    local

I did this:

zpool create \
 -O atime=off \
 -O compression=lz4 \
 -O exec=off \
 -O devices=off \
 -O recordsize=1M \
 -O setuid=off \
 -O checksum=sha256 \
 -o ashift=12 \
 -o feature@large_blocks=enabled \
 -o feature@lz4_compress=enabled \
 -o feature@embedded_data=enabled \
 -d \
 DVD \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401EL3 \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401FAW \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401FP0 \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8401PH7 \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z84059L3

And then this, and that's all...

zfs send -eLv -R Y/Z@Backup_to_Y_Sep20 | \
zfs recv -evF DVD

There is nothing in dmesg or any of the syslogs.

rincebrain commented 7 years ago

Same question as before, I suppose - what's zdb -bbbbb DVD $(stat -c '%i' [path to file]) say?

(Or, if you want less magic in your command, use stat to get the inode from the file STUFF.ISO, and then zdb -bbbbb DVD [that number])

JuliaVixen commented 7 years ago
localhost ~ # /sbin/zdb -bbbbb -e DVD 186
Dataset mos [META], ID 0, cr_txg 4, 63.5M, 1518 objects

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       186    1    16K    512      0     512    512  100.00  DSL dir clones

Anyway, so I did this some more:

localhost qnap # dd of=/dev/null if="./STUFF.ISO"
dd: reading './STUFF.ISO': Input/output error
53581824+0 records in
53581824+0 records out
27433893888 bytes (27 GB) copied, 1221.83 s, 22.5 MB/s

localhost qnap # dd of=/dev/null if="./STUFF.ISO" conv=sync,noerror
dd: reading './STUFF.ISO': Input/output error
53581824+0 records in
53581824+0 records out
27433893888 bytes (27 GB) copied, 1193.22 s, 23.0 MB/s
dd: reading './STUFF.ISO': Input/output error
53581824+1 records in
53581825+0 records out
27433894400 bytes (27 GB) copied, 1193.25 s, 23.0 MB/s
dd: reading './STUFF.ISO': Input/output error
53581824+2 records in
53581826+0 records out
27433894912 bytes (27 GB) copied, 1193.28 s, 23.0 MB/s
dd: reading './STUFF.ISO': Input/output error
53581824+3 records in
53581827+0 records out
27433895424 bytes (27 GB) copied, 1193.31 s, 23.0 MB/s
dd: reading './STUFF.ISO': Input/output error
53581824+4 records in
53581828+0 records out
27433895936 bytes (27 GB) copied, 1193.34 s, 23.0 MB/s
dd: reading './STUFF.ISO': Input/output error
53581824+5 records in
53581829+0 records out
27433896448 bytes (27 GB) copied, 1193.38 s, 23.0 MB/s
[...]
dd: reading './STUFF.ISO': Input/output error
59803648+2064 records in
59805712+0 records out
30620524544 bytes (31 GB) copied, 1508.02 s, 20.3 MB/s
dd: reading './STUFF.ISO': Input/output error
59803648+2065 records in
59805713+0 records out
30620525056 bytes (31 GB) copied, 1513.13 s, 20.2 MB/s
dd: reading './STUFF.ISO': Input/output error
59803648+2066 records in
59805714+0 records out
30620525568 bytes (31 GB) copied, 1518.22 s, 20.2 MB/s
dd: reading './STUFF.ISO': Input/output error
59803648+2067 records in
59805715+0 records out
30620526080 bytes (31 GB) copied, 1528.4 s, 20.0 MB/s

And finally got something reported:

localhost ~ # zpool status
  pool: DVD
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                STATE     READ WRITE CKSUM
    DVD                                 ONLINE      20     0 2.00K
      ata-ST8000AS0002-1NA17Z_Z8401EL3  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401FAW  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401FP0  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401PH7  ONLINE      20     0 4.00K
      ata-ST8000AS0002-1NA17Z_Z84059L3  ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list

And this stuff in dmesg

[632319.838475] ata5.00: exception Emask 0x0 SAct 0x1c0 SErr 0x0 action 0x0
[632319.838478] ata5.00: irq_stat 0x40000008
[632319.838480] ata5.00: failed command: READ FPDMA QUEUED
[632319.838485] ata5.00: cmd 60/40:30:28:aa:d9/05:00:c6:00:00/40 tag 6 ncq 688128 in
                         res 41/40:40:98:ae:d9/00:05:c6:00:00/00 Emask 0x409 (media error) <F>
[632319.838486] ata5.00: status: { DRDY ERR }
[632319.838487] ata5.00: error: { UNC }
[632319.841158] ata5.00: configured for UDMA/133
[632319.841174] sd 4:0:0:0: [sdk] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[632319.841177] sd 4:0:0:0: [sdk] tag#6 Sense Key : Medium Error [current] [descriptor] 
[632319.841179] sd 4:0:0:0: [sdk] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
[632319.841183] sd 4:0:0:0: [sdk] tag#6 CDB: Read(16) 88 00 00 00 00 00 c6 d9 aa 28 00 00 05 40 00 00
[632319.841184] blk_update_request: I/O error, dev sdk, sector 3336154776
[632319.841196] ata5: EH complete
[632322.411634] ata5.00: exception Emask 0x0 SAct 0x400002 SErr 0x0 action 0x0
[632322.411637] ata5.00: irq_stat 0x40000008
[632322.411640] ata5.00: failed command: READ FPDMA QUEUED
[632322.411644] ata5.00: cmd 60/c0:b0:68:af:d9/02:00:c6:00:00/40 tag 22 ncq 360448 in
                         res 41/40:c0:68:af:d9/00:02:c6:00:00/00 Emask 0x409 (media error) <F>
[632322.411646] ata5.00: status: { DRDY ERR }
[632322.411647] ata5.00: error: { UNC }
[632322.414328] ata5.00: configured for UDMA/133
[632322.414345] sd 4:0:0:0: [sdk] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[632322.414347] sd 4:0:0:0: [sdk] tag#22 Sense Key : Medium Error [current] [descriptor] 
[632322.414349] sd 4:0:0:0: [sdk] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
[632322.414352] sd 4:0:0:0: [sdk] tag#22 CDB: Read(16) 88 00 00 00 00 00 c6 d9 af 68 00 00 02 c0 00 00
[632322.414354] blk_update_request: I/O error, dev sdk, sector 3336154984
[632322.414359] ZFS: zio error=5 type=1 offset=1708109615104 size=1048576 flags=c0880
[632322.414376] ata5: EH complete

And also...

smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.6.7B] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Archive HDD
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z8401PH7
LU WWN Device Id: 5 000c50 07a2c7b6c
Firmware Version: AR13
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
[blah blah]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   093   092   006    -    29393144
  3 Spin_Up_Time            PO----   093   091   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    29
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   081   060   030    -    8846306065
  9 Power_On_Hours          -O--CK   091   091   000    -    8140
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    28
183 Runtime_Bad_Block       -O--CK   098   098   000    -    2
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   044   044   000    -    56
188 Command_Timeout         -O--CK   100   099   000    -    4295032833
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   066   046   045    -    34 (Min/Max 27/44)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    40
193 Load_Cycle_Count        -O--CK   100   100   000    -    645
194 Temperature_Celsius     -O---K   034   054   000    -    34 (0 16 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   111   099   000    -    29393144
197 Current_Pending_Sector  -O--C-   100   100   000    -    24
198 Offline_Uncorrectable   ----C-   100   100   000    -    24
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    1702 (223 166 0)
241 Total_LBAs_Written      ------   100   253   000    -    58058853937
242 Total_LBAs_Read         ------   100   253   000    -    571329563350
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

Or like, this, whatever...

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   093   092   006    Pre-fail  Always       -       29393144
  3 Spin_Up_Time            0x0003   093   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       29
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       8846306065
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8140
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       28
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   044   044   000    Old_age   Always       -       56
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       4295032833
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   046   045    Old_age   Always       -       34 (Min/Max 27/44)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       40
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       645
194 Temperature_Celsius     0x0022   034   054   000    Old_age   Always       -       34 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   111   099   000    Old_age   Always       -       29393144
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       24
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1702 (57 144 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       58058853937
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       571329563350

My other drives report 187 Reported_Uncorrect 0x0032 as 0, so I guess that means these are just ordinary bad sectors... But I really had to hammer on this spot before ZFS reported anything wrong.

Also, I exported this pool and then imported it again, and it's forgotten all about these read errors. Current status:

localhost ~ # zpool status -v DVD
  pool: DVD
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                STATE     READ WRITE CKSUM
    DVD                                 ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401EL3  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401FAW  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401FP0  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z8401PH7  ONLINE       0     0     0
      ata-ST8000AS0002-1NA17Z_Z84059L3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /DVD/Z/DVD/qnap/STUFF.ISO
JuliaVixen commented 7 years ago

Just some more details... I upgraded ZFS to the latest Git master as of yesterday, plugged in one of these old drive pools, and tried reading from it again. (I was hoping maybe something different would happen, like it would report an actual read error or something, and self-heal it.)

filename:       /lib/modules/4.6.7B/extra/zfs/zfs.ko
version:        0.7.0-rc1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     30F99291D4F46800B0E4D19
depends:        spl,znvpair,icp,zunicode,zcommon,zavl
vermagic:       4.6.7B SMP mod_unload modversions 
  pool: n
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                   STATE     READ WRITE CKSUM
    n                                      ONLINE       0     0     0
      raidz1-0                             ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKGNH8BX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHJK9ZX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHNJWBX  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        n@Aug_23_2016:/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img

localhost ~ # dd bs=4096 of=/dev/null if=/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img conv=noerror,sync
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+0 records in
101668544+0 records out
416434356224 bytes (416 GB) copied, 3095.21 s, 135 MB/s
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+1 records in
101668545+0 records out
416434360320 bytes (416 GB) copied, 3095.22 s, 135 MB/s
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+2 records in
101668546+0 records out
416434364416 bytes (416 GB) copied, 3095.22 s, 135 MB/s
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+3 records in
101668547+0 records out
416434368512 bytes (416 GB) copied, 3095.22 s, 135 MB/s
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+4 records in
101668548+0 records out
[etc.]
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+30 records in
101668574+0 records out
416434479104 bytes (416 GB) copied, 3095.31 s, 135 MB/s
dd: reading '/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img': Input/output error
101668544+31 records in
101668575+0 records out
416434483200 bytes (416 GB) copied, 3095.31 s, 135 MB/s
244190614+32 records in
244190646+0 records out
1000204886016 bytes (1.0 TB) copied, 7424.2 s, 135 MB/s

localhost ~ # zpool status -v n
  pool: n
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                   STATE     READ WRITE CKSUM
    n                                      ONLINE       0     0    33
      raidz1-0                             ONLINE       0     0    66
        ata-WDC_WD80EFZX-68UW8N0_VKGNH8BX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHJK9ZX  ONLINE       0     0     0
        ata-WDC_WD80EFZX-68UW8N0_VKHNJWBX  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img
        n@Aug_23_2016:/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img

Nothing in dmesg.

Everything in the output of smartctl -x looks good...

  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  4 Start_Stop_Count        -O--C-   100   100   000    -    9
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
 7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
[etc...]
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
[and so on, for all three drives]

So, this one isn't a sector read error....

kernelOfTruth commented 7 years ago

So you ran a scrub.

I just thought of spacemap issues, hole_birth issues, etc. [there was incorrect spacemap usage in the past, and I could detect it via zdb]

Did you run zdb to check for metadata consistency and other things? E.g.

zdb -M -m -c

I'm not sure those are the optimal options to pass to zdb, but I'm curious what they would reveal.

Did you attempt to troubleshoot this further via IRC in the meantime?

JuliaVixen commented 7 years ago

OMG! Check this out! I set zfs_send_corrupt_data and did a send|recv of the corrupt filesystem to a new pool... and now it appears that the checksum error is gone... Either that, or the file is just silently corrupt now; I haven't actually checked yet. (More experimentation to follow.) So is this expected behavior?

I did this:

echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data

zpool create \
-O atime=off \
-O exec=off \
-O devices=off \
-O setuid=off \
-O checksum=sha256 \
-O compression=gzip-9 \
-o ashift=12 \
-o version=28 \
-o autoexpand=off \
-o autoreplace=off \
-o comment="Redo Qnap" \
-d \
-f \
    Q2 raidz1 \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840J9F5 \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JB7D \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JQ2W \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JRPH \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JRZT \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JSBJ \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JSLB \
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840JSRK

zfs send -v -R n@Aug_23_2016 | zfs recv -evF Q2

I have a scrub of the pool currently in progress, but in the meantime I checked whether I could read that corrupt file again without error... and there was no I/O error this time. I haven't actually checked whether the data itself is correct or slightly corrupt.

localhost ~ # dd bs=4096 of=/dev/null if=/Q2/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img conv=noerror,sync
244190646+0 records in
244190646+0 records out
1000204886016 bytes (1.0 TB) copied, 10787.6 s, 92.7 MB/s
JuliaVixen commented 7 years ago

I haven't run zdb -M -m -c or jumped on IRC yet. (I've been pretty busy this week...) And I actually only just now saw your reply as I refreshed the page in my browser....

behlendorf commented 7 years ago

@JuliaVixen setting zfs_send_corrupt_data will instruct zfs to send the block anyway, even if it has a bad checksum. When sent, the damaged blocks will be filled with the pattern 0x2f5baddb10c, "zfs badd bloc".
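
(So one way to see how much of a received copy was replaced is to search it for that fill pattern. A rough sketch, assuming GNU grep with PCRE support and using the path from the dd commands earlier in this thread; the bytes are the little-endian encoding of 0x2f5baddb10c, and on a file this size it will be slow and may want a lot of memory:)

grep -aboP '\x0c\xb1\xdd\xba\xf5\x02\x00\x00' \
 /Q2/n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img | head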

JuliaVixen commented 7 years ago

Oh... yeah...

48467df690e6b8f0202a0b586d874257  ./disks/copied_from_n/ST31000340AS_9QJ0J2GY_Reiserfs.img
2179976097960b01e444ef853b9235ef  ./n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img

localhost ~ # dd bs=4096 skip=101668544 if=./n/backed_up/ST31000340AS_9QJ0J2GY_Reiserfs.img | hexdump -vC

00000000  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|
00000010  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|
00000020  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|
00000030  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|
00000040  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|
00000050  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|
00000060  0c b1 dd ba f5 02 00 00  0c b1 dd ba f5 02 00 00  |................|

In other news, I've been keeping all of my new pools on version 28, and I haven't had a single problem with anything at all...

JuliaVixen commented 7 years ago

Does anyone want to investigate the corruption of these specific zpools? I want to wipe these drives and use them for something else.

behlendorf commented 7 years ago

@JuliaVixen go ahead and wipe the drives. We'd ideally like to investigate but everyone's quite busy.

gordan-bobic commented 7 years ago

@JuliaVixen, have you been able to reproduce this corruption arising:

1) On the same hardware but with a different OS (e.g. FreeBSD)
2) On different hardware but with the same OS install (move the rootfs disk across)
3) Same disks in a different machine
4) Different disks in the same machine

From reading the above, you mentioned getting completely new hardware, but it isn't obvious at a glance whether you were just seeing errors that arose on the disks from before you got your new hardware.

It is also not clear whether you have excluded the possibility of the disks being at fault here (e.g. a phantom read of a wrong sector).

JuliaVixen commented 7 years ago

I've gone through at least 80 hard drives now, several different models from different manufacturers, and two different machines. Mostly similar versions of Linux and ZFS, though. Since September, I've been creating all of my new pools at version 28, like this:

zpool create \
 -O atime=off \
 -O exec=off \
 -O devices=off \
 -O setuid=off \
 -O checksum=sha256 \
 -O compression=gzip-9 \
 -o ashift=12 \
 -o version=28 \
 -o autoexpand=off \
 -o autoreplace=off \
 -o comment="Backup Q2" \
 -d \
 -f \
     Q3 raidz2 \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840C6WK \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MENV \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MESR \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MEWT \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840ML54 \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840ML5N \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPKQ \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPRB \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPY7 \
 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ10

And I'm using this one as an example because, as of a few minutes ago, it's the first pool I've created and used this way that exhibits this nonspecific checksum error. (Every other version-28 pool has been ok so far, as far as I know.)

  pool: Q3
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                  STATE     READ WRITE CKSUM
    Q3                                    ONLINE       0     0     1
      raidz2-0                            ONLINE       0     0     2
        ata-ST8000AS0002-1NA17Z_Z840C6WK  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MENV  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MESR  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MEWT  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840ML54  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840ML5N  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MPKQ  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MPRB  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MPY7  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MQ10  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Q3/Q2/r@Aug_18_2016:/iOmega_redo.img

(Imported with -o readonly=on)

Really, seriously, it's not the hard drives. SMART reports nothing, Linux reports nothing, I've wiped and reused the drives from previous pools with this error, and the new pools have no errors (at least none detected yet). I was thinking that this was something related to some kind of new on-disk format feature.... but this pool has none...

NAME  PROPERTY                    VALUE                       SOURCE
Q3    size                        72.5T                       -
Q3    capacity                    95%                         -
Q3    altroot                     -                           default
Q3    health                      ONLINE                      -
Q3    guid                        4443373289161212188         -
Q3    version                     28                          local
Q3    bootfs                      -                           default
Q3    delegation                  on                          default
Q3    autoreplace                 off                         default
Q3    cachefile                   -                           default
Q3    failmode                    wait                        default
Q3    listsnapshots               off                         default
Q3    autoexpand                  off                         default
Q3    dedupditto                  0                           default
Q3    dedupratio                  1.04x                       -
Q3    free                        3.36T                       -
Q3    allocated                   69.1T                       -
Q3    readonly                    on                          -
Q3    ashift                      12                          local
Q3    comment                     Backup Q2                   local
Q3    expandsize                  -                           -
Q3    freeing                     0                           -
Q3    fragmentation               0%                          -
Q3    leaked                      0                           -
Q3    feature@async_destroy       disabled                    local
Q3    feature@empty_bpobj         disabled                    local
Q3    feature@lz4_compress        disabled                    local
Q3    feature@spacemap_histogram  disabled                    local
Q3    feature@enabled_txg         disabled                    local
Q3    feature@hole_birth          disabled                    local
Q3    feature@extensible_dataset  disabled                    local
Q3    feature@embedded_data       disabled                    local
Q3    feature@bookmarks           disabled                    local
Q3    feature@filesystem_limits   disabled                    local
Q3    feature@large_blocks        disabled                    local
Q3    feature@large_dnode         disabled                    local
Q3    feature@sha512              disabled                    local
Q3    feature@skein               disabled                    local
Q3    feature@edonr               disabled                    local
Q3    feature@userobj_accounting  disabled                    local

I'm pretty sure I was using the Oct 19, 2016 "0.7.0-rc1" snapshot https://github.com/zfsonlinux/zfs/commit/9d70aec6fde90112b5b5610ab5c17b6883b97063 (with SPL https://github.com/zfsonlinux/spl/commit/0d267566650d89bde8bd5ec4665749810d5bafc7 ) to create this and write all the data. Most of it was in one big zfs send|recv, but then I think I created some new filesystems and wrote files into them.

Linux Kernel 4.6.7

Nov 19 01:17:22 localhost kernel: ZFS: Loaded module v0.7.0-rc1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
History for 'Q3':
2016-11-21.06:17:31 zpool create -O atime=off -O exec=off -O devices=off -O setuid=off -O checksum=sha256 -O compression=gzip-9 -o ashift=12 -o version=28 -o autoexpand=off -o autoreplace=off -o comment=Backup Q2 -d -f Q3 raidz2 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840C6WK /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MENV /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MESR /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MEWT /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840ML54 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840ML5N /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPKQ /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPRB /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPY7 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ10
2016-12-02.22:40:58 zfs recv -evF Q3
2016-12-03.06:45:49 zfs create Q3/reiserfses
2016-12-04.10:31:48 zfs recv -evF Q3
2016-12-05.02:47:54 zfs create Q3/root_stuff
2016-12-05.04:25:50 zfs create Q3/2016_Dec_05_CDs
2016-12-11.07:19:29 zfs snap Q3/2016_Dec_05_CDs@2016_Dec_11
2016-12-11.07:19:58 zfs snap Q3/root_stuff@2016_Dec_11
2016-12-11.14:38:29 zfs send -v -R Q3/photos@Mar1414
2016-12-11.14:49:40 zfs send -v -R Q3/2016_Dec_05_CDs@2016_Dec_11
2016-12-11.14:50:07 zfs send -v -R Q3/root_stuff@2016_Dec_11
2016-12-11.23:35:07 zpool export Q3
2017-01-27.22:30:12 zpool import Q3
2017-01-27.22:32:47 zfs snap -r Q3@Q4
2017-01-27.22:33:06 zpool export Q3
chrisrd commented 7 years ago

@JuliaVixen Might be an idea to check your Seek_Error_Rate on all disks. E.g. your entry from 8 Oct 2016 showed:

Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z8401PH7
...
  7 Seek_Error_Rate         POSR--   081   060   030    -    8846306065

That could indicate vibration problems, leading to intermittent and unrepeatable bad read/writes, particularly under heavy load.

JuliaVixen commented 7 years ago

Seagate actually packs two variables into that SMART field. The top 16 bits are the number of seek errors, and the bottom 32 bits are the total number of seeks (so far?). The values are all between 52607246 and 60153197 decimal, but follow a regular pattern in hex...

# for i in /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840???? ; do smartctl -v 7,hex48 -x $i ; done

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Archive HDD
Device Model:     ST8000AS0002-1NA17Z
Serial Number:    Z840C6WK
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 0396010a
...
Serial Number:    Z840MENV
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 0390b1a7
...
Serial Number:    Z840MESR
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   077   060   030    -    0x0000 0324c69c
...
Serial Number:    Z840MEWT
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 038b7a4a
...
Serial Number:    Z840ML54
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 039108b2
...
Serial Number:    Z840ML5N
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 038b6245
...
Serial Number:    Z840MPKQ
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 03925cca
...
Serial Number:    Z840MPRB
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 03916033
...
Serial Number:    Z840MPY7
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   078   060   030    -    0x0000 03901783
...
Serial Number:    Z840MQ10
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  7 Seek_Error_Rate         POSR--   077   060   030    -    0x0000 0322dddd

So, the error count is actually zero for all these drives. I'm not sure why smartctl doesn't split this up if it's one of these Seagate disk models.
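
For anyone who wants to double-check that split, here is a minimal sketch using bash arithmetic; the raw value is the one smartctl printed for Z840C6WK above, treated as a single 48-bit number:

# top 16 bits = seek error count, bottom 32 bits = total seeks
raw=0x00000396010a
echo "seek errors: $(( raw >> 32 ))"         # prints 0
echo "total seeks: $(( raw & 0xFFFFFFFF ))"  # prints 60162314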

chrisrd commented 7 years ago

Yes, I just started looking into how Seagate reports that SMART parameter myself. In case anyone else is following along, the reference I found is: http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html

I agree your disks aren't displaying any seek errors.

Sorry, I thought I might have been onto something there!

JuliaVixen commented 7 years ago

So... I bought another nine of the Seagate 8T archive drives, to do yet another backup... again. And created a new raidz1 pool on them, and was doing a zfs send|recv thing and stuff like usual. And it was poking along at a slow 5MB/s to 80MB/s, which would take a few days. So, I decided to try an experiment of just dd'ing the raw disk images from the old drives to the new ones. They're all the exact same model number and stuff. I was also curious to see if there was an ideal buffer size for writing to the SMR disks in linear sector order.

dd bs=1M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840C6WK of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840KN97 &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840KN97': No space left on device
7630885+1 records in
7630885+0 records out
8001563222016 bytes (8.0 TB) copied, 55318.5 s, 145 MB/s

dd bs=2M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MENV of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840M51B &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840M51B': No space left on device
3815442+1 records in
3815442+0 records out
8001563222016 bytes (8.0 TB) copied, 61355.3 s, 130 MB/s

dd bs=3M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MEWT of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840M503 &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840M503': No space left on device
2543628+1 records in
2543628+0 records out
8001563222016 bytes (8.0 TB) copied, 63707.7 s, 126 MB/s

dd bs=4M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840ML54 of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840KQA6 &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840KQA6': No space left on device
1907721+1 records in
1907721+0 records out
8001563222016 bytes (8.0 TB) copied, 65214.3 s, 123 MB/s

# The following four had the destination drive on a different SATA controller, from the 12-port SAS controller that everything else is plugged into

dd bs=5M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840ML5N of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840PRW7 &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840PRW7': No space left on device
1526177+1 records in
1526177+0 records out
8001563222016 bytes (8.0 TB) copied, 64639.7 s, 124 MB/s

dd bs=6M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPKQ of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MVHM &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MVHM': No space left on device
1271814+1 records in
1271814+0 records out
8001563222016 bytes (8.0 TB) copied, 64813 s, 123 MB/s

dd bs=7M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPRB of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840PC4J &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840PC4J': No space left on device
1090126+1 records in
1090126+0 records out
8001563222016 bytes (8.0 TB) copied, 64110.2 s, 125 MB/s

dd bs=8M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MPY7 of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ6K &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ6K': No space left on device
953860+1 records in
953860+0 records out
8001563222016 bytes (8.0 TB) copied, 64061 s, 125 MB/s

And then this one, with both drives plugged into the SAS controller, but I also had ten more dd's going with a different set of five drives.

dd bs=1M conv=sync,noerror if=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ10 of=/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MWRK &

dd: writing '/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MWRK': No space left on device
7630885+1 records in
7630885+0 records out
8001563222016 bytes (8.0 TB) copied, 56557.9 s, 141 MB/s

So, I imported the pool to update the disk names from /dev/sdc to /dev/disk/by-id/ata-ST800etcetc, then exported it, and imported it again with -o readonly=on, and checked to see if there was still a checksum error.

localhost ~ # zpool status
  pool: Q3
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

    NAME                                  STATE     READ WRITE CKSUM
    Q3                                    DEGRADED     0     0     0
      raidz2-0                            DEGRADED     0     0     0
        ata-ST8000AS0002-1NA17Z_Z840KN97  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840M51B  ONLINE       0     0     0
        7186583607081167057               UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MESR-part1
        ata-ST8000AS0002-1NA17Z_Z840M503  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840KQA6  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840PRW7  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MVHM  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840PC4J  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MQ6K  ONLINE       0     0     0
        14414960048424282127              UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ10-part1

errors: No known data errors

localhost ~ # dd if=/Q3/Q2/r/iOmega_redo.img of=/dev/null 

dd: reading '/Q3/Q2/r/iOmega_redo.img': Input/output error
1034947840+0 records in
1034947840+0 records out
529893294080 bytes (530 GB) copied, 24634.1 s, 21.5 MB/s

localhost ~ # zpool status -v
  pool: Q3
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

    NAME                                  STATE     READ WRITE CKSUM
    Q3                                    DEGRADED     0     0     1
      raidz2-0                            DEGRADED     0     0     2
        ata-ST8000AS0002-1NA17Z_Z840KN97  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840M51B  ONLINE       0     0     0
        7186583607081167057               UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MESR-part1
        ata-ST8000AS0002-1NA17Z_Z840M503  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840KQA6  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840PRW7  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MVHM  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840PC4J  ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z840MQ6K  ONLINE       0     0     0
        14414960048424282127              UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840MQ10-part1

errors: Permanent errors have been detected in the following files:

        /Q3/Q2/r/iOmega_redo.img

Answer: yes.

(This was before I put the ninth drive back in, so ignore that it's missing in this example.)

Anyway... I copied every sector from the original set of hard drives to the new ones, with no hardware errors being reported. The new set of disks has the exact same checksum error in the exact same spot.

So...

What is the magic combination of options I need to pass to zdb to get a dump of the blocks which are failing the checksum? I'm really curious to find out what data is actually within them, whether it's a misplaced ZFS object, or just a bunch of nulls, or whatever.
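
For reference, one possible way to get at the failing blocks with zdb; this is a hedged sketch rather than a verified recipe for this exact build, and the object number and DVA below are placeholders, not values taken from this pool:

# 1) Find the damaged file's object number, then print its block pointers;
#    the blkptr lines contain DVAs of the form <vdev>:<offset>:<size>.
stat -c '%i' /Q3/Q2/r/iOmega_redo.img
zdb -ddddd Q3/Q2/r <object-number>

# 2) Read one block back by DVA; ":r" asks for the raw on-disk bytes,
#    ":d" would attempt decompression instead.
zdb -R Q3 0:123456789000:20000:r > /tmp/block.raw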

skrupler commented 7 years ago

Has this issue been confirmed? I possibly suffer from the same issue and I don't think it's a hardware issue either. How can we proceed here? I'd be happy to test and supply logs.

JuliaVixen commented 7 years ago

Well, it happened again...

  pool: B_redux
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        B_redux                                          ONLINE       0     0     2
          raidz1-0                                       ONLINE       0     0     4
            ata-WDC_WD80EFZX-68UW8N0_VKGU6V2X            ONLINE       0     0     0
            ata-WDC_WD80EFZX-68UW8N0_VKH408MX            ONLINE       0     0     0
            ata-WDC_WD80EFZX-68UW8N0_VKH52T6X            ONLINE       0     0     0
            ata-WDC_WD80EFZX-68UW8N0_VKHLNHZX            ONLINE       0     0     0
        cache
          ata-SAMSUNG_MZHPU512HCGL-00004_S1NDNYAFC00958  UNAVAIL      0     0     0

errors: Permanent errors have been detected in the following files:

        B_redux/B@2017_Mar_10:/ata-ST32000542AS_5XW17Z49.img

History for 'B_redux':
2017-02-07.09:07:26 zpool create B_redux raidz /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKGU6V2X /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKH408MX /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKH52T6X /dev/disk/by-id/ata-WDC_WD80EFZX-68UW8N0_VKHLNHZX
2017-02-07.09:09:07 zfs create B_redux/B
2017-02-07.09:09:24 zfs set compression=lz4 B_redux/B
2017-02-07.09:09:34 zfs set recordsize=1M B_redux/B
2017-02-07.09:11:30 zfs set checksum=sha256 B_redux/B
2017-02-07.10:45:11 zpool add B_redux cache /dev/disk/by-id/ata-SAMSUNG_MZHPU512HCGL-00004_S1NDNYAFC00958
2017-02-15.11:15:42 zfs create B_redux/2017_Memcards
2017-02-15.11:26:32 zfs create B_redux/2017_CDs
2017-02-23.13:01:56 zfs create B_redux/Time_Machine_2009
2017-02-23.13:02:28 zfs set compression=on B_redux/Time_Machine_2009
2017-02-23.13:02:39 zfs set checksum=sha256 B_redux/Time_Machine_2009
2017-02-25.06:54:51 zfs create B_redux/f
2017-02-25.06:55:10 zfs set compression=lz4 B_redux/f
2017-02-25.06:55:27 zfs set checksum=sha256 B_redux/f
2017-02-25.09:18:37 zfs recv -Fv B_redux/f
2017-02-26.14:16:33 zfs recv -Fv B_redux/f/copied_flacs
2017-02-26.20:12:06 zfs recv -Fv B_redux/f/e
2017-02-28.03:38:05 zfs recv -Fv B_redux/f/dedup1
2017-02-28.10:52:39 zfs destroy -r B_redux/f/dedup1
2017-02-28.11:07:16 zfs create -V 4G B_redux/e_temp_vol
2017-02-28.11:08:43 zfs set compression=lz4 B_redux/e_temp_vol
2017-02-28.11:11:12 zfs create -V 100G B_redux/e_temp_vol2
2017-02-28.11:11:35 zfs set compression=lz4 B_redux/e_temp_vol
2017-02-28.11:11:42 zfs set compression=lz4 B_redux/e_temp_vol2
2017-02-28.11:16:47 zfs set atime=off B_redux
2017-02-28.11:16:54 zfs set devices=off B_redux
2017-02-28.11:17:02 zfs set exec=off B_redux
2017-02-28.11:17:17 zfs set setuid=off B_redux
2017-02-28.11:17:33 zfs set recordsize=1M B_redux
2017-02-28.11:17:44 zfs set compression=lz4 B_redux
2017-02-28.11:17:54 zfs set checksum=sha256 B_redux
2017-03-06.05:49:19 zfs recv -v -s B_redux/e/d
2017-03-06.10:33:51 zfs recv -v -s B_redux/e/dedup1
2017-03-07.03:07:30 zfs recv -v -s B_redux/e/dedup1
2017-03-07.08:44:20 zfs recv -v -s B_redux/e/dedup1
2017-03-07.08:44:44 zfs destroy B_redux/e/dedup1@2017_Feb_27
2017-03-07.08:57:44 zfs create -V 100G B_redux/f_temp_vol
2017-03-07.08:57:56 zfs set compression=lz4 B_redux/f_temp_vol
2017-03-07.08:59:15 zfs create -V 100G B_redux/f_temp_vol2
2017-03-07.08:59:47 zfs set compression=lz4 B_redux/f_temp_vol2
2017-03-07.09:05:48 zfs rename B_redux/f B_redux/previous_f
2017-03-07.16:11:35 zfs destroy B_redux/f/dedup1
2017-03-07.17:52:39 zfs recv -vF B_redux/e/dedup1
2017-03-08.00:37:04 zfs recv -v -s B_redux/f/copied_flacs
2017-03-08.10:13:27 zfs destroy -r B_redux/previous_f/e
2017-03-08.16:42:15 zfs recv -v -s B_redux/f/e
2017-03-08.18:35:26 zpool export B_redux
2017-03-08.18:36:06 zpool export B_redux
2017-03-08.18:36:42 zpool export B_redux
2017-03-08.18:38:10 zfs destroy B_redux/e_temp_vol
2017-03-08.18:38:15 zfs destroy B_redux/e_temp_vol2
2017-03-08.18:38:17 zfs destroy B_redux/f_temp_vol2
2017-03-08.18:38:22 zfs destroy B_redux/f_temp_vol
2017-03-08.18:38:38 zpool export B_redux
2017-03-11.01:05:50 zfs snap -r B_redux@2017_Mar_10
2017-03-15.11:34:53 zfs snap -r B_redux@2017_Mar_15
2017-03-15.11:35:35 zpool export B_redux

NAME     PROPERTY                    VALUE                       SOURCE
B_redux  size                        29T                         -
B_redux  capacity                    88%                         -
B_redux  altroot                     -                           default
B_redux  health                      ONLINE                      -
B_redux  guid                        14583332645650771187        -
B_redux  version                     -                           default
B_redux  bootfs                      -                           default
B_redux  delegation                  on                          default
B_redux  autoreplace                 off                         default
B_redux  cachefile                   -                           default
B_redux  failmode                    wait                        default
B_redux  listsnapshots               off                         default
B_redux  autoexpand                  off                         default
B_redux  dedupditto                  0                           default
B_redux  dedupratio                  1.71x                       -
B_redux  free                        3.37T                       -
B_redux  allocated                   25.6T                       -
B_redux  readonly                    on                          -
B_redux  ashift                      0                           default
B_redux  comment                     -                           default
B_redux  expandsize                  -                           -
B_redux  freeing                     0                           -
B_redux  fragmentation               0%                          -
B_redux  leaked                      0                           -
B_redux  feature@async_destroy       enabled                     local
B_redux  feature@empty_bpobj         active                      local
B_redux  feature@lz4_compress        active                      local
B_redux  feature@spacemap_histogram  active                      local
B_redux  feature@enabled_txg         active                      local
B_redux  feature@hole_birth          active                      local
B_redux  feature@extensible_dataset  active                      local
B_redux  feature@embedded_data       active                      local
B_redux  feature@bookmarks           enabled                     local
B_redux  feature@filesystem_limits   enabled                     local
B_redux  feature@large_blocks        active                      local
B_redux  feature@large_dnode         enabled                     local
B_redux  feature@sha512              enabled                     local
B_redux  feature@skein               enabled                     local
B_redux  feature@edonr               enabled                     local
B_redux  feature@userobj_accounting  active                      local

NAME     PROPERTY              VALUE                  SOURCE
B_redux  type                  filesystem             -
B_redux  creation              Tue Feb  7  9:07 2017  -
B_redux  used                  20.1T                  -
B_redux  available             1.75T                  -
B_redux  referenced            186K                   -
B_redux  compressratio         1.07x                  -
B_redux  mounted               yes                    -
B_redux  quota                 none                   default
B_redux  reservation           none                   default
B_redux  recordsize            1M                     local
B_redux  mountpoint            /B_redux               default
B_redux  sharenfs              off                    default
B_redux  checksum              sha256                 local
B_redux  compression           lz4                    local
B_redux  atime                 off                    local
B_redux  devices               off                    local
B_redux  exec                  off                    local
B_redux  setuid                off                    local
B_redux  readonly              on                     temporary
B_redux  zoned                 off                    default
B_redux  snapdir               hidden                 default
B_redux  aclinherit            restricted             default
B_redux  canmount              on                     default
B_redux  xattr                 on                     default
B_redux  copies                1                      default
B_redux  version               5                      -
B_redux  utf8only              off                    -
B_redux  normalization         none                   -
B_redux  casesensitivity       sensitive              -
B_redux  vscan                 off                    default
B_redux  nbmand                off                    default
B_redux  sharesmb              off                    default
B_redux  refquota              none                   default
B_redux  refreservation        none                   default
B_redux  primarycache          all                    default
B_redux  secondarycache        all                    default
B_redux  usedbysnapshots       0                      -
B_redux  usedbydataset         186K                   -
B_redux  usedbychildren        20.1T                  -
B_redux  usedbyrefreservation  0                      -
B_redux  logbias               latency                default
B_redux  dedup                 off                    default
B_redux  mlslabel              none                   default
B_redux  sync                  standard               default
B_redux  dnodesize             legacy                 default
B_redux  refcompressratio      2.29x                  -
B_redux  written               0                      -
B_redux  logicalused           21.7T                  -
B_redux  logicalreferenced     81.5K                  -
B_redux  filesystem_limit      none                   default
B_redux  snapshot_limit        none                   default
B_redux  filesystem_count      none                   default
B_redux  snapshot_count        none                   default
B_redux  snapdev               hidden                 default
B_redux  acltype               off                    default
B_redux  context               none                   default
B_redux  fscontext             none                   default
B_redux  defcontext            none                   default
B_redux  rootcontext           none                   default
B_redux  relatime              off                    default
B_redux  redundant_metadata    all                    default
B_redux  overlay               off                    default

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKGU6V2X
[...]
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   132   132   054    -    112
  3 Spin_Up_Time            POS---   150   150   024    -    452 (Average 425)
  4 Start_Stop_Count        -O--C-   100   100   000    -    35
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    2845
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    34
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    675
193 Load_Cycle_Count        -O--C-   100   100   000    -    675
194 Temperature_Celsius     -O----   162   162   000    -    37 (Min/Max 18/54)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    14

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKH408MX
[...]
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   132   132   054    -    112
  3 Spin_Up_Time            POS---   151   151   024    -    450 (Average 421)
  4 Start_Stop_Count        -O--C-   100   100   000    -    30
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    2775
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    28
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    554
193 Load_Cycle_Count        -O--C-   100   100   000    -    554
194 Temperature_Celsius     -O----   176   176   000    -    34 (Min/Max 18/57)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKH52T6X
[...]
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   131   131   054    -    116
  3 Spin_Up_Time            POS---   150   150   024    -    454 (Average 423)
  4 Start_Stop_Count        -O--C-   100   100   000    -    33
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    2868
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    33
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    731
193 Load_Cycle_Count        -O--C-   100   100   000    -    731
194 Temperature_Celsius     -O----   162   162   000    -    37 (Min/Max 18/58)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    VKHLNHZX
[...]
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   129   129   054    -    124
  3 Spin_Up_Time            POS---   146   146   024    -    466 (Average 432)
  4 Start_Stop_Count        -O--C-   100   100   000    -    33
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    2871
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    31
 22 Unknown_Attribute       PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    741
193 Load_Cycle_Count        -O--C-   100   100   000    -    741
194 Temperature_Celsius     -O----   166   166   000    -    36 (Min/Max 18/56)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0

Nothing in dmesg, and no memory errors reported by edac

# edac-util -v    
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: CPU#0Channel#0_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU#0Channel#1_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU#0Channel#2_DIMM#0: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: CPU#0Channel#0_DIMM#1: 0 Corrected Errors
mc0: csrow1: CPU#0Channel#1_DIMM#1: 0 Corrected Errors
mc0: csrow1: CPU#0Channel#2_DIMM#1: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: CPU#0Channel#1_DIMM#2: 0 Corrected Errors
mc0: csrow2: CPU#0Channel#2_DIMM#2: 0 Corrected Errors
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc1: csrow0: 0 Uncorrected Errors
mc1: csrow0: CPU#1Channel#0_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU#1Channel#1_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU#1Channel#2_DIMM#0: 0 Corrected Errors
mc1: csrow1: 0 Uncorrected Errors
mc1: csrow1: CPU#1Channel#0_DIMM#1: 0 Corrected Errors
mc1: csrow1: CPU#1Channel#1_DIMM#1: 0 Corrected Errors
mc1: csrow1: CPU#1Channel#2_DIMM#1: 0 Corrected Errors
mc1: csrow2: 0 Uncorrected Errors
mc1: csrow2: CPU#1Channel#1_DIMM#2: 0 Corrected Errors
mc1: csrow2: CPU#1Channel#2_DIMM#2: 0 Corrected Errors
edac-util: No errors to report.

And on my other computer which this pool was mounted on...

# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow0: mc#0csrow#0channel#1: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
mc0: csrow1: mc#0csrow#1channel#1: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
mc0: csrow2: mc#0csrow#2channel#1: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: mc#0csrow#3channel#0: 0 Corrected Errors
mc0: csrow3: mc#0csrow#3channel#1: 0 Corrected Errors
edac-util: No errors to report.
jwittlincohen commented 7 years ago

I experienced similar uncorrectable errors which I was able to trace down to a bad SAS controller. Apparently, under heavy load it would corrupt new writes and also sometimes drop disks. Replacing both controllers of the impacted model with LSI 9211-8i controllers (in IT mode) resolved the issue, and I've been problem free since. See https://github.com/zfsonlinux/zfs/issues/2867#issuecomment-63525207

Bad drives aren't the only thing that can cause this issue. It could be defective memory (if not using ECC), power supply issues, controller problems, bad cables etc.

tonyhutter commented 6 years ago

@JuliaVixen can you attach a copy of the zpool events -v output?

stephan2012 commented 5 years ago

On a brand new server, I was facing the same issue: spurious checksum errors, sometimes appearing at the end of resilvering a mirrored disk. I have had no more issues since I disabled the write cache on the disks (Intel SATA SSD SSDSC2KB96) with

hdparm -W 0 /dev/sda
hdparm -W 0 /dev/sdb

However, this still looks somewhat strange to me. First, the checksum errors were always shown on sdb. Second, this type of SSD has protection against data loss on power failure.
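
For what it's worth (an assumption about the setup, not part of the original report): "hdparm -W" with no value just reports the current write-cache state, and on Debian the change can be made persistent through /etc/hdparm.conf so it survives a reboot:

hdparm -W /dev/sda           # should now report: write-caching = 0 (off)

# /etc/hdparm.conf snippet (assumes Debian's hdparm package)
/dev/sda {
        write_cache = off
}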

rincebrain commented 5 years ago

@stephan2012 you might want to report that to e.g. the Linux kernel folks, I believe they still maintain a blacklist for devices with known brokenness.

shodanshok commented 5 years ago

Hi all, as this seems a quite concerning problem (intractable, untrackable data corruption), has anyone investigated it? Any idea what the root cause was?

rincebrain commented 5 years ago

@shodanshok Well, there are two different people's issues in here which aren't immediately obviously related, everyone else isn't complaining that their data spontaneously caught fire, the original poster never replied to the request for specific information, @jwittlincohen said they had a similar issue which turned out to be a misbehaving piece of hardware, and the bug is originally from 2016.

Since one person seems to have a single misbehaving drive, and the other didn't provide the requested debugging information, what exactly is it that you would like done here?

shodanshok commented 5 years ago

@rincebrain sure, my question was more along the lines of "did someone discover a problem and fix it, or were no issues ever identified?"

Well, I think you already answered me, thanks.

capnbb commented 5 years ago

Dear All, We have a similar problem, happening on three of our servers. It's not a normal drive failure; "zpool status" never shows a CHKSUM error at the drive level - it's always at the VDEV.

For example:

[root@dstore1 ~]# zpool status -v
  pool: dstore1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Feb 27 17:11:23 2019
    247T scanned out of 497T at 3.06G/s, 23h17m to go
    0B repaired, 49.64% done
config:

    NAME                        STATE     READ WRITE CKSUM
    dstore1                     ONLINE       0     0    46
      raidz1-0                  ONLINE       0     0     8
        wwn-0x5000cca254df6d38  ONLINE       0     0     0
        wwn-0x5000cca254e12b37  ONLINE       0     0     0
        wwn-0x5000cca254e10426  ONLINE       0     0     0
        wwn-0x5000cca254e4b029  ONLINE       0     0     0
        wwn-0x5000cca254df5910  ONLINE       0     0     0
        wwn-0x5000cca254e48bd1  ONLINE       0     0     0
        wwn-0x5000cca254df6f34  ONLINE       0     0     0
        wwn-0x5000cca23bd9cd86  ONLINE       0     0     0
      raidz1-1                  ONLINE       0     0     8
        wwn-0x5000cca23bda212a  ONLINE       0     0     0
        wwn-0x5000cca254e4e2fb  ONLINE       0     0     0
        wwn-0x5000cca254dfd8b8  ONLINE       0     0     0
        wwn-0x5000cca254e1e184  ONLINE       0     0     0
        wwn-0x5000cca254e4c80d  ONLINE       0     0     0
        wwn-0x5000cca254e15a33  ONLINE       0     0     0
        wwn-0x5000cca23bd9cd83  ONLINE       0     0     0
        wwn-0x5000cca254e4da90  ONLINE       0     0     0
      raidz1-2                  ONLINE       0     0    12
        wwn-0x5000cca23bda11df  ONLINE       0     0     0
        wwn-0x5000cca23bd96da0  ONLINE       0     0     0
        wwn-0x5000cca254e44ee8  ONLINE       0     0     0
        wwn-0x5000cca254e4e5f3  ONLINE       0     0     0
        wwn-0x5000cca23bda0b91  ONLINE       0     0     0
        wwn-0x5000cca23bd9e2fb  ONLINE       0     0     0
        wwn-0x5000cca23bd9cf02  ONLINE       0     0     0
        wwn-0x5000cca23bd9f0a0  ONLINE       0     0     0
      raidz1-3                  ONLINE       0     0    12
        wwn-0x5000cca23bd9dcf4  ONLINE       0     0     0
        wwn-0x5000cca23bd9b599  ONLINE       0     0     0
        wwn-0x5000cca23bd9afd5  ONLINE       0     0     0
        wwn-0x5000cca254e1e1c7  ONLINE       0     0     0
        wwn-0x5000c500916fdc96  ONLINE       0     0     0
        wwn-0x5000cca254df67ff  ONLINE       0     0     0
        wwn-0x5000cca254dfcbbe  ONLINE       0     0     0
        wwn-0x5000cca254df66e3  ONLINE       0     0     0
      raidz1-4                  ONLINE       0     0     8
        wwn-0x5000cca254e4e864  ONLINE       0     0     0
        wwn-0x5000cca254e4e1d0  ONLINE       0     0     0
        wwn-0x5000cca254de02bc  ONLINE       0     0     0
        wwn-0x5000cca254e4b119  ONLINE       0     0     0
        wwn-0x5000cca254df6797  ONLINE       0     0     0
        wwn-0x5000cca254e1e1d9  ONLINE       0     0     0
        wwn-0x5000cca254e4e5fc  ONLINE       0     0     0
        wwn-0x5000cca254e400e3  ONLINE       0     0     0
      raidz1-5                  ONLINE       0     0     0
        wwn-0x5000cca254e4e1e6  ONLINE       0     0     0
        wwn-0x5000cca254df675b  ONLINE       0     0     0
        wwn-0x5000cca254e48bae  ONLINE       0     0     0
        wwn-0x5000cca254e15cc2  ONLINE       0     0     0
        wwn-0x5000cca254df8484  ONLINE       0     0     0
        wwn-0x5000cca254dfa556  ONLINE       0     0     0
        wwn-0x5000c50093639a7b  ONLINE       0     0     0
        wwn-0x5000cca23bd99add  ONLINE       0     0     0
      raidz1-6                  ONLINE       0     0    12
        wwn-0x5000cca254df5930  ONLINE       0     0     0
        wwn-0x5000cca254e4e197  ONLINE       0     0     0
        wwn-0x5000cca254e485ff  ONLINE       0     0     0
        wwn-0x5000c500b006ed04  ONLINE       0     0     0
        wwn-0x5000cca23bd95bfd  ONLINE       0     0     0
        wwn-0x5000cca254e158cd  ONLINE       0     0     0
        wwn-0x5000cca254e17ea9  ONLINE       0     0     0
        wwn-0x5000cca23bd980bb  ONLINE       0     0     0
      raidz1-7                  ONLINE       0     0    12
        wwn-0x5000cca254e48b4d  ONLINE       0     0     0
        wwn-0x5000cca254e1c2bc  ONLINE       0     0     0
        wwn-0x5000cca254df5019  ONLINE       0     0     0
        wwn-0x5000cca254e4e1d6  ONLINE       0     0     0
        wwn-0x5000cca254e4e18d  ONLINE       0     0     0
        wwn-0x5000cca254e4b0f7  ONLINE       0     0     0
        wwn-0x5000cca254df6e60  ONLINE       0     0     0
        wwn-0x5000cca254df7102  ONLINE       0     0     0
      raidz1-8                  ONLINE       0     0    12
        wwn-0x5000cca254e14a51  ONLINE       0     0     0
        wwn-0x5000cca254df66f9  ONLINE       0     0     0
        wwn-0x5000cca254e0a0bd  ONLINE       0     0     0
        wwn-0x5000cca254e4e23b  ONLINE       0     0     0
        wwn-0x5000cca254e4c7a7  ONLINE       0     0     0
        wwn-0x5000cca254df5902  ONLINE       0     0     0
        wwn-0x5000cca23bca301b  ONLINE       0     0     0
        wwn-0x5000cca254e4b117  ONLINE       0     0     0
      raidz1-9                  ONLINE       0     0     8
        wwn-0x5000cca254e4253b  ONLINE       0     0     0
        wwn-0x5000cca254df6807  ONLINE       0     0     0
        wwn-0x5000c50093637383  ONLINE       0     0     0
        wwn-0x5000cca254e4e2fd  ONLINE       0     0     0
        wwn-0x5000cca254e1e1db  ONLINE       0     0     0
        wwn-0x5000cca254df8379  ONLINE       0     0     0
        wwn-0x5000cca254e4e1e5  ONLINE       0     0     0
        wwn-0x5000cca254df5984  ONLINE       0     0     0
      raidz1-10                 ONLINE       0     0     0
        wwn-0x5000cca254dfc4e5  ONLINE       0     0     0
        wwn-0x5000cca254dfc5f8  ONLINE       0     0     0
        wwn-0x5000cca23bda0d2d  ONLINE       0     0     0
        wwn-0x5000cca23bd9b44c  ONLINE       0     0     0
        wwn-0x5000cca23bd9dda2  ONLINE       0     0     0
        wwn-0x5000cca254e4af06  ONLINE       0     0     0
        wwn-0x5000cca254e4b11a  ONLINE       0     0     0
        wwn-0x5000cca254e48d30  ONLINE       0     0     0
    cache
      nvme0n1                   ONLINE       0     0     0
    spares
      wwn-0x5000cca254df67bb    AVAIL
      wwn-0x5000cca254e49595    AVAIL

errors: Permanent errors have been detected in the following files:

    <0x128>:<0x24480b>
    <0x128>:<0x23c310>
    <0x128>:<0x23c41a>
    <0x128>:<0x31551b>
    <0x128>:<0x31551d>
    <0x128>:<0x2bfc9a>
    <0x128>:<0x2637a5>
    <0x128>:<0x231ca8>
    <0x128>:<0x23bbd1>
    <0x128>:<0x210fd5>
    <0x128>:<0x2447f0>
    /teraraid3//path/redacted//2014-06-22_02_05_59_noDW.mrc
    /teraraid3//path/redacted//MRC_0107/2014-06-22_02_05_59_noDW.mrc
    dstore1//path/redacted//MRC_1107/2014-07-12_02.41.04.mrcs
    dstore1/teraraid3@28Feb-00:05://path/redacted//2014-07-13_13.34.21.mrcs
    dstore1/teraraid3@28Feb-00:05://path/redacted//02061_cor2_DW.mrc
    dstore1/teraraid3@28Feb-00:05://path/redacted//02062_cor2_DW_pf.mrc
    dstore1/teraraid3@28Feb-00:05://path/redacted//FoilHole_Fractions.mrcs
    dstore1/teraraid3@28Feb-00:05://path/redacted//Falcon_2018_04_07-05_38_00_0.mrcs
    dstore1/teraraid3@28Feb-00:05://path/redacted//RAW/test_192.mrc
    dstore1/teraraid3@28Feb-00:05://path/redacted//2014-08-17_02.06.40.mrcs
    dstore1/teraraid3@28Feb-00:05://path/redacted//raw/171213_00146.mrcs
    dstore1/teraraid3@28Feb-00:05://path/redacted//2014-06-21_22_57_22_noDW.mrc

These "CHKSUM errors", and "Permanent errors in the following files" were first noticed in April 2018, and continue to occur.

Our initial assumption was that this was due to bad hardware, drivers or firmware. As a result, we swapped out the SAS backplanes, SAS cables, SAS expanders - and changed the main SAS card from an Adaptec 8805 to an LSI 9400-8i and installed all available firmware updates.

Each time we made a hardware change, we would: delete the corrupt file & restore it from a remote snapshot backup, run zpool clear, then run zpool scrub.

Sometimes the pool would be OK - but then a few weeks later we would again see CHKSUM errors, and one or more bad files and/or a list of bad inodes, eg:

errors: Permanent errors have been detected in the following files:

    <0x21a>:<0x1ea03a>
    <0x1521>:<0x348e25>
    <0x1521>:<0x311fc1>
    <0x534>:<0x4d6fe0>
    <0x534>:<0x5201ea>

We have also tried upgrading the OS from Scientific Linux 6.9 to SL 7.5, then 7.6, and have upgraded ZFS from 0.6.5.9-1 eventually through to 0.7.12-1 (kmod).

All three servers are Dell DSS7500, each with 90 SATA drives; two servers have 8TB drives (mostly HUH728080AL), one has 12TB (all HUH721212AL).

Using zdb to look at one of the corrupt files:

[root@dstore1 ~]# stat -c'%i' /file/name/redacted/xyz
2377712
[root@dstore1 ~]# zdb -bbbbb dstore1 2377712
Dataset mos [META], ID 0, cr_txg 4, 3.30G, 1533 objects

Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type

zdb: dmu_bonus_hold(2377712) failed, errno 2
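
(Only a guess, not a confirmed diagnosis: errno 2 is ENOENT, and "zdb -bbbbb dstore1" looks the object up at the pool level rather than in the dataset that actually holds the file. Pointing zdb at the dataset may get further; "dstore1/teraraid3" below is inferred from the paths shown elsewhere in this thread and may not be the right dataset.)

zdb -ddddd dstore1/teraraid3 2377712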

After forcing dd to read a corrupt file, the CHKSUM counter for one VDEV increases dramatically:

      raidz1-9                  ONLINE       0     0 3.50K
        wwn-0x5000cca254e4253b  ONLINE       0     0     0
        wwn-0x5000cca254df6807  ONLINE       0     0     0
        wwn-0x5000c50093637383  ONLINE       0     0     0
        wwn-0x5000cca254e4e2fd  ONLINE       0     0     0
        wwn-0x5000cca254e1e1db  ONLINE       0     0     0
        wwn-0x5000cca254df8379  ONLINE       0     0     0
        wwn-0x5000cca254e4e1e5  ONLINE       0     0     0
        wwn-0x5000cca254df5984  ONLINE       0     0     0

"zpool events -v" shows a great deal - due to it's size I have copied the contents to: http://p.ip.fi/wwco

All and any ideas appreciated.

Jake

rincebrain commented 5 years ago

The other idea I had was that here, we saw people getting vdev errors without many checksum errors on the associated disks, because the raidz parity code on ARM32 was broken.

So maybe try posting /proc/spl/kstat/zfs/vdev_raidz_bench and then changing /sys/module/zfs/parameters/zfs_vdev_raidz_impl to the other values (I'd start with original, painfully slow though it may comparatively be), then see if you can reproduce this with each of those values.

(That said, this idea wouldn't explain the OP's issue on 0.6.5, because the raidz modifications didn't go in until 0.7.X, but it's something to try.)

(God I'm hoping this isn't another #6981)
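
A rough sketch of that test, assuming the pool names from the reports above; this is illustrative rather than a verified procedure, and the file path is a placeholder:

cat /proc/spl/kstat/zfs/vdev_raidz_bench
for impl in original scalar sse2 ssse3 avx2 ; do
    echo "$impl" > /sys/module/zfs/parameters/zfs_vdev_raidz_impl
    zpool clear dstore1
    dd if=/path/to/known-bad-file of=/dev/null bs=1M   # placeholder path
    zpool status -v dstore1
done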

capnbb commented 5 years ago

Hi Rincebrain, many thanks for your help.

It might be a 0.7.X issue

We first saw problems on a machine running 0.6.5; however, these errors may have been caused by a different problem, i.e. a bad SAS expander (there were errors in /var/log/messages which we don't see here).

However, the current errors first appeared on this server after its SAS backplane & expanders were replaced, as we upgraded ZFS from 0.6.5 to 0.7.12 during the downtime.

During the first post-update & repair scrub, ZFS found lots of corrupt files, together with CHKSUM errors in vdevs, but no CHKSUM errors in drives. ZFS also refused to allow us to remove the hot-spare drives that had been pulled into the array.

In response, we decided the array should be destroyed and rebuilt from scratch. But before doing this, we needed to copy 425TB data to another server.

At this point the main server was still live, and pushing daily snapshots to its backup server, which was running ZFS 0.7.12 & Scientific Linux 6.8.

To prepare for the move, we ran a push of the whole pool from the backup server to a new DSS7500 ZFS server running SL 7.6 & ZFS 0.7.12. This push failed due to CHKSUM errors in the pool on the backup server.

Here is the output of zpool status on the backup machine running ZFS 0.7.12.

  pool: dmirror1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Dec 22 15:03:39 2018
    186T scanned out of 520T at 1.24G/s, 76h32m to go
    0B repaired, 35.84% done
config:

    NAME                        STATE     READ WRITE CKSUM
    dmirror1                    ONLINE       0     0    49
      raidz1-0                  ONLINE       0     0    12
        wwn-0x5000cca23bd7b212  ONLINE       0     0     0
        wwn-0x5000cca23bd89099  ONLINE       0     0     0
        wwn-0x5000cca23bd899ce  ONLINE       0     0     0
        wwn-0x5000cca23bd8f1ea  ONLINE       0     0     0
        wwn-0x5000cca23bd93fe3  ONLINE       0     0     0
        wwn-0x5000cca23bd9a29e  ONLINE       0     0     0
        wwn-0x5000cca23bd9a389  ONLINE       0     0     0
        wwn-0x5000cca23bd9bc04  ONLINE       0     0     0
        wwn-0x5000cca23bd9c1a6  ONLINE       0     0     0
        wwn-0x5000cca23bd9c27d  ONLINE       0     0     0
        wwn-0x5000cca23bd9c7b3  ONLINE       0     0     0
      raidz1-1                  ONLINE       0     0    18
        wwn-0x5000cca23bd9cc99  ONLINE       0     0     0
        wwn-0x5000cca23bd9cceb  ONLINE       0     0     0
        wwn-0x5000cca23bd9cd63  ONLINE       0     0     0
        wwn-0x5000cca23bd9cd81  ONLINE       0     0     0
        wwn-0x5000cca23bd9d65e  ONLINE       0     0     0
        wwn-0x5000cca23bd9dc1a  ONLINE       0     0     0
        wwn-0x5000cca23bd9dc38  ONLINE       0     0     0
        wwn-0x5000cca23bd9dcc2  ONLINE       0     0     0
        wwn-0x5000cca23bda014d  ONLINE       0     0     0
        wwn-0x5000cca23bda0275  ONLINE       0     0     0
        wwn-0x5000cca23bda0918  ONLINE       0     0     0
      raidz1-2                  ONLINE       0     0     8
        wwn-0x5000cca23bda0938  ONLINE       0     0     0
        wwn-0x5000cca23bda0b8b  ONLINE       0     0     0
        wwn-0x5000cca23bda217a  ONLINE       0     0     0
        wwn-0x5000cca254dc603d  ONLINE       0     0     0
        wwn-0x5000cca254de90ca  ONLINE       0     0     0
        wwn-0x5000cca254df57a0  ONLINE       0     0     0
        wwn-0x5000cca254df58ca  ONLINE       0     0     0
        wwn-0x5000cca254df5918  ONLINE       0     0     0
        wwn-0x5000cca254df5ec4  ONLINE       0     0     0
        wwn-0x5000cca254df6902  ONLINE       0     0     0
        wwn-0x5000cca254df6c59  ONLINE       0     0     0
      raidz1-3                  ONLINE       0     0     0
        wwn-0x5000cca254df6d87  ONLINE       0     0     0
        wwn-0x5000cca254df6db3  ONLINE       0     0     0
        wwn-0x5000cca254df84c3  ONLINE       0     0     0
        wwn-0x5000cca254df851e  ONLINE       0     0     0
        wwn-0x5000cca254df9509  ONLINE       0     0     0
        wwn-0x5000cca254dfafe3  ONLINE       0     0     0
        wwn-0x5000cca254dfc099  ONLINE       0     0     0
        wwn-0x5000cca254dfcab9  ONLINE       0     0     0
        wwn-0x5000cca254dfcaca  ONLINE       0     0     0
        wwn-0x5000cca254dfcbbc  ONLINE       0     0     0
        wwn-0x5000cca254dfcef6  ONLINE       0     0     0
      raidz1-4                  ONLINE       0     0    18
        wwn-0x5000cca254dfcfdb  ONLINE       0     0     0
        wwn-0x5000cca254dfd9a3  ONLINE       0     0     0
        wwn-0x5000cca254e14cd9  ONLINE       0     0     0
        wwn-0x5000cca254e191cb  ONLINE       0     0     0
        wwn-0x5000cca254e1daf6  ONLINE       0     0     0
        wwn-0x5000cca254e2e194  ONLINE       0     0     0
        wwn-0x5000cca254e43d93  ONLINE       0     0     0
        wwn-0x5000cca254e485c7  ONLINE       0     0     0
        wwn-0x5000cca254e485f4  ONLINE       0     0     0
        wwn-0x5000cca23bc63e3e  ONLINE       0     0     0
        wwn-0x5000cca254e48606  ONLINE       0     0     0
      raidz1-5                  ONLINE       0     0    12
        wwn-0x5000cca254e4872b  ONLINE       0     0     0
        wwn-0x5000cca254e48909  ONLINE       0     0     0
        wwn-0x5000cca254e48b50  ONLINE       0     0     0
        wwn-0x5000cca254e48b7c  ONLINE       0     0     0
        wwn-0x5000cca254e48b95  ONLINE       0     0     0
        wwn-0x5000cca254e48bd9  ONLINE       0     0     0
        wwn-0x5000cca254e48c22  ONLINE       0     0     0
        wwn-0x5000cca254e49596  ONLINE       0     0     0
        wwn-0x5000cca254e495d2  ONLINE       0     0     0
        wwn-0x5000cca254e495d7  ONLINE       0     0     0
        wwn-0x5000cca254e495d9  ONLINE       0     0     0
      raidz1-6                  ONLINE       0     0    12
        wwn-0x5000cca254e4b02f  ONLINE       0     0     0
        wwn-0x5000cca254e4b10e  ONLINE       0     0     0
        wwn-0x5000cca254e4b114  ONLINE       0     0     0
        wwn-0x5000cca254e4c0fd  ONLINE       0     0     0
        wwn-0x5000cca254e4c2f8  ONLINE       0     0     0
        wwn-0x5000cca254e4c347  ONLINE       0     0     0
        wwn-0x5000cca254e4c668  ONLINE       0     0     0
        wwn-0x5000cca254e4d630  ONLINE       0     0     0
        wwn-0x5000cca254e4da89  ONLINE       0     0     0
        wwn-0x5000cca254e4e1d1  ONLINE       0     0     0
        wwn-0x5000cca254e4e1e4  ONLINE       0     0     0
      raidz1-7                  ONLINE       0     0    18
        wwn-0x5000cca254e4e216  ONLINE       0     0     0
        wwn-0x5000cca254e4e24d  ONLINE       0     0     0
        wwn-0x5000cca254e4e3a5  ONLINE       0     0     0
        wwn-0x5000cca254e4e3d2  ONLINE       0     0     0
        wwn-0x5000cca254e4e8c5  ONLINE       0     0     0
        wwn-0x5000cca254e4e8cf  ONLINE       0     0     0
        wwn-0x5000cca254e4e91d  ONLINE       0     0     0
        wwn-0x5000cca254e4e922  ONLINE       0     0     0
        wwn-0x5000cca254e4e9ce  ONLINE       0     0     0
        wwn-0x5000cca254e501fb  ONLINE       0     0     0
        wwn-0x5000cca254e52300  ONLINE       0     0     0
    cache
      nvme0n1                   ONLINE       0     0     0
    spares
      wwn-0x5000cca23bc65845    AVAIL   
      wwn-0x5000c500916ec52f    AVAIL   

errors: Permanent errors have been detected in the following files:

    dmirror1/teraraid3@02Oct-00:05:/path_redacted/Fila__0263_movie.mrcs
    dmirror1/teraraid3@02Oct-00:05:/path_redacted/584120_2584121_20180604-87883.mrcs
    dmirror1/teraraid3@02Oct-00:05:/path_redacted/71229_1353_Fractions_DW_movie.mrcs
    dmirror1/teraraid3@02Oct-00:05:/path_redacted/0705_2338-9995_movie.mrcs
    dmirror1/teraraid3@02Oct-00:05:/path_redacted/16A_LMNG_Fab/1520_2x_df_DW.mrcs
    dmirror1/teraraid3@02Oct-00:05:/path_redacted/Micrographs/0170426_0435g3.mrcs
    dmirror1/teraraid3@02Oct-00:05:/path_redacted/20180125_0322_Fractions_movie.mrcs

We forced the push to the new server (ignoring the corrupt data) using:

echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data

After this we pushed a differential snapshot from the old server to the new, bringing the filesystem copy into sync with the live one, then made the copy server live, then brought the old server down and wiped the bad array. Once the array was rebuilt, we pushed the data back from the live server, and then pivoted back to using the original server. This took over a month to do.

We then noticed "vdev errors without any checksum errors on associated disks" occurring on the new server, on the server we had just re-built, and on its backup server; in other words, on all of these ZFS servers.

At this point, all four servers (two primary, two backup) had been upgraded to the latest ZFS 0.7.12 and Scientific Linux 7.6.

Two of these servers have dual Xeon E5-2680v4, the other two have dual Xeon E5-2695v4

Here is the output requested:

[root@dstore1 ~]# cat /proc/spl/kstat/zfs/vdev_raidz_bench
17 0 0x01 -1 0 53762471298 850737046886741
implementation gen_p gen_pq gen_pqr rec_p rec_q rec_r rec_pq rec_pr rec_qr rec_pqr
original 348506398 162062775 61442186 1193774790 221297813 19380631 81479501 10622347 13474653 12587040
scalar 1187678597 266462264 135555733 1253629508 408201033 200970522 178199213 103503361 76136099 56220996
sse2 1980482354 825974391 416699828 2221067277 689471895 532100756 369130071 209088501 200127045 95878136
ssse3 2094576425 698717100 401268400 2243657978 1179016942 845901365 613309447 441476197 427744380 325053227
avx2 3203730779 690211759 653176379 3597621625 2153857366 1445933833 948119534 799440637 756771858 578373851
fastest avx2 sse2 avx2 avx2 avx2 avx2 avx2 avx2 avx2 avx2

[root@dstore1 ~]# cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
[fastest] original scalar sse2 ssse3 avx2

As suggested, I have now switched to original:

[root@dstore1 ~]# echo "original" > /sys/module/zfs/parameters/zfs_vdev_raidz_impl

I have issued a zpool clear, and started a new scrub, I'll report back when this completes, or if further errors arise...

thanks again,

Jake

ryao commented 5 years ago

This is just a stab in the dark, but is your equipment properly grounded? The last time I encountered someone with this many problems, it turned out that none of his equipment was grounded and every machine that he had was malfunctioning. The Windows ones crashing had been assumed to be normal behavior, and the Linux ones using other Linux filesystems just reported that everything was okay, while ZFS threw a fit. I believe that fixing his grounding made the problem go away.

stephan2012 commented 5 years ago

As mentioned earlier, I was thinking that disabling the write cache on my disks resolved the errors, but that statement was probably premature. However, after running for around two months with an updated kernel (went from the Debian GNU/Linux 9 provided 4.9 to the backports kernel 4.19), I haven't seen any errors anymore on three identical servers.

# zpool status
  pool: bulkpool
 state: ONLINE
  scan: scrub repaired 0B in 8h40m with 0 errors on Sun Feb 10 09:04:20 2019
config:

        NAME        STATE     READ WRITE CKSUM
        bulkpool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0h57m with 0 errors on Sun Feb 10 01:21:28 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda1    ONLINE       0     0     0
            sdb1    ONLINE       0     0     0

errors: No known data errors

Maybe it helps…

gordan-bobic commented 5 years ago

It may be far more relevant to know the last component of the kernel version number than the first two, i.e. the x in 4.9.x and 4.19.x.

stephan2012 commented 5 years ago

Full version:

# uname -a
Linux n0041 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1 (2019-02-07) x86_64 GNU/Linux

4.19.0-0.bpo.1 was fine, too.

capnbb commented 5 years ago

Hi All, many thanks for helpful ideas :)

The scrub completed over the weekend; unfortunately, there are still errors (see below).

To answer questions put...

1) grounding seems normal; under 0.1 Ohm to earth, as with our other servers.
2) kernel version is 3.10.0-957.5.1.el7.x86_64
3) write cache is enabled on all drives

My next step will be to disable the disk write cache, and re-scrub. I'll report back.

One other thought: could this be caused by creating and deleting snapshots? We take a snapshot daily at 00:05, then push a differential between this and the previous night's snapshot to a remote server using mbuffer, after which we delete the previous night's snapshot. Anything between 50MB and 5TB gets pushed nightly.
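
For context, that nightly pipeline is roughly the following; host name, port, buffer sizes and snapshot names are placeholders, and mbuffer's -I/-O network mode is assumed:

# on the receiving server
mbuffer -s 128k -m 1G -I 9090 | zfs receive -F dmirror1/teraraid3

# on dstore1
zfs snapshot dstore1/teraraid3@today
zfs send -i dstore1/teraraid3@yesterday dstore1/teraraid3@today \
    | mbuffer -s 128k -m 1G -O backuphost:9090
zfs destroy dstore1/teraraid3@yesterday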

Output of "zpool events -v" is here: http://p.ip.fi/ACsh

[root@dstore1 ~]# zpool status -v
  pool: dstore1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 47h8m with 23 errors on Sun Mar 3 11:16:27 2019
config:

(Drive lines omitted - all have 0 for READ WRITE CKSUM) full text here: http://p.ip.fi/nK7_

NAME                        STATE     READ WRITE CKSUM
dstore1                     ONLINE       0     0    23
  raidz1-0                  ONLINE       0     0     4
  raidz1-1                  ONLINE       0     0     4
  raidz1-2                  ONLINE       0     0     6
  raidz1-3                  ONLINE       0     0     6
  raidz1-4                  ONLINE       0     0     4
  raidz1-5                  ONLINE       0     0     0
  raidz1-6                  ONLINE       0     0     6
  raidz1-7                  ONLINE       0     0     6
  raidz1-8                  ONLINE       0     0     6
  raidz1-9                  ONLINE       0     0     4
  raidz1-10                 ONLINE       0     0     0
cache
  nvme0n1                   ONLINE       0     0     0
spares
  wwn-0x5000cca254df67bb    AVAIL   
  wwn-0x5000cca254e49595    AVAIL   

errors: Permanent errors have been detected in the following files:

    <0x52b>:<0x31551b>
    <0x52b>:<0x31551d>
    <0x52b>:<0x2bfc9a>
    <0xc8b>:<0x24480b>
    <0xc8b>:<0x23c310>
    <0xc8b>:<0x23c41a>
    <0xc8b>:<0x2637a5>
    <0xc8b>:<0x231ca8>
    <0xc8b>:<0x23bbd1>
    <0xc8b>:<0x210fd5>
    <0xc8b>:<0x2447f0>

Thanks again, Jake

stephan2012 commented 5 years ago

One other thought: could this be caused by creating and deleting snapshots?

No. :-)

capnbb commented 5 years ago

Hi all,

The server finished attempting to scrub its pool with the disk write cache disabled, and with /sys/module/zfs/parameters/zfs_vdev_raidz_impl set to [original].

Unfortunately, there are still: 23 CHKSUM errors in the pool, zero CHKSUM errors in the drives, and 9 data errors in files / inodes.

This is exactly the same number of CHKSUM errors as seen in the last scrub after doing a "zpool clear". The list of files / inodes with errors changes slightly on each scrub; sometimes it's a mixed list of real file names and inodes, and right now it's just inodes.

Are the inodes the same ones listed in each scrub?

The last scrub shows this inode as corrupt:

    <0x4a7>:<0x2bfc9a>

The previous scrub showed this inode as corrupt:

    <0x52b>:<0x2bfc9a>

and a prior scrub showed this one as corrupt:

    <0x128>:<0x2bfc9a>

If these are all the same inode, we have 9 corrupt inodes that zpool scrub consistently fails to repair.

Anyhow, after the last scrub finished, zpool status (omitting drive lines) is:

  pool: dstore1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 51h50m with 21 errors on Wed Mar 6 16:01:48 2019
config:

    NAME                        STATE     READ WRITE CKSUM
    dstore1                     ONLINE       0     0    21
      raidz1-0                  ONLINE       0     0     4
      raidz1-1                  ONLINE       0     0     4
      raidz1-2                  ONLINE       0     0     6
      raidz1-3                  ONLINE       0     0     6
      raidz1-4                  ONLINE       0     0     4
      raidz1-5                  ONLINE       0     0     0
      raidz1-6                  ONLINE       0     0     6
      raidz1-7                  ONLINE       0     0     6
      raidz1-8                  ONLINE       0     0     6
      raidz1-9                  ONLINE       0     0     0
      raidz1-10                 ONLINE       0     0     0
    cache
      nvme0n1                   ONLINE       0     0     0
    spares
      wwn-0x5000cca254df67bb    AVAIL
      wwn-0x5000cca254e49595    AVAIL

errors: Permanent errors have been detected in the following files:

    <0x4>:<0x23c310>
    <0x4>:<0x23c41a>
    <0x4>:<0x2637a5>
    <0x4>:<0x231ca8>
    <0x4>:<0x23bbd1>
    <0x4>:<0x210fd5>
    <0x4a7>:<0x31551b>
    <0x4a7>:<0x31551d>
    <0x4a7>:<0x2bfc9a>

Any further ideas gratefully received,

Jake

capnbb commented 5 years ago

Hi ptx0, No problem, I'll open a new ticket. Thanks for your help, Jake