openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

broken dataset #7996

Closed: edef1c closed this issue 3 years ago

edef1c commented 5 years ago

System information

Type Version/Name
Distribution Name NixOS
Distribution Version 19.03.git.0a7e258012b (Koi)
Linux Kernel 4.14.74
Architecture x86_64
ZFS Version 0.8.0-rc1
SPL Version 0.8.0-rc1

Describe the problem you're observing

I've got a zvol that breaks zfs list and zfs send, can't be rolled back, and doesn't show up in /dev/zvol. It's unfortunately the last backup of a zvol on a dying box, so I haven't tried destroying it yet.
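
For reference, the other symptoms show up with commands along these lines (a sketch; exact error output omitted rather than reproduced from memory):

ls /dev/zvol/rpool | grep panther                             # no device node appears for the broken zvol
zfs send rpool/panther@snap-2018_05_24-09:30:07 > /dev/null   # send aborts with an I/O error
zfs rollback rpool/panther@snap-2018_05_24-09:30:07           # rollback is refused as well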

Describe how to reproduce the problem

I'm not quite sure, but here's how it manifests:

[root@spock:~]# zdb -d rpool/panther
Dataset rpool/panther [ZVOL], ID 115, cr_txg 2869576, 9.79M, 2 objects
[root@spock:~]# zfs list rpool/panther
cannot open 'rpool/panther': I/O error

Include any warning/errors/backtraces from the system logs

I'm not sure how to pick the relevant lines out of the noise, but these seem directly correlated:

zap_leaf.c:49:zap_entry_read(): error 75
zap_leaf.c:513:zap_entry_read_name(): error 75
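
(In case it helps anyone reproducing this: the two lines above came from the ZFS debug log rather than dmesg. Something like the following should pull them out, assuming the usual ZFS-on-Linux procfs path and that error logging is enabled via the zfs_flags module parameter:)

grep zap_leaf /proc/spl/kstat/zfs/dbgmsg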
GregorKopka commented 5 years ago

@edef1c have you tried to import the pool (readonly) with an older version?
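
Roughly, and since rpool looks like the root pool this would have to be done from rescue media carrying the older release (a sketch, not verified on your box):

zpool import -f -o readonly=on -N rpool   # read-only import; -N skips mounting datasets
zfs list rpool/panther                    # check whether the older code can open it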

edef1c commented 5 years ago

It seems the last snapshot isn't quite what it should be: its zvol prop object is empty (no size entry), while the earlier 09:00:01 snapshot is intact, so my data is still there!

[root@spock:~]# zdb -ddddd rpool/panther@snap-2018_05_24-09:30:07 2
Dataset rpool/panther@snap-2018_05_24-09:30:07 [ZVOL], ID 266883, cr_txg 19001825, 9.79M, 2 objects, rootbp DVA[0]=<0:27479b56c00:200> DVA[1]=<0:a80dba1200:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=800L/200P birth=19001825L/19001825P fill=2 cksum=7b401de16:33e22a8e538:b19b3d9d733f:19c09fb3cf8ba1
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         2    1   128K    512      0     512    512    0.00  zvol prop
        dnode flags:
        dnode maxblkid: 0
Indirect blocks:
[root@spock:~]# zdb -ddddd rpool/panther@snap-2018_05_24-09:00:01 2
Dataset rpool/panther@snap-2018_05_24-09:00:01 [ZVOL], ID 300083, cr_txg 19001505, 80.8G, 2 objects, rootbp DVA[0]=<0:1afc9fa4400:200> DVA[1]=<0:2750990ec00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=800L/200P birth=19001505L/19001505P fill=2 cksum=967fe9fcc:3f887279157:d9f85a1d98ef:1fa7a661156bf4
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         2    1    16K    512      0     512    512  100.00  zvol prop
        dnode flags: USED_BYTES
        dnode maxblkid: 0
        microzap: 512 bytes, 1 entries
                size = 85899345920
Indirect blocks:
               0 L0 EMBEDDED et=0 200L/2bP B=2872155
                segment [0000000000000000, 0000000000000200) size   512
[root@spock:~]# zfs clone rpool/panther@snap-2018_05_24-09:00:01 rpool/panther-clone
[root@spock:~]# file -Ls /dev/zvol/rpool/panther-clone
/dev/zvol/rpool/panther-clone: DOS/MBR boot sector; partition 1 : ID=0x83, active, start-CHS (0x44,1,4), end-CHS (0x155,1,5), startsector 2048, 167770112 sectors
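
Next step is to get a copy of that data off the pool before experimenting further; roughly something like this (the destination path, host, and dataset names here are placeholders):

dd if=/dev/zvol/rpool/panther-clone of=/mnt/external/panther.img bs=1M status=progress
# or, if send works from the intact snapshot, replicate it to another box:
zfs send rpool/panther@snap-2018_05_24-09:00:01 | ssh backuphost zfs receive -u tank/panther-rescue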
GregorKopka commented 5 years ago

Is the 'yay, data still there' with an older version? And if so, which?

edef1c commented 5 years ago

@GregorKopka nope, same version. I think this was caused by an aborted zfs recv.

GregorKopka commented 5 years ago

Please share details about the pool (zpool status -v) and the system (ECC RAM or not, what the drives think about their own state, etc.).

If this is the result of an aborted recv it would be quite critical; if it instead stems from a hardware defect (drives of a non-redundant pool simply dying) it would just be a case of bad luck...

edef1c commented 5 years ago

The drives seem fine, they perform stupendously well. I'll pull SMART data when I get a chance.

[root@spock:~]# zpool status -v
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 09:01:29 with 0 errors on Wed Oct 10 17:51:48 2018
config:
        NAME                           STATE     READ WRITE CKSUM
        rpool                          ONLINE       0     0     0
          mirror-0                     ONLINE       0     0     0
            ata-ST33000651AS_Z2912FC6  ONLINE       0     0     0
            ata-ST33000651AS_Z290XNHM  ONLINE       0     0     0
errors: No known data errors
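
(For the SMART data, something like this against the two mirror members should do, assuming smartmontools is available:)

smartctl -H -A /dev/disk/by-id/ata-ST33000651AS_Z2912FC6
smartctl -H -A /dev/disk/by-id/ata-ST33000651AS_Z290XNHM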
edef1c commented 5 years ago

Seems I can promote it at least!

[root@spock:~]# zfs promote rpool/panther-clone
[root@spock:~]# zfs list -t snapshot -r rpool/panther-clone
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
rpool/panther-clone@transfer-initial          17.6G      -  51.1G  -
rpool/panther-clone@snap-2016_07_18-05:45:09  8.81G      -  80.6G  -
rpool/panther-clone@snap-2017_09_26-16:19:30  6.23M      -  80.8G  -
rpool/panther-clone@snap-2017_09_26-16:34:43   431K      -  80.8G  -
rpool/panther-clone@snap-2017_09_26-16:35:03   196K      -  80.8G  -
rpool/panther-clone@snap-2017_09_26-16:35:26   248K      -  80.8G  -
rpool/panther-clone@snap-2017_09_26-16:35:46   502K      -  80.8G  -
rpool/panther-clone@snap-2017_09_26-16:37:08  1.00M      -  80.8G  -
rpool/panther-clone@snap-2017_11_15-12:29:59  56.2M      -  80.8G  -
rpool/panther-clone@snap-2017_11_15-16:13:16   939K      -  80.8G  -
rpool/panther-clone@snap-2017_11_15-16:13:57   712K      -  80.8G  -
rpool/panther-clone@snap-2017_11_15-16:17:23   294K      -  80.8G  -
rpool/panther-clone@snap-2017_11_15-16:17:40   448K      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-19:45:03  3.57M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-19:55:06  1.33M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:00:04  1.32M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:10:04  2.00M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:20:03  1.62M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:25:05  1.46M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:30:03  1.57M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:35:03  1.80M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:40:03  1.42M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:45:04  1.56M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:50:04  1.39M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-20:55:03  1.31M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:00:06  1.42M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:10:06  1.47M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:15:13  1.34M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:20:09  1.10M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:25:10  1.22M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:30:13  1.38M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:35:07  1.66M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:40:09  1.92M      -  80.8G  -
rpool/panther-clone@snap-2017_12_06-21:45:03  12.6M      -  80.8G  -
rpool/panther-clone@snap-2018_05_24-09:00:01     0B      -  80.8G  -

[root@spock:~]# zfs destroy -n rpool/panther
cannot open 'rpool/panther': I/O error
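
For the record, the intended cleanup once the promoted clone is verified, entirely untested here since even the dry run above errors out:

zfs destroy rpool/panther        # retry the real destroy once the data is confirmed safe
zdb -d rpool/panther             # afterwards this should no longer find the dataset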
stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.