openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

critical bug: zpool messes up partition table: zfs_member occupies the whole disk instead of being constrained to a partition #9105

Open olivier-klein opened 5 years ago

olivier-klein commented 5 years ago

System information

Type Version/Name
Distribution Name MANJARO
Distribution Version 18.0.4
Linux Kernel 4.19.60-1
Architecture amd64
ZFS Version 0.8.1-1
SPL Version 0.8.1-1

Describe the problem you're observing

Critical bug: zpool has modified the on-disk signatures, and the whole disk /dev/nvme0n1 now shows up as a zfs_member instead of being constrained to one partition: /dev/nvme0n1p5

lsblk -f

NAME          FSTYPE     LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINT
nvme0n1       zfs_member tank    9593340550191022900
├─nvme0n1p1   vfat       ESP     F8E8-2918                             738,1M     5% /boot/efi
├─nvme0n1p2   vfat       OS      5224-C2FA
├─nvme0n1p3   ext4       UBUNTU  bed2f845-754b-477b-8bdb-3cba7d56fae3
├─nvme0n1p4   ext4       MANJARO 3134ceb0-795e-4f51-a6fb-ba172fac0312   75,5G    16% /
└─nvme0n1p5   zfs_member tank    9593340550191022900

lsblk -a

nvme0n1       259:0 0 953,9G 0 disk
├─nvme0n1p1   259:1 0   780M 0 part /boot/efi
├─nvme0n1p2   259:2 0     5G 0 part
├─nvme0n1p3   259:3 0  97,7G 0 part
├─nvme0n1p4   259:4 0  97,7G 0 part /
└─nvme0n1p5   259:5 0 752,8G 0 part

blkid /dev/nvme0n1

/dev/nvme0n1: LABEL="tank" UUID="9593340550191022900" UUID_SUB="541976190045946664" TYPE="zfs_member" PTUUID="e7762bd0-453e-4900-b428-26f1b11c22b5" PTTYPE="gpt"

Describe how to reproduce the problem

Followed the instructions at https://wiki.archlinux.org/index.php/ZFS. The zpool was created using the device id from ls -lh /dev/disk/by-id/:

sudo zpool create -f -o ashift=13 -m /mnt/tank tank nvmePC401_NVMe_SK_hynix_1TB_MI93T003810403E62-part5

NOTE that the pool was created with default settings and occupies the whole partition (i.e. without redundancy or RAID).
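As a quick sanity check (a sketch; pool and device names as above), the vdev path backing the pool can be confirmed with:

# Show the full path of the vdev backing the pool
zpool status -P tank

# And the by-id symlink that was used at creation time
ls -lh /dev/disk/by-id/ | grep part5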

Interestingly, gparted (and thus the disk signatures) showed the partition table correctly after installation. Everything got messed up after enabling zfs.target, zfs-mount, zfs-import.target and zfs-import-cache and rebooting.

Include any warning/errors/backtraces from the system logs

This is a critical issue; the boot log is now messed up:

juil. 31 07:07:37 XPS13 systemd[1]: systemd-firstboot.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start First Boot Wizard.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-sysusers.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Create System Users.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-fsck-root.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start File System Check on Root Device.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-binfmt.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Set Up Additional Binary Formats.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Failed with result 'start-limit-hit'.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start systemd-guest-user.service.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-hwdb-update.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Rebuild Hardware Database.
juil. 31 07:07:37 XPS13 systemd[1]: sys-fs-fuse-connections.mount: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to mount FUSE Control File System.
juil. 31 07:07:37 XPS13 systemd-udevd[300]: Process '/usr/bin/alsactl restore 0' failed with exit code 99.

olivier-klein commented 5 years ago

There is a serious bug affecting zfs 0.8.1-1 (tested on the latest Manjaro running Linux kernel 4.19). This bug has been reported in different forums in different contexts.

https://gitlab.gnome.org/GNOME/gparted/issues/14

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114

https://bbs.archlinux.org/viewtopic.php?id=206121

https://bbs.archlinux.org/viewtopic.php?id=202587

Any help on how to wipe the signature block information on /dev/nvme0n1 would be welcome. I have not yet tried zpool labelclear because, as far as I understand, it would wipe out the entire disk.
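Before erasing anything, it may help to see exactly which signatures libblkid is picking up and at which byte offsets. A read-only inspection sketch (device names as in this report):

# List detected signatures and their offsets on the whole disk (read-only, nothing is erased)
sudo wipefs /dev/nvme0n1

# Low-level probe, bypassing the blkid cache, of the whole disk and of the partition
sudo blkid -p /dev/nvme0n1
sudo blkid -p /dev/nvme0n1p5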

GregorKopka commented 5 years ago

The problem is that blkid, when looking at the whole disk, sees the ZFS uberblocks at the end of the nvme0n1p5 partition (which is also the end of the disk) and then concludes that the whole disk must be a zfs member. It is wrong about that.

It's also a problem for zpool import, which can fall into the same trap: it sees the uberblocks at the end when looking at /dev/nvme0n1 and then fails to import, because they point to garbage once sectors are counted from the beginning of the drive instead of the beginning of the partition.
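One way to see this for yourself (a sketch, using the device names from this report): ZFS keeps four copies of its label per vdev, two at the start and two in the last 512 KiB, and since the partition ends at (almost) the end of the disk, those trailing labels sit at the end of the disk as well:

# Dump the four vdev labels of the partition; their pool_guid should match the
# UUID that blkid reports for the *whole* disk, i.e. it is the same on-disk data
sudo zdb -l /dev/nvme0n1p5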

The solution is a small, empty partition (some 10 MiB) at the end of the drive (`zpool create`, when given a whole drive, does this by creating a small 'partition 9' at the end) so that blkid and zpool import won't see the uberblocks at the end of the actual ZFS partition when looking at the whole disk (they will instead see the empty space of partition 9).

Do not run zpool labelclear on the whole drive: it will not solve the problem (the uberblocks are rewritten, round-robin, on every txg), and it has a fair chance of destroying your pool and even the whole partition table (including the backup at the end of the drive).

The best option is to back up the contents of the pool, destroy it, reduce the size of the last partition by some 10-20 MiB, create a partition at the end that protects this free space (and dd if=/dev/zero that one, to get rid of the uberblocks in that area), then recreate the pool and restore the backup.
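A rough sketch of that procedure, assuming the pool contents are already backed up elsewhere and that parted is used for the resize (the NEW_END value, the "protect" name and partition number 6 are illustrative):

sudo zpool destroy tank            # only after the backup has been verified!
sudo parted /dev/nvme0n1
(parted) unit MiB print            # note the current end of partition 5
(parted) resizepart 5 NEW_END      # NEW_END = current end minus ~20 MiB
(parted) mkpart protect NEW_END 100%
(parted) quit

# Zero the new protection partition so no stale uberblocks remain at the end of the disk
sudo dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M

# Recreate the pool on the (now slightly smaller) partition and restore the backup
sudo zpool create -o ashift=13 -m /mnt/tank tank /dev/disk/by-id/<disk-id>-part5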

olivier-klein commented 5 years ago

Do you think that dd if=/dev/zero of=/dev/nvme0n1p6 bs=512 count=50 will be enough to wipe out the uberblocks of the last empty partition?

GregorKopka commented 5 years ago

In case nvme0n1p6 is the new (protection) partition you just created, the dd doesn't need the count option; it will stop when it reaches the end of the partition (i.e. when it has filled it completely with zeros, getting rid of whatever might have been there before). Just make sure to specify the right of= ;)
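For example (a sketch; assumes nvme0n1p6 is the protection partition at the end of the disk):

# Fill the protection partition completely with zeros; dd stops on its own at the
# end of the partition ("No space left on device" at the end is expected)
sudo dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M status=progress
sync

# Afterwards the whole disk should no longer be reported as a zfs_member
# (assuming the last data partition was shrunk as described above)
sudo blkid -p /dev/nvme0n1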

dankamongmen commented 4 years ago

I also ran into this problem in my growlight project: https://github.com/dankamongmen/growlight/issues/4

I filed a bug against upstream, but have heard nothing (filed 2019-08): https://www.spinics.net/lists/util-linux-ng/msg15811.html

I detail how I worked around it here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114 (last comment) and in the growlight issue linked above

stale[bot] commented 3 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

GregorKopka commented 3 years ago

Has this been fixed?

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

GregorKopka commented 2 years ago

Stale bot should not close defects.

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

GregorKopka commented 1 year ago

@behlendorf defect or not?

ZLima12 commented 9 months ago

Bumping this because it seems like a terrible data corruption bug that needs to be fixed.

mfleetwo commented 9 months ago

Upstream root cause and fix:

ZLima12 commented 9 months ago

I see, so zfs never actually touched the partition table at all. Either way, glad it's fixed, and this issue should be closed.