olivier-klein opened this issue 5 years ago
There is a serious bug affecting zfs 0.8-1.1 (tested on the latest Manjaro running Linux kernel 4.19). This bug has been reported in different forums in different contexts:
https://gitlab.gnome.org/GNOME/gparted/issues/14
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114
https://bbs.archlinux.org/viewtopic.php?id=206121
https://bbs.archlinux.org/viewtopic.php?id=202587
Any help on how to wipe the stale signature/label information from /dev/nvme0n1 would be welcome. I have not tried `zpool labelclear` yet because, as far as I understand, it would wipe out the entire disk.
The problem is that blkid, when looking at the whole disk, sees the ZFS uberblocks at the end of the nvme0n1p5 partition (which is also the end of the disk) and then concludes that the whole disk must be a zfs member. That conclusion is wrong.
It's also a problem for `zpool import`, which can fall into the same trap: it sees the uberblocks at the end when looking at /dev/nvme0n1, then fails to import because they point to garbage when one counts sectors from the beginning of the drive instead of the beginning of the partition.
The solution to this is a small, empty partition (some 10 MiB) at the end of the drive (`zpool create`, when given whole drives, does this by creating a small 'partition 9' at the end), so that blkid and `zpool import` won't see the uberblocks at the end of the actual ZFS partition when looking at the whole disk (they'll instead see the empty space of partition 9).
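If you want to double-check this on your own system before touching anything, a couple of read-only commands along these lines should show where the stray signature actually lives. Treat this as a sketch, not gospel; the device names are taken from this report, adjust to your setup:

```sh
# With no erase option, wipefs only lists the signatures libblkid finds,
# together with their offsets - it does not modify anything.
wipefs /dev/nvme0n1

# zdb -l prints the ZFS labels it can find on a device. On the partition
# you should see valid labels; on the whole disk, most likely only the
# copies sitting at the end of the drive (which belong to the partition)
# will show up.
zdb -l /dev/nvme0n1p5
zdb -l /dev/nvme0n1
```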
Do not run `zpool labelclear` on the whole drive: it will not solve the problem (the uberblocks will be rewritten, round-robin, on every txg) but has a fair chance of destroying your pool and even the whole partition table (including the backup copy at the end of the drive).
The best option is to back up the contents of the pool, destroy it, shrink the last partition by some 10-20 MiB, create a partition at the end that protects this freed space (and `dd if=/dev/zero` over it, to get rid of the uberblocks in that area), then recreate the pool and restore the backup.
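To make that concrete, here is a rough, untested sketch of the procedure, assuming the layout from this report (pool `tank` on nvme0n1p5, the new protection partition showing up as nvme0n1p6) and a GPT label. All sizes, end positions, the backup path and the disk id are placeholders you have to adapt:

```sh
# 1. Snapshot and back up the whole pool somewhere outside this disk.
zfs snapshot -r tank@migrate
zfs send -R tank@migrate > /path/to/backup/tank-migrate.zfs

# 2. Destroy the pool so nothing holds the partition open.
zpool destroy tank

# 3. Shrink the last partition by ~20 MiB and create a small protection
#    partition in the freed space (exact end positions depend on your table).
parted /dev/nvme0n1 -- resizepart 5 <new_end>
parted /dev/nvme0n1 -- mkpart protect <new_end> 100%

# 4. Zero the protection partition so no old uberblocks survive there
#    (dd will stop with "No space left on device" once it is full - expected).
dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M

# 5. Recreate the pool on the now slightly smaller partition and restore.
zpool create -o ashift=13 -m /mnt/tank tank /dev/disk/by-id/<disk-id>-part5
zfs receive -F tank < /path/to/backup/tank-migrate.zfs
```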
Do you think that `dd if=/dev/zero of=/dev/nvme0n1p6 bs=512 count=50` will be enough to wipe out the uberblocks of the last empty partition?
In case nvme0n1p6 is the new (protection) partition you just created, the dd doesn't need the `count` option; it'll stop when it reaches the end of the partition (i.e. when it has filled it completely with zeros, getting rid of whatever might have been there before). Just make sure to specify the right `of=`.
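So something along these lines should be enough, assuming nvme0n1p6 really is that new protection partition (a larger block size just makes it finish faster):

```sh
# Fills nvme0n1p6 completely with zeros and stops at the end of the partition;
# it will exit with a "No space left on device" message, which is expected.
dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M status=progress
```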
;)
I also ran into this problem in my growlight project: https://github.com/dankamongmen/growlight/issues/4
I filed a bug against upstream, but have heard nothing (filed 2019-08): https://www.spinics.net/lists/util-linux-ng/msg15811.html
I detail how I worked around it here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114 (last comment) and in the growlight issue linked above
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Has this been fixed?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Stale bot should not close defects.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
@behlendorf defect or not?
Bumping this because it seems like a terrible data corruption bug that needs to be fixed.
Upstream root cause and fix:
I see, so zfs never actually touched the partition table at all. Either way, glad it's fixed, and this issue should be closed.
System information
Describe the problem you're observing
Critical bug: the zpool has modified the disk signature information and now shows up as occupying the whole disk /dev/nvme0n1 instead of being constrained to one partition, /dev/nvme0n1p5.
lsblk -f
```
NAME        FSTYPE     LABEL   UUID                                 FSAVAIL FSUSE% MOUNTPOINT
nvme0n1     zfs_member tank    959334055019102200
├─nvme0n1p1 vfat       ESP     F8E8-2918                             738,1M     5% /boot/efi
├─nvme0n1p2 vfat       OS      5224-C2FA
├─nvme0n1p3 ext4       UBUNTU  bed2f845-754b-477b-8bdb-3cba7d56fae3
├─nvme0n1p4 ext4       MANJARO 3134ceb0-795e-4f51-a6fb-ba172fac0312   75,5G    16% /
└─nvme0n1p5 zfs_member tank    9593340550191022900
```
lsblk -a
```
nvme0n1     259:0   0 953,9G  0 disk
├─nvme0n1p1 259:1   0   780M  0 part /boot/efi
├─nvme0n1p2 259:2   0     5G  0 part
├─nvme0n1p3 259:3   0  97,7G  0 part
├─nvme0n1p4 259:4   0  97,7G  0 part /
└─nvme0n1p5 259:5   0 752,8G  0 part
```
blkid /dev/nvme0n1
```
/dev/nvme0n1: LABEL="tank" UUID="9593340550191022900" UUID_SUB="541976190045946664" TYPE="zfs_member" PTUUID="e7762bd0-453e-4900-b428-26f1b11c22b5" PTTYPE="gpt"
```
Describe how to reproduce the problem
Followed the instructions on https://wiki.archlinux.org/index.php/ZFS. The zpool was created using the device id from `ls -lh /dev/disk/by-id/`:
sudo zpool create -f -o ashift=13 -m /mnt/tank tank nvmePC401_NVMe_SK_hynix_1TB_MI93T003810403E62-part5
NOTE that the zpool was created with default settings on that single partition (i.e. without redundancy, a plain single-device stripe).
Interestingly, gparted (and thus the on-disk signatures) showed the partition table correctly right after installation. Everything got messed up after enabling zfs.target, zfs-mount, zfs-import.target and zfs-import-cache, and rebooting.
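For reference, the enable step probably looked roughly like this (standard OpenZFS unit names; shown only to make the reproduction step explicit, not as a recommendation):

```sh
sudo systemctl enable zfs-import-cache.service zfs-import.target zfs-mount.service zfs.target
sudo reboot
```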
Include any warning/errors/backtraces from the system logs
This is a critical issue. The boot log is now messed up:
```
juil. 31 07:07:37 XPS13 systemd[1]: systemd-firstboot.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start First Boot Wizard.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-sysusers.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Create System Users.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-fsck-root.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start File System Check on Root Device.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-binfmt.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Set Up Additional Binary Formats.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Failed with result 'start-limit-hit'.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start systemd-guest-user.service.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-hwdb-update.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Rebuild Hardware Database.
juil. 31 07:07:37 XPS13 systemd[1]: sys-fs-fuse-connections.mount: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to mount FUSE Control File System.
juil. 31 07:07:37 XPS13 systemd-udevd[300]: Process '/usr/bin/alsactl restore 0' failed with exit code 99.
```