zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

Ubuntu Trusty: zfs-linux 0.6.3-3~trusty zpool-import.conf not properly importing zpool with devices mapped by-vdev #130

Closed gbkersey closed 8 years ago

gbkersey commented 9 years ago

So after upgrading to 0.6.3-3~trusty my zpool status looks like this....

  pool: export10
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 3h51m with 0 errors on Sun Nov 16 05:51:44 2014
config:

    NAME        STATE     READ WRITE CKSUM
    export10    ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0
        sdh     ONLINE       0     0     0
    logs
      mirror-1  ONLINE       0     0     0
        sde1    ONLINE       0     0     0
        sdi1    ONLINE       0     0     0
    cache
      sde       UNAVAIL      0     0     0
      sdi       UNAVAIL      0     0     0
    spares
      sdd       AVAIL   

errors: No known data errors

It should look like this (using /dev/disk/by-vdev):

  pool: export10
 state: ONLINE
  scan: scrub repaired 0 in 3h51m with 0 errors on Sun Nov 16 05:51:44 2014
config:

    NAME           STATE     READ WRITE CKSUM
    export10       ONLINE       0     0     0
      raidz1-0     ONLINE       0     0     0
        d1         ONLINE       0     0     0
        d2         ONLINE       0     0     0
        d3         ONLINE       0     0     0
        d4         ONLINE       0     0     0
        d5         ONLINE       0     0     0
    logs
      mirror-1     ONLINE       0     0     0
        lc1-part1  ONLINE       0     0     0
        lc2-part1  ONLINE       0     0     0
    cache
      lc1-part2    ONLINE       0     0     0
      lc2-part2    ONLINE       0     0     0
    spares
      d6           AVAIL   

errors: No known data errors

I can't seem to force the script in zpool-import to work properly even after /dev/disk/by-vdev is there. To fix the problem on my system, I have disabled zpool-import.conf and am using the /etc/init.d/zfs-mount script that is still shipped in the zfsutils package. However, I did have to modify that script to make sure that the by-vdev aliases were available:

--- zfs-mount.orig  2014-11-21 12:21:56.598893100 -0600
+++ zfs-mount   2014-11-21 10:50:04.305175986 -0600
@@ -24,6 +24,8 @@
 {
    log_begin_msg "Mounting ZFS filesystems"
    log_progress_msg "filesystems"
+   udevadm trigger
+   sleep 5
    zfs mount -a
    RET=$?
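
(As an aside, `udevadm settle` blocks until the udev event queue is empty, so a variant of the patch above could avoid the fixed sleep. This is a sketch only, not the packaged script:)

# Before `zfs mount -a` in the modified script: re-run the block-device
# rules and wait for the event queue instead of sleeping a fixed 5 seconds.
udevadm trigger --subsystem-match=block
udevadm settle --timeout=30
zfs mount -a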

dajhorn commented 9 years ago

This might be resolved by exporting all pools, deleting the /etc/zfs/zpool.cache file, and reimporting.

If not, then please submit the (new, binary) /etc/zfs/zpool.cache file, and the /var/log/udev and /var/log/syslog files as a gist or as an email attachment.

The new /etc/init/zpool-import.conf upstart task does not check for cache devices, and it looks like /dev/sde and /dev/sdi didn't show up in time for import.
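
(The suggested sequence as shell commands — a sketch; the pool name is taken from this report, and `-d` is added so the reimported pool keeps its by-vdev names:)

zpool export export10                        # repeat for any other imported pools
rm -f /etc/zfs/zpool.cache                   # remove the cache file if the export left it behind
zpool import -d /dev/disk/by-vdev export10   # scan the by-vdev links and rebuild the cache file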

gbkersey commented 9 years ago

Actually, /dev/sde and /dev/sdi are wrong... they should be /dev/sde2 and /dev/sdi2. But cache drives don't show up in zpool.cache, so I guess there is no way of knowing....

I'll get you some logs in a few.

Thanks!

gbkersey commented 9 years ago

I exported all pools and checked that /etc/zfs/zpool.cache was gone. Then I reimported.

I set udev_log="debug" and rebooted....
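
(That setting lives in /etc/udev/udev.conf — shown here as a minimal sketch:)

# /etc/udev/udev.conf
udev_log="debug"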

The zpool came up in the bad state as before. The requested files are attached.

From: "Darik Horn" notifications@github.com To: "zfsonlinux/pkg-zfs" pkg-zfs@noreply.github.com Cc: "Bo Kersey" bo@vircio.com Sent: Friday, November 21, 2014 6:03:03 PM Subject: Re: [pkg-zfs] Ubuntu Trusty: zfs-linux 0.6.3-3~trusty zpool-import.conf not properly importing zpool with devices mapped by-vdev (#130)

This might resolve by exporting all pools, deleting the /etc/zfs/zpool.cache file, and reimporting.

If not, then please submit the (new, binary) /etc/zfs/zpool.cache file, and the /var/log/udev and /var/log/syslog files as a gist or as an email attachment.

The new /etc/init/zpool-import.conf upstart task does not check for cache devices, and it looks like /dev/sde and /dev/sdi didn't show up in time for import.

— Reply to this email directly or view it on GitHub .

dajhorn commented 9 years ago

@gbkersey, Github stripped the attachments. Please resubmit them to http://gist.github.com/ or email them to me directly.

dajhorn commented 9 years ago

@gbkersey, I got the files; thanks. It looks like the zpool-import job needs to trigger the vdev rule per your suggestion.

I will try to recreate your configuration on my test bench, so please send me the vdev configuration file and the /var/log/upstart/zpool-import.log file if it exists.

gbkersey commented 9 years ago

There is no zpool-import.log...

Here is my vdev configuration file:

# by-vdev
# name      device-link

# data disks
alias d1    wwn-0x5000cca23dcddb63
alias d2    wwn-0x5000cca23dcd8412
alias d3    wwn-0x5000cca23dcddb5a
alias d4    wwn-0x5000cca23dccf62a
alias d5    wwn-0x5000cca23dcd8664
alias d6    wwn-0x5000cca23df29669

# intent log / cache disks
alias lc1   wwn-0x5001b44a2dd1de54
alias lc2   wwn-0x5001b44a2dd1e266
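
(To check how those aliases resolve, the vdev_id helper can be run by hand — a sketch; the device name and the exact output lines are assumptions:)

/lib/udev/vdev_id -d sdb
# expected to print something like:
#   ID_VDEV=d1
#   ID_VDEV_PATH=disk/by-vdev/d1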

From: "Darik Horn" notifications@github.com To: "zfsonlinux/pkg-zfs" pkg-zfs@noreply.github.com Cc: "Bo Kersey" bo@vircio.com Sent: Saturday, November 22, 2014 9:32:05 AM Subject: Re: [pkg-zfs] Ubuntu Trusty: zfs-linux 0.6.3-3~trusty zpool-import.conf not properly importing zpool with devices mapped by-vdev (#130)

@gbkersey , I got the files; thanks. It looks like the zpool-import job needs to trigger the vdev rule per your suggestion.

I will try to recreate your configuration on my test bench, so please send me the vdev configuration file and the /var/log/upstart/zpool-import.log file if it exists.

— Reply to this email directly or view it on GitHub .

Bo Kersey VirCIO - managed network solutions 4314 Avenue C Austin, TX 78751 phone: (512)374-0500

dajhorn commented 9 years ago

That /etc/zfs/vdev_id.conf looks normal, and I have something similar on my computer, but the /var/log/upstart/zpool-import.log file should be non-empty if zpool import returned an error message.

In this case, the /dev/disk/by-vdev links should be created during pre-init before the regular system starts. Needing to run udevadm trigger that late indicates a secondary problem.

The next troubleshooting steps are:

  1. Run update-initramfs -c -k all to ensure that the vdev helpers are actually in the initrd.
  2. Reboot into a recovery prompt:
    1. Hold the left Shift key during POST to get the GRUB menu.
    2. Choose "Advanced", which is usually the second line.
    3. Choose "Recovery Mode" for the latest kernel version.
    4. Choose "drop to root shell prompt".
  3. Check that /dev/disk/by-vdev is completely populated in the rescue environment.
  4. Run zpool import -N export10 at the rescue prompt.

This should quietly succeed without warnings, even with a read-only root filesystem at a recovery prompt. I'm hoping that it will complain about the partitioning on the cache devices or mismatched device names.

If it doesn't, then the next step will be breaking the zpool-import.conf job into a shell and poking around in the same way, which will be somewhat more difficult.
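
(Steps 1, 3, and 4 as commands, with the pool name from this report — a sketch; step 2 is the interactive reboot into recovery mode:)

update-initramfs -c -k all    # step 1: rebuild the initrd so the vdev helpers are included
# ...reboot into the recovery-mode root shell...
ls -l /dev/disk/by-vdev/      # step 3: should list d1..d6, lc1/lc2 and their partitions
zpool import -N export10      # step 4: import the pool without mounting any datasets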

gbkersey commented 9 years ago

Rebooted to the recovery prompt. I don't have /dev/disk/by-vdev; I only have by-id, by-label, by-partlabel, by-partuuid, by-path, and by-uuid.

Running udevadm trigger did not make by-vdev show up. I loaded the zfs kernel module and tried udevadm trigger again... by-vdev is still not there.

dajhorn commented 9 years ago

@gbkersey, I now have a reproducer for this bug and will work on fixing it. Disabling the zpool-import upstart task is the best solution for this system configuration until the package is updated.

// Thanks for reporting the problem and creating an issue ticket for it.
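
(For reference, an upstart job on Trusty can be disabled without editing the packaged file by marking it manual in an override — a short sketch:)

# Stop the zpool-import job from starting automatically at boot
echo "manual" > /etc/init/zpool-import.override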

dajhorn commented 9 years ago

But we should verify that we have the same bug. Try this:

  1. Verify that BUSYBOX=Y in the /etc/initramfs-tools/initramfs.conf file. (This is the system default.)
  2. Run update-initramfs -c -k all.
  3. Run update-grub.
  4. Reboot and append break=bottom to the kernel command line. (Screenshot attached.)
  5. Verify that these files exist at the (initramfs) prompt:
    • /etc/zfs/vdev_id.conf
    • /etc/zfs/zpool.cache
    • /lib/udev/rules.d/69-vdev.rules
    • /lib/udev/vdev_id
    • /dev/disk/by-vdev/*

If they don't exist, then we have a different glitch and will need to further bisect the breakage on your system.
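
(The same checks as commands at the (initramfs) prompt — a sketch:)

ls -l /etc/zfs/vdev_id.conf /etc/zfs/zpool.cache
ls -l /lib/udev/rules.d/69-vdev.rules /lib/udev/vdev_id
ls -l /dev/disk/by-vdev/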

[Screenshot: break=bottom appended to the GRUB kernel command line]

[Screenshot: vdev files at the (initramfs) prompt]

gbkersey commented 9 years ago

BUSYBOX=Y is there.

I recreated the initramfs, updated grub, and rebooted into the initramfs... everything exists except /dev/disk/by-vdev.

From: "Darik Horn" notifications@github.com To: "zfsonlinux/pkg-zfs" pkg-zfs@noreply.github.com Cc: "Bo Kersey" bo@vircio.com Sent: Sunday, November 23, 2014 10:16:57 AM Subject: Re: [pkg-zfs] Ubuntu Trusty: zfs-linux 0.6.3-3~trusty zpool-import.conf not properly importing zpool with devices mapped by-vdev (#130)

But we should verify that we have the same bug. Try this:

  1. Verify that BUSYBOX=Y in the /etc/initramfs-tools/initramfs.conf file. (This is the system default.)
    1. Run update-initramfs -c -k all .
    2. Run update-grub .
  2. Reboot and append break=bottom to the kernel command line. (Screenshot attached.)
    1. Verify that these files exist at the (initramfs) prompt:
      • /etc/zfs/vdev_id.conf
      • /etc/zfs/zpool.cache
      • /lib/udev/rules.d/69-vdev.rules
      • /lib/udev/vdev_id
      • /dev/disk/by-vdev/*

If they don't exist, then we have a different glitch and will need to further bisect the breakage on your system.

— Reply to this email directly or view it on GitHub .

Bo Kersey VirCIO - managed network solutions 4314 Avenue C Austin, TX 78751 phone: (512)374-0500

dajhorn commented 9 years ago

Unfortunately, this is indeed a unique failure. Assuming that it is even running, the vdev_id helper must be instrumented with a bunch of echo lines to determine why it isn't creating the /dev/disk/by-vdev links during pre-init.

I will try to do this when time permits.
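
(One crude way to instrument a shell udev helper such as vdev_id during pre-init is to log to the kernel ring buffer, since there is no console to read that early — a sketch, for temporary debugging only:)

# Near the top of /lib/udev/vdev_id:
exec 2>>/dev/kmsg                               # send shell trace output to the kernel log
set -x                                          # trace every command the helper runs
echo "vdev_id: invoked with: $*" >> /dev/kmsg   # confirm udev is actually calling the helper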

gbkersey commented 9 years ago

Actually, I have upgraded the kernel to 3.13.0-40-generic and /dev/disk/by-vdev is now there inside the initramfs. Here are captures of my steps...

[Screenshot: setting break=bottom in GRUB]

[Screenshot: checking for /etc/zfs and the udev vdev files]

[Screenshot: /dev/disk/by-vdev exists in the initramfs]

After booting, zpool status is still not correct:

# zpool status
  pool: export10
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 3h42m with 0 errors on Sun Nov 23 05:42:52 2014
config:

    NAME        STATE     READ WRITE CKSUM
    export10    ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0
        sdh     ONLINE       0     0     0
    logs
      mirror-1  ONLINE       0     0     0
        sde1    ONLINE       0     0     0
        sdi1    ONLINE       0     0     0
    cache
      sde       UNAVAIL      0     0     0
      sdi       UNAVAIL      0     0     0
    spares
      sdd       AVAIL   

errors: No known data errors

gdevenyi commented 9 years ago

I have been bitten by the same bug: the /dev/disk/by-vdev/ entries are missing for all of my externally attached SAS drives. In this case, though, I'm using multipath:

vdev_id.conf

multipath yes

#       PCI_SLOT HBA PORT  CHANNEL NAME
channel 06:00.0  1         A
channel 06:00.0  0         B
channel 81:00.0  1         A
channel 81:00.0  0         B

alias SSD1 /dev/disk/by-id/ata-Samsung_SSD_850_PRO_128GB_XXXXXXXXXXXXXX
alias SSD2 /dev/disk/by-id/ata-Samsung_SSD_850_PRO_128GB_YYYYYYYYYYYYYY

> zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 15h6m with 0 errors on Tue Aug 25 01:41:43 2015
config:

    NAME                   STATE     READ WRITE CKSUM
    data                   ONLINE       0     0     0
      raidz2-0             ONLINE       0     0     0
        35000039638d127ec  ONLINE       0     0     0
        35000039638d10a20  ONLINE       0     0     0
        35000039638d133ac  ONLINE       0     0     0
        35000039638d0354c  ONLINE       0     0     0
        35000039638cbf7a4  ONLINE       0     0     0
        35000039638cb2dfc  ONLINE       0     0     0
        35000039638cb2dd8  ONLINE       0     0     0
        35000039638d11200  ONLINE       0     0     0
        35000039638d111dc  ONLINE       0     0     0
      raidz2-1             ONLINE       0     0     0
        35000039638d11f18  ONLINE       0     0     0
        35000039638d128c0  ONLINE       0     0     0
        35000039638d12900  ONLINE       0     0     0
        35000039638d128f0  ONLINE       0     0     0
        35000039638d12908  ONLINE       0     0     0
        35000039638d127d0  ONLINE       0     0     0
        35000039638d11e88  ONLINE       0     0     0
        35000039638cbf400  ONLINE       0     0     0
        35000039638d12630  ONLINE       0     0     0
      raidz2-2             ONLINE       0     0     0
        35000039638d127a4  ONLINE       0     0     0
        35000039638d127a0  ONLINE       0     0     0
        35000039638d11e48  ONLINE       0     0     0
        35000039638d127ac  ONLINE       0     0     0
        35000039638d127a8  ONLINE       0     0     0
        35000039638d12818  ONLINE       0     0     0
        35000039638d12844  ONLINE       0     0     0
        35000039638d12868  ONLINE       0     0     0
        35000039638d12814  ONLINE       0     0     0
      raidz2-3             ONLINE       0     0     0
        35000039638cb9ce8  ONLINE       0     0     0
        35000039638d11cf8  ONLINE       0     0     0
        35000039638d11be4  ONLINE       0     0     0
        35000039638d11ce8  ONLINE       0     0     0
        35000039638d11ee8  ONLINE       0     0     0
        35000039638d11ba8  ONLINE       0     0     0
        35000039638d11b6c  ONLINE       0     0     0
        35000039638d11c48  ONLINE       0     0     0
        35000039638d11cc0  ONLINE       0     0     0
      raidz2-4             ONLINE       0     0     0
        35000039638d11bb0  ONLINE       0     0     0
        35000039638d13d40  ONLINE       0     0     0
        35000039638d132a8  ONLINE       0     0     0
        35000039638d13560  ONLINE       0     0     0
        35000039638d132a0  ONLINE       0     0     0
        35000039638d136dc  ONLINE       0     0     0
        35000039638d132b8  ONLINE       0     0     0
        35000039638d11c9c  ONLINE       0     0     0
        35000039638d13c94  ONLINE       0     0     0
      raidz2-5             ONLINE       0     0     0
        35000039638d11b68  ONLINE       0     0     0
        35000039638d11c10  ONLINE       0     0     0
        35000039638d11cb8  ONLINE       0     0     0
        35000039638d11c24  ONLINE       0     0     0
        35000039638d12884  ONLINE       0     0     0
        35000039638d12854  ONLINE       0     0     0
        35000039638d11ca8  ONLINE       0     0     0
        35000039638cb6de8  ONLINE       0     0     0
        35000039638d046b4  ONLINE       0     0     0
    logs
      mirror-6             ONLINE       0     0     0
        SSD1-part1         ONLINE       0     0     0
        SSD2-part1         ONLINE       0     0     0
    cache
      SSD1-part2           ONLINE       0     0     0
      SSD2-part2           ONLINE       0     0     0
    spares
      35000039638d11f70    AVAIL   
      35000039638d125d0    AVAIL   
      35000039638d11eec    AVAIL   
      35000039638d11f0c    AVAIL   
      35000039638d11c80    AVAIL   
      35000039638d11c38    AVAIL   

errors: No known data errors

When I follow the instructions as per @dajhorn, all the settings are present; just the /dev/disk/by-vdev links are missing.

gdevenyi commented 9 years ago

After a complete boot /dev/disk/by-vdev is now populated:

lrwxrwxrwx 1 root root 11 Aug 27 15:00 B0 -> ../../dm-21
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B1 -> ../../dm-25
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B10 -> ../../dm-51
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B11 -> ../../dm-19
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B12 -> ../../dm-2
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B13 -> ../../dm-24
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B14 -> ../../dm-14
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B15 -> ../../dm-32
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B16 -> ../../dm-35
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B17 -> ../../dm-4
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B18 -> ../../dm-6
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B19 -> ../../dm-50
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B2 -> ../../dm-29
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B20 -> ../../dm-57
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B21 -> ../../dm-44
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B22 -> ../../dm-17
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B23 -> ../../dm-18
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B24 -> ../../dm-10
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B25 -> ../../dm-23
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B26 -> ../../dm-28
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B27 -> ../../dm-16
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B28 -> ../../dm-11
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B29 -> ../../dm-39
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B3 -> ../../dm-33
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B30 -> ../../dm-45
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B31 -> ../../dm-49
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B32 -> ../../dm-56
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B33 -> ../../dm-42
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B34 -> ../../dm-48
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B35 -> ../../dm-12
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B36 -> ../../dm-0
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B37 -> ../../dm-22
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B38 -> ../../dm-27
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B39 -> ../../dm-31
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B4 -> ../../dm-36
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B40 -> ../../dm-34
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B41 -> ../../dm-38
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B42 -> ../../dm-43
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B43 -> ../../dm-1
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B44 -> ../../dm-55
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B45 -> ../../dm-40
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B46 -> ../../dm-20
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B47 -> ../../dm-54
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B48 -> ../../dm-3
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B49 -> ../../dm-9
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B5 -> ../../dm-5
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B50 -> ../../dm-26
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B51 -> ../../dm-30
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B52 -> ../../dm-15
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B53 -> ../../dm-37
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B54 -> ../../dm-41
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B55 -> ../../dm-8
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B56 -> ../../dm-13
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B57 -> ../../dm-59
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B58 -> ../../dm-47
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B59 -> ../../dm-53
lrwxrwxrwx 1 root root 10 Aug 27 15:00 B6 -> ../../dm-7
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B7 -> ../../dm-52
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B8 -> ../../dm-58
lrwxrwxrwx 1 root root 11 Aug 27 15:00 B9 -> ../../dm-46
lrwxrwxrwx 1 root root 10 Aug 27 15:00 SSD1 -> ../../sddx
lrwxrwxrwx 1 root root 11 Aug 27 15:00 SSD1-part1 -> ../../sddx1
lrwxrwxrwx 1 root root 11 Aug 27 15:00 SSD1-part2 -> ../../sddx2
lrwxrwxrwx 1 root root 10 Aug 27 15:00 SSD2 -> ../../sddw
lrwxrwxrwx 1 root root 11 Aug 27 15:00 SSD2-part1 -> ../../sddw1
lrwxrwxrwx 1 root root 11 Aug 27 15:00 SSD2-part2 -> ../../sddw2

dajhorn commented 8 years ago

The components involved in this glitch were subsequently replaced by the systemd stack, and this ticket itself is stale, so I will close. Thanks for the report; I hope that ZoL worked out for you.