zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

debian 7.5 wheezy with zfs 0.6.3 fails to import zpool when zpool is already imported because a cachefile is present. #119

Closed aarcane closed 8 years ago

aarcane commented 10 years ago

When a zpool.cache file is present because of zfsonlinux/zfs#2474, the zpool import step of the initramfs fails, stating that the pool is already imported.

Adding the following lines to the initramfs zfs script at line 246 allows the boot process to continue, but I do not think the logic is completely correct. Somebody more familiar with shell scripting should incorporate the changes below properly:

             # Attempt 0: See if we're already mounted
             if [ "$ZFS_ERROR" != 0 ] && [ -n "$ZPOOL_CACHE" ]
             then
                 ZFS_CMD="zpool status -x ${ZFS_RPOOL}"

                 ZFS_STDERR=$($ZFS_CMD 2>&1)
                 ZFS_ERROR=$?

                 [ "$ZFS_ERROR" != 0 ] && echo "FAIL: $ZFS_CMD.  Retrying..."
             fi
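For reference, a simpler way to sketch the "already imported?" check is zpool list with the pool name, which exits non-zero when the named pool is not currently imported. This is a hedged sketch, not the script's actual code: the zpool stub exists only so the snippet runs on a machine without ZFS installed, and ZFS_RPOOL is assumed to be set earlier by the surrounding initramfs script.

```sh
#!/bin/sh
# Sketch: skip the import attempt when the pool is already imported.
# `zpool list <pool>` exits non-zero if <pool> is not imported.

# Illustrative stub so this runs on a box without ZFS; on a real
# system, remove it and the real zpool binary is used instead.
if ! command -v zpool >/dev/null 2>&1; then
    zpool() {
        # pretend only "rpool" is imported ($5 is the pool name in
        # the `zpool list -H -o name <pool>` call below)
        [ "$5" = "rpool" ]
    }
fi

ZFS_RPOOL=rpool   # normally set earlier by the initramfs script
if zpool list -H -o name "$ZFS_RPOOL" >/dev/null 2>&1; then
    result="already-imported"   # continue boot without re-importing
else
    result="needs-import"       # fall through to the normal import
fi
echo "$result"
```

With the stub in place the snippet prints "already-imported"; on a real system the branch taken depends on the actual pool state.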
FransUrbo commented 10 years ago

I have some time to spare (waiting for other things to finish), so I figured I'd take a look at this...

What exactly did you want to accomplish with that part? Making sure that the pool isn't already imported?

aarcane commented 10 years ago

The purpose of this block is to test if the pool has been imported already. If the pool is imported, continue without attempting to import it.

FransUrbo commented 10 years ago

Ok, but unfortunately it isn't as simple as the code you posted...

But fixing the cause seems more sensible than working around it. The zfs_autoimport_disable module parameter should work.

So I'm probably not going to do anything about this issue....

aarcane commented 10 years ago

You're probably right. 5 minutes of shell scripting is too much to bother with when what's needed to fix a debilitating problem with unattended boot is really to just wait until 0.6.4 is released.

FransUrbo commented 10 years ago

If it had only been five minutes, I'd do it. But your code snippet won't work (properly). It's a hack.

Also, one of my "life-promises" is NOT to fix bugs in other software by going around them. Once you go down that route, you'll end up with lots of code that usually doesn't do anything and will be impossible to maintain in the long run. Not to mention the one taking over my work the day I resign.... If I'm going to be hated for crap, it should at least be of my own making :)

aarcane commented 10 years ago

When I get home I'll see if I can figure out how to get it working. zpool status -x really should indicate an error via its exit code.

FransUrbo commented 10 years ago

Using zpool status [anything] is the wrong way to check if something is mounted. Using zfs mount seems more reasonable. But that's beside the point. The code is in the wrong place as well. But moving it to where it should be (somewhere right after the modprobe line) requires a lot of work.

pcoultha commented 9 years ago

I was able to reproduce and track down this issue. Are you also seeing this error prior to the pool failing to import?

modprobe: can't load module zcommon (updates/dkms/zcommon.ko): No such file or directory

The cause of this is surprisingly complicated, but the fix is straightforward and simple.

In /usr/share/initramfs-tools/scripts/zfs change the line modprobe zfs zfs_autoimport_disable=1 to /sbin/modprobe zfs zfs_autoimport_disable=1 and rebuild the initramfs.
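The substitution is a one-liner with sed. The sketch below performs it on a temporary copy so it can be run anywhere; on a real system, point sed at /usr/share/initramfs-tools/scripts/zfs (after backing it up) and then run update-initramfs -u to rebuild the initramfs.

```sh
#!/bin/sh
# Demonstrate the one-line fix on a temporary file containing the
# affected line. On a real system, target
# /usr/share/initramfs-tools/scripts/zfs and rebuild the initramfs
# afterwards with `update-initramfs -u`.
tmp=$(mktemp)
echo 'modprobe zfs zfs_autoimport_disable=1' > "$tmp"

# Force the kmod modprobe by using its absolute path.
sed -i 's|^modprobe zfs|/sbin/modprobe zfs|' "$tmp"

fixed=$(cat "$tmp")
echo "$fixed"
rm -f "$tmp"
```

The script prints the corrected line, `/sbin/modprobe zfs zfs_autoimport_disable=1`.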

Here's what's happening. Debian's initramfs includes two modprobe executables. Calling modprobe from a shell script invokes the BusyBox built-in modprobe. Calling /sbin/modprobe invokes the kmod version of modprobe. There is an interaction between the BusyBox modprobe and the zfs modules that prevents it from completely loading the module stack. It errors out well before even trying to load the zfs module. The kmod version of modprobe does not suffer from this interaction and can successfully load the complete module stack.

This means that currently, at the time zpool import ... is first run, the zfs kernel module is not actually loaded. zpool (by way of libzfs) detects the missing module and invokes /sbin/modprobe zfs, which loads it and its remaining dependencies successfully. Because the zfs_autoimport_disable=1 parameter is not included in this invocation, the pool is imported automatically. zpool continues running and attempts to import the pool again, which fails.

Changing the initramfs script to use /sbin/modprobe ensures that the ZFS module is loaded at the expected time and "sees" the zfs_autoimport_disable parameter.

Why can't BusyBox modprobe load the ZFS modules? The BusyBox version of modprobe assumes that the current working directory will not be changed during the process of loading modules. It cd's to /lib/modules/$version/ and then loads each required module using a relative path. Kernel modules using SPL violate this assumption and change the working directory to / on initialization. On my system, zcommon.ko is the first module that fails to load, because BusyBox is unexpectedly trying to find it at /updates/dkms/zcommon.ko.
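The broken assumption is easy to demonstrate with plain files, no real kernel modules involved. In this sketch all paths are illustrative: a relative path that resolves while the working directory is the module directory stops resolving the moment something chdir()s to /.

```sh
#!/bin/sh
# Illustrate the BusyBox modprobe assumption with plain files.
# BusyBox cd's into the module directory and loads modules by relative
# path; if a module's init code changes the working directory (as SPL
# does), later relative lookups silently point somewhere else.
tmp=$(mktemp -d)
mkdir -p "$tmp/modules/updates/dkms"
touch "$tmp/modules/updates/dkms/zcommon.ko"

cd "$tmp/modules"
[ -e updates/dkms/zcommon.ko ] && before="found"    # cwd unchanged: OK

cd /                                                # what SPL init does
[ -e updates/dkms/zcommon.ko ] || after="missing"   # now resolves under /

echo "before=$before after=$after"
rm -rf "$tmp"
```

The second lookup fails because `updates/dkms/zcommon.ko` now resolves to `/updates/dkms/zcommon.ko`, mirroring the error message above.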

(initramfs) modprobe --help
BusyBox v1.20.2 (Debian 1:1.20.0-7) multi-call binary.
...
(initramfs) /sbin/modprobe -V
kmod version 9

(initramfs) which modprobe
/sbin/init/modprobe   <--- Lie
FransUrbo commented 9 years ago

Calling modprobe from a shell script invokes the BusyBox built-in modprobe. Calling /sbin/modprobe invokes the kmod version of modprobe.

Thanx a million for finding this; I never would have found it. It was just never among the possibilities I considered...

I'm building new packages as we speak...

FransUrbo commented 9 years ago

Fixed in Debian GNU/Linux Wheezy by snapshot/debian/wheezy/0.6.3-15_e66597_wheezy (pkg version 0.6.3-15~e66597~wheezy).

Just pushed the packages to the archive, so they should be available within the hour (hopefully - sometimes it can take a while).

@dajhorn, is this a problem in Ubuntu as well?

dajhorn commented 9 years ago

@FransUrbo, probably not, because the Ubuntu packages still assume default ZoL 0.6.3 behavior.

I'm updating the Ubuntu packaging for zfsonlinux/zfs#2780 now and intend to deprecate /etc/init.d and update the initramfs parts for the Utopic release. Dunno how this kind of issue will affect things yet.

FransUrbo commented 9 years ago

intend to deprecate /etc/init.d

So you're fully on the systemd train then? No support for those who hate it?

and update the initramfs parts for the Utopic release

Mind having a look at https://github.com/zfsonlinux/zfs/pull/2087?

dajhorn commented 9 years ago

So you're on the systemd train fully then? No support for those that hate it?

The pending change is actually a long-needed enhancement for upstart.

Mind having a look at https://github.com/zfsonlinux/zfs/pull/2087?

Will do, when I revisit the initramfs stuff in a few days.

bkus commented 9 years ago

Can someone push this fix into the jessie branch of ZoL? I'm running Debian-testing with ZoL-jessie and the bug is still there.

FransUrbo commented 9 years ago

@bkus Are you running jessie or jessie-daily (the latest GIT master)? This is fixed in jessie-daily (and will be fixed in the 0.6.4 release, which shouldn't be too far off now).

FransUrbo commented 8 years ago

I believe this to be fixed in the new 0.6.5 version, which was released a week or so ago. It contains new, improved versions of the init and initramfs scripts in which this doesn't (shouldn't) happen.