zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

Ubuntu Trusty: zfs-linux 0.6.3-3~trusty breaks mount/import on boot with LUKS enabled #126

Closed 0xFelix closed 8 years ago

0xFelix commented 9 years ago

Hello,

I have two pools with each two disks. All four disks are encrypted with LUKS and ZFS runs on top of the LUKS encrypted /dev/mapper devices.

Until version 0.6.3-3, mounting/importing the pools at boot after the devices had been opened by LUKS worked correctly. Since the update, the new "/etc/init/zpool-import.conf" job seems to run before LUKS opens the devices, which results in the pools not being imported.

Importing the pools manually after boot works fine.

Can this behavior please be changed back to pre 0.6.3-3?

dajhorn commented 9 years ago

@0xFelix, run this command to revert the behavior:

# echo manual >/etc/init/zpool-import.override

Or delete the /etc/init/zpool-import.conf file entirely.

Is this LUKS failure happening with the cryptsetup package doing the device unlocks?

0xFelix commented 9 years ago

Running your command worked... thanks!

Yes, my devices are unlocked via the cryptsetup package. My guess is that cryptsetup unlocks the hard drives after the zpool-import job has run, so zpool-import can't find any devices to import or mount and simply removes the zpool cache (?).

dajhorn commented 9 years ago

> Running your command worked... thanks!

Welcome.

Let's leave this issue ticket open for a while so that anybody else with the same glitch can add a "me too" to it.

> Yes, my devices are unlocked via the cryptsetup package.

Okay, good to know that this is a supported configuration.

> My guess is that cryptsetup unlocks the hard drives after the zpool-import job ran and therefore zpool-import can't find any devices to import or mount and simply removes the zpool cache (?).

The /etc/zfs/zpool.cache file should not be deleted on Ubuntu systems, even during shutdown or reboot. This might be a case where the disk with the root filesystem should be put into the cryptsetup-early list.

0xFelix commented 9 years ago

I'm not quite sure whether the cache gets deleted; it was just a guess. But zpool-import.conf does something that prevents the pools from being imported automatically after it has run. Disabling the job as you suggested restores the old behavior: the pools get imported automatically again.

krassle commented 9 years ago

Hello,

I'm having exactly the same problem after upgrading to 0.6.3-3~trusty. There is no encryption involved, just a plain raidz-1 of 3 HDDs in one pool. Importing the pool manually works fine.

"/var/log/upstart/zpool-import.log" gives me:

/proc/self/fd/9: line 123: cannot create temp file for here-document: Read-only file system

I'm trying your suggestion to delete the zpool-import.conf and will report back to you.

Thanks.

krassle commented 9 years ago

... removing the upstart job works for me too! And we're back to the old "normal" behavior ;) @dajhorn, if I may provide some logs/information to help track down this issue, please let me know.

dajhorn commented 9 years ago

> /proc/self/fd/9: line 123: cannot create temp file for here-document: Read-only file system

@krassle, this is an important hint, thanks. This could mean that /dev/pts wasn't ready in time for the zpool-import task.

> If I may provide some logs/information to help track down this issue, please let me know.

Please post the /etc/fstab, /boot/grub/grub.cfg, and /var/log/syslog files from the affected system to http://gist.github.com/.

If you have time, then I want to check whether the udev cold-plug is populating the /dev tree on-time, and whether the system root is read-only when the zpool-import task is run by upstart.

krassle commented 9 years ago

> Please post the /etc/fstab, /boot/grub/grub.cfg, and /var/log/syslog files from the affected system to http://gist.github.com/.

Here you go!

> If you have time, then I want to check whether the udev cold-plug is populating the /dev tree on-time, and whether the system root is read-only when the zpool-import task is run by upstart.

Sure, how should I proceed?

orgoj commented 9 years ago

I have the same problem as @krassle. echo manual >/etc/init/zpool-import.override fixes the boot problem.

dajhorn commented 9 years ago

@krassle, thanks, I intend to look at it this week.

quantenschaum commented 9 years ago

Same problem here.

On Ubuntu Server 14.04 I have a zpool on 4 discs (sda-c, partitionless); the discs are encrypted using cryptsetup/LUKS. The root FS, residing on a 5th disk (sde2), is encrypted too. The discs are "decrypted" during early boot (from the initrd via crypttab), so the "decrypted" devices should be available before mountall is executed.

/etc/crypttab

root  UUID=*** none luks
data0 UUID=*** root luks,keyscript=/lib/cryptsetup/scripts/decrypt_derived
data1 UUID=*** root luks,keyscript=/lib/cryptsetup/scripts/decrypt_derived
data2 UUID=*** root luks,keyscript=/lib/cryptsetup/scripts/decrypt_derived
data3 UUID=*** root luks,keyscript=/lib/cryptsetup/scripts/decrypt_derived

/etc/fstab

UUID=*** /               btrfs   noatime,subvol=@ 
UUID=*** /boot           ext4    noatime 
UUID=*** none            swap    sw 

I disabled the zpool-import upstart config and verified that the ZFS mountall is installed.

After a reboot the pool is imported, but the filesystems are not mounted automatically. Running zfs mount -a or mountall manually mounts the FSs.

My guess: at the time mountall runs, the "decrypted" pool discs are available but the pool is not yet imported, so mountall won't mount the ZFS filesystems.

With zpool-import.conf enabled, it should do its job and import the pool before mountall runs, but it somehow fails.

quantenschaum commented 9 years ago

I added some logging to the beginning of zpool-import.conf:

echo "waiting pool devices"
for I in $(seq $ZFS_AUTOIMPORT_TIMEOUT)
do
        ls -lh /dev/mapper
        ls -lh /dev/disk/by-id
        sleep 1
done

It shows that the crypto devices do not get mapped within 30 seconds, so the pool cannot be imported. This defeats the actual intent of zpool-import.conf.

The question is, why are the crypto devices not mapped before mountall runs?
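A minimal, self-contained sketch of the poll-until-present pattern the job implements (wait_for_device is a hypothetical helper name; the real job checks the pool's member devices for up to ZFS_AUTOIMPORT_TIMEOUT seconds):

```shell
#!/bin/sh
# Sketch of the wait loop zpool-import relies on: poll for a device
# node until it appears or a timeout expires. wait_for_device is a
# hypothetical helper; the real job polls /dev/disk/by-id and friends.
wait_for_device() {
    device="$1"
    timeout="$2"
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if [ -e "$device" ]; then
            echo "found $device"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "timed out waiting for $device"
    return 1
}

wait_for_device /dev/null 3      # prints "found /dev/null"
wait_for_device /no/such/node 2  # prints "timed out waiting for /no/such/node"
```

If the device never shows up, the loop burns the whole timeout, which is exactly the 30-second boot hang reported in this thread.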

ryanjaeb commented 9 years ago

I think I'm having this issue as well. However, I could swear it worked for a little bit running 0.6.3-4~trusty. I wasn't specifically paying attention to it, but it's the first ZFS install I've done and I was rebooting fairly frequently to make sure everything would come back up after boot.

Using echo manual >/etc/init/zpool-import.override gets past the 30-second hang during boot and imports my zpools, but I have to run zfs mount -a after boot to get my datasets mounted.

There were only a couple changes to the machine the day it stopped working:

Today I noticed an update to 0.6.3-5~trusty which I've already installed. Here is a gist with a few files that I saw requested in a previous message, plus a couple others that are semi-related.

ryanjaeb commented 9 years ago

I've started playing around with a CentOS 7 install hoping things would work a bit better for me, but I see the same behavior as on my Ubuntu install: ZFS tries to import pools before the devices from crypttab exist. I'm using version:

zfs.x86_64                            0.6.3-1.2.el7.centos             @zfs

Is there anything I can do to delay the import of ZFS pools until all the crypto devices are mapped?

dajhorn commented 9 years ago

Note that LUKS handling is usually distro-specific, so please open new tickets for each permutation of this bug report.

For most DEB platforms, you can get the desired result by configuring ZFS pool members as "early" cryptdisks such that they are unlocked in the initrd before the regular system starts. Please search the usual support forums and/or regular documentation for something that is appropriate for your particular system.
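As one concrete (hypothetical) illustration of the "early" approach: newer Debian-based cryptsetup versions honor an initramfs option in /etc/crypttab, which forces the mapping to be set up inside the initrd; trusty-era cryptsetup may need a different mechanism, so treat the entry below as a sketch (the UUID and key path are placeholders):

```
# /etc/crypttab -- sketch: unlock a pool member inside the initrd
# (the "initramfs" option; availability depends on the cryptsetup version)
data0  UUID=xxxxxxxx-xxxx  /etc/keys/data0.key  luks,initramfs
```

After editing crypttab, rebuild the initrd with update-initramfs -u.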

I've been watching the upstream support chatter on this topic, but it isn't reasonable to expect distros to accommodate ZFS integration here until ZoL is imported into Debian and/or Ubuntu as a first-class package.

quantenschaum commented 9 years ago

How do I mark them as early? The crypttab manpage only mentions a noearly option. And does this work together with derived keys (decrypt_derived)?

quantenschaum commented 9 years ago

My current workaround is:

Disable zpool-import.conf, so the zfs module gets loaded with autoimport enabled. Then I created another upstart task, zfs.conf (assuming the ZFS tree gets mounted on /zfs):

task
start on (local-filesystems and net-device-up)
emits zfs-up
script
  . /etc/environment
  until mountpoint /zfs; do zfs mount -a; sleep 1; done
  initctl emit zfs-up
end script

Now zfs-up is emitted after ZFS is mounted. I added this to all services that depend on ZFS:

start on ... and zfs-up

quantenschaum commented 9 years ago

When using derived keys it is not possible to unlock the volumes in the early stage. On the other hand, the "late" cryptdisks are unlocked after zpool-import. So it is impossible for zpool-import to work as expected when using cryptsetup.

Another workaround is to start the cryptdisks explicitly before zpool-import.conf runs.

Create zpool-unlock.conf containing:

task
start on starting zpool-import

script
        for D in data0 data1 data2 data3; do
                cryptdisks_start $D
        done
end script

Adjust the names of the cryptdisks (data0, ...) according to what is in your crypttab.
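A variant that derives the names from crypttab instead of hard-coding them could look like this sketch (start_crypttab_entries is a hypothetical helper, and it only prints the names here; the real job script would run cryptdisks_start "$name" instead of echo):

```shell
#!/bin/sh
# Sketch: read cryptdisk names from a crypttab file instead of
# hard-coding data0..data3. Demonstrated on a throwaway file so it
# can run anywhere; the real job would read /etc/crypttab.
start_crypttab_entries() {
    crypttab="$1"
    while read -r name source keyfile options; do
        case "$name" in
            ''|\#*) continue ;;  # skip blank lines and comments
            root)   continue ;;  # root was already unlocked in the initrd
        esac
        echo "would start: $name"  # real job: cryptdisks_start "$name"
    done < "$crypttab"
}

# Example with a throwaway crypttab:
tmptab=$(mktemp)
printf '%s\n' \
    '# comment line' \
    'root  UUID=aaaa none luks' \
    'data0 UUID=bbbb root luks,keyscript=/lib/cryptsetup/scripts/decrypt_derived' \
    'data1 UUID=cccc root luks,keyscript=/lib/cryptsetup/scripts/decrypt_derived' \
    > "$tmptab"
start_crypttab_entries "$tmptab"
# prints:
#   would start: data0
#   would start: data1
rm -f "$tmptab"
```

This keeps zpool-unlock.conf in sync with crypttab automatically.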

ryanjaeb commented 9 years ago

@quantenschaum is on the right track with this, but the zpool-unlock.conf suggestion doesn't work for me. I end up with a device that doesn't have a proper symlink back to (e.g.) ../dm-0 in /dev/mapper.

Edit: As noted by @quantenschaum below, the zpool-unlock.conf suggestion will work when using an actual device as a LUKS target, but not when using a file. I've changed my VM setup instructions to reflect this.

It's definitely a problem with the ordering of boot dependencies. Ubuntu already has an Upstart script, /etc/init/cryptdisks, that looks like it's supposed to finish mounting late crypto disks that aren't handled by udev, but it seems to run very late in the boot process.

When using late crypto disks in fstab it's not uncommon to see:

The disk drive for /dev/mapper/ext_crypt is not ready yet or not present
Continue to wait, or Press S to skip mounting or M for manual recovery 

Is it possible someone ran into a similar issue and that was their solution? I don't know enough about Upstart and udev to take a shot at finding a solution, but I can offer instructions on creating a broken install using a VM...

VirtualBox VM

Create a virtual machine. I used all of the default settings for an Ubuntu 64 bit VM except:

Ubuntu

I installed ubuntu-14.04.2-server-amd64 using all of the default settings except:

Packages

Install cryptsetup and ubuntu-zfs.

add-apt-repository -y ppa:zfs-native/stable
apt-get update
apt-get -y install ubuntu-zfs cryptsetup

LUKS

Set up a LUKS device. Do NOT use a keyfile on an unencrypted device for anything but testing.

dd if=/dev/random of=/etc/crypt.key bs=4096 count=1
chmod 0400 /etc/crypt.key
cryptsetup --batch-mode luksFormat /dev/sdb --key-file /etc/crypt.key
echo "zfs_crypt /dev/sdb /etc/crypt.key luks" | tee -a /etc/crypttab
cryptdisks_start zfs_crypt

ZPOOL

Create the tank pool.

zpool create tank /dev/mapper/zfs_crypt

Reboot

Reboot. At this point, the boot will hang for 30s while attempting to open ZFS volumes and no pools will be available when the boot completes.

quantenschaum commented 9 years ago

Hi @ryanjaeb, is /var on a different partition than /? If so, /var is probably not yet mounted, because that is done by mountall, which runs after zpool-import. zpool-import starts on "starting mountall", so it is made to hook in just before mountall. I think it will work in general for ZFS pools on real devices that appear under /dev.

ryanjaeb commented 9 years ago

@quantenschaum Everything including /var is in the root (/) partition. However, you're correct that it works differently on real devices. I re-tested your suggestion to use zpool-unlock.conf with both my VM and a physical machine. It works on both.

I assumed that root (/) would be up before mountall runs. I tried using /zfs_device.img, but it doesn't make a difference. My best guess is there's a chroot going on somewhere that I don't know about. As I mentioned before, I'm not extremely familiar with the boot process.

Thanks for the workaround!

ryanjaeb commented 9 years ago

@dajhorn zfsonlinux/zfs#2575 is a similar issue for CentOS. The comment by @MagnusMWW was what I needed. When I was testing the other day I tried to get a similar ordering on Ubuntu by making zpool-import.conf run after cryptdisks.conf, but I was never able to get it to work; the system would always freeze.

Edit: To be more clear, I was trying to get cryptdisks.conf to run before zpool-import.conf by adding and started cryptdisks to zpool-import.conf's start on configuration.

quantenschaum commented 9 years ago

@ryanjaeb, maybe / is mounted read-only, which prevents the file-based pool from being imported, and mountall then remounts / read-write.

ghost commented 8 years ago

I'm still having the mount/import problem on boot with LUKS enabled (ubuntu-zfs 8~trusty, zfs-dkms 0.6.5.3-1~trusty).

So far I managed to get around it by doing the following:

To automatically import the pool, edit /etc/init/zpool-import.conf (sudo nano /etc/init/zpool-import.conf) and change

modprobe zfs zfs_autoimport_disable=1

to

modprobe zfs zfs_autoimport_disable=0

To automatically mount the ZFS filesystems, add zfs mount -a to /etc/rc.local (sudo nano /etc/rc.local).

Though a more proper fix would be nice.

jefft commented 8 years ago

This problem is reported in Ubuntu bug 1422153, where Steve Langasek says:

> The zpool-import job in this package does:
>
>     start on ( starting mountall )
>
> And mountall is 'start on startup' - which makes zpool-import the very first job to run on the system, including before udev. If this job does not correctly handle the underlying disks not yet being available, then that's entirely a bug in zpool-import.

Making zpool-import.conf "start on started cryptdisks" would seem to be the logical fix, but like @ryanjaeb I found that it hung. In the end, @quantenschaum's zpool-unlock.conf worked for me.

dajhorn commented 8 years ago

Given that upstart is fully deprecated and that we lack a general fix, this bug is now a wontfix.

Anybody that needs LUKS integration should try the Xenial dailies now and report bugs downstream.

Ubuntu 16.04 is an LTS, which means that any structural incompatibilities that make it into the release could have a long lifetime.

dajhorn commented 8 years ago

Given the Ubuntu 16.04 LTS release, I will close this ticket. Any further discussion should go downstream to the new maintenance team.