zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

Current PPA/Daily version (6.2.2) hangs while updating a kernel (grub-probe part in post) #106

Closed Phoenixxl closed 9 years ago

Phoenixxl commented 10 years ago

Hello,

Today, while upgrading Ubuntu 12.04.4 from kernel 3.2.0-58-generic to 3.2.0-59-generic, the GRUB configuration part of the install failed.

The ZFS version in use during the upgrade was the current PPA/Daily (6.2.2). The error in syslog that apt tripped over:

Feb 28 13:20:42 Muur1 kernel: [1383786.064540] grub-probe[9149]: segfault at 120 ip 00007fd345ea392f sp 00007fff29699388 error 4 in libzfs.so.1.0.1[7fd345e9a000+3d000]
Feb 28 13:21:11 Muur1 kernel: [1383814.963954] grub-probe[15484]: segfault at 120 ip 00007fcb3a45692f sp 00007ffffd917908 error 4 in libzfs.so.1.0.1[7fcb3a44d000+3d000]
Feb 28 13:21:40 Muur1 kernel: [1383844.080332] grub-probe[21806]: segfault at 120 ip 00007f565435692f sp 00007fffd0481c78 error 4 in libzfs.so.1.0.1[7f565434d000+3d000]
Feb 28 13:22:08 Muur1 kernel: [1383872.570545] grub-probe[28121]: segfault at 120 ip 00007f95d647192f sp 00007fffe9fc7928 error 4 in libzfs.so.1.0.1[7f95d6468000+3d000]

To fix it, I downgraded ZFS to the current PPA/Stable (6.2.1).

Snippet from where it goes wrong (captured apt-get output):

run-parts: executing /etc/kernel/postinst.d/zz-update-grub 3.2.0-59-generic /boot/vmlinuz-3.2.0-59-generic
Segmentation fault (core dumped)
run-parts: /etc/kernel/postinst.d/zz-update-grub exited with return code 139
Failed to process /etc/kernel/postinst.d at /var/lib/dpkg/info/linux-image-3.2.0-59-generic.postinst line 1010.
dpkg: error processing linux-image-3.2.0-59-generic (--configure):
 subprocess installed post-installation script returned error exit status 2
dpkg: dependency problems prevent configuration of linux-image-generic:
 linux-image-generic depends on linux-image-3.2.0-59-generic; however:
  Package linux-image-3.2.0-59-generic is not configured yet.
dpkg: error processing linux-image-generic (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of linux-image:
 linux-image depends on linux-image-generic (= 3.2.0.59.70); however:
  Package linux-image-generic is not configured yet.
dpkg: error processing linux-image (--configure):
 dependency problems - leaving unconfigured
Setting up libuutil1 (0.6.2-1~precise) ...
No apport report written because the error message indicates its a followup error from a previous failure.
No apport report written because MaxReports is reached already

Alas, there was nothing more in syslog to show.

The machine in question boots natively from ZFS; a discussion about this setup was had here in 2012: https://github.com/zfsonlinux/pkg-zfs/issues/51

I can only assume the ZFS-modified version of GRUB is unable to read the disk header through libzfs in this latest version, but frankly I have no idea.

For now this is fixed by using stable again, but moving 6.2.2 from daily to stable in its current form will break ZFS for me.

Friendly regards.

jeff-dagenais commented 10 years ago

I am experiencing the exact same problem on Ubuntu 12.04. zfs-dkms: Installed: 0.6.2-2~precise~2.gbp8db412.

FYI, I am working around it by dropping grub-zfs back to the default GRUB. I diverged from the Ubuntu native-rootfs instructions in that I use a USB stick mounted at /boot for all GRUB boot files, and hardcoded the rootfs in /etc/default/grub with my pool and rootfs dataset. I also had to patch grub-probe as mentioned here https://github.com/zfsonlinux/grub/issues/6 and here http://wiki.complete.org/ConvertingToZFS#Grub_mirrored_rpool_workaround
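For reference, a minimal sketch of the hardcoded-rootfs piece of that workaround, assuming a hypothetical pool/dataset of rpool/ROOT/precise (verify the kernel parameters against your own zfs-initramfs; this is a sketch, not a definitive recipe):

# The relevant line in /etc/default/grub, so booting no longer depends on
# grub-probe detecting ZFS (pool and dataset names are placeholders):
GRUB_CMDLINE_LINUX="boot=zfs root=ZFS=rpool/ROOT/precise"

# Afterwards, regenerate the config with the stock (non-ZFS) GRUB:
sudo update-grub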

FransUrbo commented 10 years ago

Might be related to https://github.com/zfsonlinux/zfs/issues/2145.

If it is indeed this, then double-checking the grub.cfg file and rebooting should fix it.

Phoenixxl commented 10 years ago

@FransUrbo All my pools are available when using either daily or stable, and there is no issue with losing any data either. The issue you link to has missing pools as its primary symptom. The issue here is a grub-probe crash that comes from libzfs; my system even boots and starts just fine. I just have three installs stuck in "configuration" as far as aptitude is concerned, since GRUB is unable to update and finish. grub-probe segfaults when scanning drives (I presume it fails on one of the ZFS drives and not on my SATA /boot partition, or else the library in question would not be libzfs). It's not a GRUB-related boot issue either.

@jeff-dagenais I also use a /boot that's on a different drive: not a stick, but a small SSD whose second partition is also my L2ARC. My rootfs definition is also "hardcoded" in GRUB. The issue is not one of booting or starting up; the issue is grub-probe segfaulting when it finds ZFS drives. Also, from what I remember, it has to be the modified 1.99 GRUB to boot from raidz1 on 12.04. I had a discussion with @dajhorn about this last year when the question of release upgrading arose, if I remember correctly. (I know the bottom line was: don't release-upgrade.)

I am not really looking for a temporary workaround.

As said above, everything is hunky-dory for me while running stable, as I am now. I am mentioning all this so it can be fixed before the latest build makes it into stable. I may not have mentioned this as clearly as I should have, but this system has been running without any issues for over a year: updating, upgrading, booting, rebooting, etc. This is not a fresh install.

Friendly regards

PS: I should also mention my pool is filesystem version 5, pool version 28, so feature-flag issues are probably not applicable. Let me also say this has nothing to do with upgrading from kernel -58 to -59; I only used that to sketch a timeline of what happened. When updating to -58 a few weeks ago, zfs daily probably updated to ZoL 6.2.2 at that point. Everything went fine then, of course, because GRUB had been updated while ZoL was still 6.2.1; that is why I didn't have any issues until trying to update to kernel -59. This issue should probably be called "ZoL 6.2.2 segfaults when grub-probe is used". But I know for a fact it is still the same GRUB version and the issue comes from libzfs, and I would not want anyone to start fiddling with the modified GRUB needlessly.

jeff-dagenais commented 10 years ago

@Phoenixxl Agreed, the only problem is that grub-probe segfaults. The upgrade was just the trigger for invoking grub-probe.

Since @FransUrbo pointed to https://github.com/zfsonlinux/zfs/issues/2145, and that issue mentions an ABI change that a reboot might fix, I went ahead and rebooted. I then restored my grub-probe to the zfs-grub/daily build (1.99-21ubuntu3.9+zfs1~precise1), and the segfault in libzfs.so.1.0.1 is still present.

The trace I get when I run grub-probe in gdb:

$ sudo gdb --args /usr/sbin/grub-probe -vvv --target=device /
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/sbin/grub-probe...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/sbin/grub-probe -vvv --target=device /
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
grub-core/disk/raid.c:740: Scanning for dmraid_nv RAID devices on disk hd0
/usr/sbin/grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd0.
grub-core/kern/disk.c:245: Opening `hd0'...
/usr/sbin/grub-probe: info: the size of hd0 is 1953525168.
grub-core/kern/emu/hostdisk.c:722: opening the device `/dev/sda' in open_device()
grub-core/kern/disk.c:338: Closing `hd0'.
[...]
grub-core/partmap/msdos.c:166: partition 3: flag 0x0, type 0x0, start 0x0, len 0x0
grub-core/partmap/apple.c:123: bad magic (found 0xeb63; wanted 0x4552
grub-core/kern/disk.c:338: Closing `hd4'.
/usr/sbin/grub-probe: info: scanning hd4,msdos1 for LVM.
grub-core/kern/disk.c:245: Opening `hd4,msdos1'...
/usr/sbin/grub-probe: info: the size of hd4 is 977664.
grub-core/partmap/msdos.c:166: partition 0: flag 0x0, type 0xbe, start 0x800, len 0xee300
/usr/sbin/grub-probe: info: no LVM signature found.
grub-core/kern/disk.c:338: Closing `hd4'.

Program received signal SIGSEGV, Segmentation fault.
zpool_get_config (zhp=0x0, oldconfig=0x0) at ../../lib/libzfs/libzfs_config.c:222
222 ../../lib/libzfs/libzfs_config.c: No such file or directory.
(gdb) bt
#0  zpool_get_config (zhp=0x0, oldconfig=0x0) at ../../lib/libzfs/libzfs_config.c:222
#1  0x00000000004339b9 in ?? ()
#2  0x0000000000402cc6 in ?? ()
#3  0x0000000000402a2f in ?? ()
#4  0x00007ffff71b576d in __libc_start_main (main=0x402800, argc=4, ubp_av=0x7fffffffe688, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe678)
    at libc-start.c:226
#5  0x0000000000402b2d in ?? ()
#6  0x00007fffffffe678 in ?? ()
#7  0x000000000000001c in ?? ()
#8  0x0000000000000004 in ?? ()
#9  0x00007fffffffe8c3 in ?? ()
#10 0x00007fffffffe8d8 in ?? ()
#11 0x00007fffffffe8dd in ?? ()
#12 0x00007fffffffe8ed in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) list
217 in ../../lib/libzfs/libzfs_config.c
(gdb) directory /usr/src/zfs-0.6.2/lib/libzfs
Source directories searched: /usr/src/zfs-0.6.2/lib/libzfs:$cdir:$cwd
(gdb) list
warning: Source file is more recent than executable.
217 nvlist_t *
218 zpool_get_config(zpool_handle_t *zhp, nvlist_t **oldconfig)
219 {
220     if (oldconfig)
221         *oldconfig = zhp->zpool_old_config;
222     return (zhp->zpool_config);
223 }
224 
225 /*
226  * Retrieves a list of enabled features and their refcounts and caches it in
(gdb) 

So the problem is that zpool_get_config() is called with zhp=0x0; the segfault happens at libzfs_config.c:222 when return (zhp->zpool_config) dereferences the NULL handle. (This is also consistent with the "segfault at 120" lines in syslog: with a NULL handle, the faulting address is just the offset of zpool_config inside the handle structure.)

Unfortunately, the backtrace doesn't show the proper call stack, because I don't have the debug symbols for grub-probe itself.

Where are these symbols? I could not find a -dbg package for the grub stuff.
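For what it's worth, a quick way to confirm which libzfs build grub-probe actually loads (the binary path is the one used throughout this thread):

ldd /usr/sbin/grub-probe | grep libzfs

If the resolved path points at the PPA's libzfs.so.1, the daily library is still on the link path.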

mja commented 10 years ago

downgraded zfs to current PPA/Stable (6.2.1)

What do I need to do to downgrade to 6.2.1?

I've added the stable PPA

sudo add-apt-repository ppa:zfs-native/stable

But I thought the downgrade command would be something like

 sudo apt-get install ubuntu-zfs=0.6.2-1~precise

which doesn't do anything.

But I guess I have to do it for each component: zfs-dkms, lib-zfs, lib-zfs2?

Thanks.

dajhorn commented 10 years ago

@mja, right, all components must be downgraded. The easiest way is to use the ppa-purge utility to remove everything from ppa:zfs-native/daily and then reinstall from ppa:zfs-native/stable.
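For anyone following along, the ppa-purge route looks roughly like this (it disables the PPA and downgrades every package from it to the next-best available version):

sudo apt-get install ppa-purge
sudo ppa-purge ppa:zfs-native/daily
sudo add-apt-repository ppa:zfs-native/stable
sudo apt-get update && sudo apt-get upgrade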

GRUB must be recompiled for the ZoL 0.6.3 beta, which includes local builds from the upstream head and all binary packages currently in the PPA daily section.

This glitch happened (in part) because libzfs got its major version number incremented but the other ZoL libraries didn't, and they are not properly independent build products (at this point in time). One way to fix this locally is to bump the soname on all of the ZoL libraries, but this isn't an appropriate thing for downstream to do in binary packages unless upstream wants it.
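A quick way to see the mismatch described above is to list the SONAMEs the installed ZoL libraries advertise (the glob paths are illustrative; the libraries may live in /lib or /usr/lib depending on the package):

for lib in /lib/libzfs.so* /lib/libnvpair.so* /lib/libuutil.so* /lib/libzpool.so*; do
    echo "$lib: $(objdump -p "$lib" 2>/dev/null | grep SONAME)"
done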

Unfortunately, I don't know how to publish an updated GRUB package through Launchpad without breaking the upgrade path for people that just want stable releases or otherwise risking accidental installation on systems that don't need it.

Phoenixxl commented 10 years ago

Quote: @dajhorn

Unfortunately, I don't know how to publish an updated GRUB package through Launchpad without breaking the upgrade path for people that just want stable releases or otherwise risking accidental installation on systems that don't need it.

Maybe switch to a daily/stable model for ppa:zfs-native/grub as well, in sync with zfs daily/stable? Or do you in fact mean that whether I switched to stable or not will mean nothing in the end, and my system will get broken either way? Surely not? Am I getting the meaning of your statement wrong? I am mostly concerned with what I can expect to happen, and with what I should do to keep a working system if that requires more than switching to stable.

I presume "not properly independent build products" means the zfs/daily will get updates pushed through while they are made to have them tested , whereas zfs/stable gets updated only once "independent build products" are properly attuned to each other?

Friendly regards.

Phoenixxl commented 10 years ago

@mja I didn't have to do anything particularly special. This did it:

add-apt-repository --remove ppa:zfs-native/daily
add-apt-repository ppa:zfs-native/stable
apt-get update
apt-get upgrade

however "aptitude update" + "aptitude upgrade" did not do anything , I had to use apt-get. When doing the upgrade it tells me zfs "WILL BE DOWNGRADED". I needed to confirm at that point.

dajhorn commented 10 years ago

Maybe switch to a daily/stable model in sync with zfs daily/stable for ppa:zfs-native/grub as well? Or do you in fact mean that whether I switched to stable or not will mean nothing in the end and my system will get broken either way? Surely not?

@Phoenixxl, right, I could do this, but then the issue becomes communicating the change to end-users. Based on prior experience, I'm not going to get good enough penetration before this bug is mooted to make another build series worthwhile.

Anybody that cares enough to implement, advertise, and support this work is certainly welcome to contribute. GRUB can be frustrating and very few people are willing to work on it, so please chip in.

The GRUB package that is in the ppa:zfs-native/staging area might resolve the problem for you, but you should install it manually without actually adding the staging PPA to sources.list.
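A sketch of that manual route (the .deb filename and version below are hypothetical placeholders; fetch the real file for your release and architecture from the staging PPA's package page):

DEB=grub-pc_1.99-21ubuntu3.9+zfs2~precise1_amd64.deb   # hypothetical filename
wget "https://launchpad.net/~zfs-native/+archive/staging/+files/$DEB"
sudo dpkg -i "$DEB"
echo grub-pc hold | sudo dpkg --set-selections   # optional: keep apt from replacing it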

Phoenixxl commented 10 years ago

The GRUB package that is in the ppa:zfs-native/staging area might resolve the problem for you, but you should install it manually without actually adding the staging PPA to sources.list.

What happened last week already gave me quite a scare, to be honest. Before I do anything close to manually installing a version of GRUB that "might" work, I would need to set up an extra machine that can take over in case anything goes wrong.

Manually installing would mean I end up with a system that needs constant manual intervention when doing updates. Say I use GRUB from /staging and something happens again when ZFS eventually updates to 6.3; I would have a good chance of ending up with a system that doesn't boot.

Most importantly, I would still want to know whether I will end up with a borked system if I keep using zfs-native/stable together with zfs-native/grub. Will stable update without waiting for native/grub to be updated so they work together? This is my main worry for now. In case failure is inevitable, I would stop updating my system until I put together a new machine, or until the green light is given that zfs-native/grub and zfs-native/stable are working fine with each other again.

I have been using this system for over a year now; I installed it according to the instructions on the ZoL site. I am quite happy that it's a natively booting ZFS system and that it has kept up with updates for so long without anything breaking. It would be quite sad to see it break now.

@Phoenixxl, right, I could do this, but then the issue becomes communicating the change to end-users. Based on prior experience, I'm not going to get good enough penetration before this bug is mooted to make another build series worthwhile.

The fact that GRUB is in a separate PPA, and is not tailored specifically to the builds it is used with, makes the probability of this kind of thing happening again quite real. Not only that, but as this event shows, fixing it is quite tricky as well. There is more than one way to skin a cat. How about including a modified GRUB in the daily and stable PPAs, together with some scripting and PPA pinning across zfs-native/grub, zfs-native/daily, and zfs-native/stable to make sure the right one gets installed? (A higher pin for the GRUB inside daily/stable compared to native/grub, but only installed from daily/stable if "http://ppa.launchpad.net/zfs-native/grub/ubuntu" is active in /etc/apt; a sketch of the pinning piece follows below.) Another option would be to keep zfs-native/grub as it is now, but add a zfs-native/grub-daily and a zfs-native/grub-stable specifically tailored to those two versions of ZFS; people could switch in their own time without the need for forced penetration :grinning:. A third option, closer to the first, is to add a zfs-native/daily+grub and a zfs-native/stable+grub, which would mean maintaining two extra PPAs that are mostly duplicates. ZFS is gaining popularity since btrfs is not delivering; a robust native install will need to get some attention eventually.
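For what it's worth, the pinning half of that idea could look something like this (the origin string and package list are assumptions on my part; verify the real origin with apt-cache policy before relying on it):

# /etc/apt/preferences.d/zfs-grub (sketch)
Package: grub-pc grub-common grub2-common
Pin: release o=LP-PPA-zfs-native-stable
Pin-Priority: 990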

Also, is it 100% certain that this needs a fix on the GRUB side and not on the libzfs side?

Anybody that cares enough to implement, advertise, and support this work is certainly welcome to contribute. GRUB can be frustrating and very few people are willing to work on it, so please chip in.

The extent of what I can do for now is to give feedback if and when I see something going wrong. I haven't messed with boot sectors since they were located on 880 KB AmigaDOS-formatted floppies, and I'm not sure that knowledge is still relevant to GRUB.

Friendly regards.

SeanTasker commented 9 years ago

Hello,

Has there been any progress on this, and does it affect any other users? It seems as though I am having the same problem. I can reproduce it by attempting an apt-get autoremove:

sudo apt-get autoremove
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED
  linux-image-3.13.0-27-generic linux-image-extra-3.13.0-27-generic
0 to upgrade, 0 to newly install, 2 to remove and 73 not to upgrade.
6 not fully installed or removed.
After this operation, 193 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 318372 files and directories currently installed.)
Removing linux-image-extra-3.13.0-27-generic (3.13.0-27.50) ...
Examining /etc/kernel/postrm.d .
run-parts: executing /etc/kernel/postrm.d/initramfs-tools 3.13.0-27-generic /boot/vmlinuz-3.13.0-27-generic
update-initramfs: Deleting /boot/initrd.img-3.13.0-27-generic
run-parts: executing /etc/kernel/postrm.d/zz-update-grub 3.13.0-27-generic /boot/vmlinuz-3.13.0-27-generic
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Generating grub.cfg ...
Segmentation fault (core dumped)
Found linux image: /boot/vmlinuz-3.13.0-34-generic
Segmentation fault (core dumped)
run-parts: /etc/kernel/postrm.d/zz-update-grub exited with return code 139
Failed to process /etc/kernel/postrm.d at /var/lib/dpkg/info/linux-image-extra-3.13.0-27-generic.postrm line 328.
dpkg: error processing package linux-image-extra-3.13.0-27-generic (--remove):
 subprocess installed post-removal script returned error exit status 1
Removing linux-image-3.13.0-27-generic (3.13.0-27.50) ...
Examining /etc/kernel/postrm.d .
run-parts: executing /etc/kernel/postrm.d/initramfs-tools 3.13.0-27-generic /boot/vmlinuz-3.13.0-27-generic
update-initramfs: Deleting /boot/initrd.img-3.13.0-27-generic
run-parts: executing /etc/kernel/postrm.d/zz-update-grub 3.13.0-27-generic /boot/vmlinuz-3.13.0-27-generic
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Generating grub.cfg ...
Segmentation fault (core dumped)
Found linux image: /boot/vmlinuz-3.13.0-34-generic
Segmentation fault (core dumped)
run-parts: /etc/kernel/postrm.d/zz-update-grub exited with return code 139
Failed to process /etc/kernel/postrm.d at /var/lib/dpkg/info/linux-image-3.13.0-27-generic.postrm line 328.
dpkg: error processing package linux-image-3.13.0-27-generic (--remove):
 subprocess installed post-removal script returned error exit status 1
Errors were encountered while processing:
 linux-image-extra-3.13.0-27-generic
 linux-image-3.13.0-27-generic
E: Sub-process /usr/bin/dpkg returned an error code (1)

Here is the output from tail /var/log/syslog.

Sep 25 14:07:32 seanix kernel: [13577.335028] grub-probe[9238]: segfault at 120 ip 00007fea5625d25f sp 00007fff58492618 error 4 in libzfs.so.1.0.1[7fea56252000+3f000]
Sep 25 14:07:32 seanix kernel: [13577.482799] grub-probe[9243]: segfault at 120 ip 00007fb7275a125f sp 00007fffb4195e58 error 4 in libzfs.so.1.0.1[7fb727596000+3f000]
Sep 25 14:07:32 seanix kernel: [13577.687242] grub-probe[9274]: segfault at 120 ip 00007f40b725325f sp 00007fffb0b978f8 error 4 in libzfs.so.1.0.1[7f40b7248000+3f000]
Sep 25 14:07:32 seanix kernel: [13577.812301] grub-probe[9276]: segfault at 120 ip 00007ff04e1b625f sp 00007fffa7bfa328 error 4 in libzfs.so.1.0.1[7ff04e1ab000+3f000]
Sep 25 14:07:33 seanix kernel: [13578.263155] grub-mkrelpath[9335]: segfault at 120 ip 00007f4a3235d25f sp 00007ffff4752128 error 4 in libzfs.so.1.0.1[7f4a32352000+3f000]
Sep 25 14:07:33 seanix kernel: [13578.404718] grub-mkrelpath[9360]: segfault at 120 ip 00007f246fa6925f sp 00007fff68311818 error 4 in libzfs.so.1.0.1[7f246fa5e000+3f000]
Sep 25 14:07:33 seanix kernel: [13578.661653] grub-probe[9393]: segfault at 120 ip 00007f26f2a3225f sp 00007fff4f573788 error 4 in libzfs.so.1.0.1[7f26f2a27000+3f000]
Sep 25 14:07:33 seanix kernel: [13578.801003] grub-probe[9398]: segfault at 120 ip 00007fb1b73d625f sp 00007fff62d2aa28 error 4 in libzfs.so.1.0.1[7fb1b73cb000+3f000]
Sep 25 14:07:33 seanix kernel: [13579.012036] grub-probe[9429]: segfault at 120 ip 00007fd79070f25f sp 00007fff6aa0aea8 error 4 in libzfs.so.1.0.1[7fd790704000+3f000]
Sep 25 14:07:33 seanix kernel: [13579.138868] grub-probe[9431]: segfault at 120 ip 00007f5c95ffe25f sp 00007fff5aa0bc18 error 4 in libzfs.so.1.0.1[7f5c95ff3000+3f000]

This doesn't look like new information for the issue, but I am happy to help in whatever way I can.

I am running Ubuntu 14.10, using zfs-native/stable for trusty, but had to point at the raring zfs-native/grub package because there aren't any available for trusty.

dajhorn commented 9 years ago

I am running Ubuntu 14.10, using zfs-native/stable for trusty, but had to point at the raring zfs-native/grub package because there aren't any available for trusty.

@SeanTasker, this is a known incompatibility, but these packages are discontinued and unsupported for Trusty and Utopic. This bug is, however, resolved in the latest packages for Ubuntu 12.04 Precise, which is still supported, so I'm closing this ticket.

The best solution for Trusty and Utopic installations is to reconfigure the system for the GRUB packages that are currently in the regular distro.
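For anyone in that situation, the reconfiguration would look roughly like this (ppa-purge reverts the PPA's GRUB packages to the distro builds, as in the earlier downgrade step; the grub-install device is only an example):

sudo apt-get install ppa-purge
sudo ppa-purge ppa:zfs-native/grub
sudo grub-install /dev/sda && sudo update-grub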

Phoenixxl commented 9 years ago

To anyone whom it may concern who was in the same boat as me on this issue: I postponed updating this server from the moment of my initial post, after reverting to stable (6.2.1). Today I decided to give it one last shot before building a new machine with 14.04 on it. It worked out, so I didn't have to.

What I did:

Prelim: I've always had a custom GRUB entry that points at the /boot/vmlinuz (etc.) symlinks instead of the actual files; those links point at the latest installed kernels. Add this before doing anything else, especially before updating/upgrading. I think it's discussed in one of the "installing ZFS on boot" tutorials or their comments. I also had to do this because GRUB thought my root was either encrypted or compressed. A sketch of such an entry follows.
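Such an entry, appended to /etc/grub.d/40_custom, might look like the following (the menu title, partition label, and pool/dataset names are hypothetical; the point is only that /vmlinuz and /initrd.img on the boot partition are symlinks tracking the newest installed kernel):

menuentry "Ubuntu (latest kernel via symlinks)" {
    search --no-floppy --label --set=root boot
    linux /vmlinuz boot=zfs root=ZFS=rpool/ROOT/precise
    initrd /initrd.img
}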

Steps (a command-level sketch follows this list):

1. Update/upgrade as per usual. Four or more updates won't go through due to the segfault in your current zfs-grub, including your new kernel and GRUB.
2. Edit /etc/kernel/postinst.d/zz-update-grub and comment out the actual update-grub command near the bottom.
3. Update/upgrade again. Now the only thing that won't update is the new GRUB.
4. Uncomment the entry in /etc/kernel/postinst.d/zz-update-grub again.
5. Reboot and choose the aforementioned GRUB entry that points at the /boot/vmlinuz link. Your system will boot using the latest kernel.
6. Update/upgrade once more. GRUB will now install without segfaulting.
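Roughly, in commands (the hook path is the one named above; the exact update-grub line inside it varies, so comment it out by hand rather than scripting the edit):

sudo apt-get update && sudo apt-get upgrade     # fails at the GRUB hook
sudoedit /etc/kernel/postinst.d/zz-update-grub  # comment out the update-grub call
sudo apt-get upgrade                            # everything but GRUB configures now
sudoedit /etc/kernel/postinst.d/zz-update-grub  # restore the update-grub call
sudo reboot                                     # pick the symlink menu entry
sudo apt-get upgrade                            # GRUB installs without segfaulting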

Do all this at your own risk, of course. The fact that it worked for me doesn't mean it will work for you, but it should at least give you some pointers.