zfsonlinux / grub

GRUB enhancements for ZFS on Linux

LMDE booting issue with kernel 3.13.11 #14

Open RJVB opened 10 years ago

RJVB commented 10 years ago

I have run into a booting issue with my LMDE system, which has a ZFS root (and a separate ext3 /boot partition). About two weeks ago I built and installed the 3.13.11 kernel from sources obtained from Ubuntu, using the spl and zfs modules from the PPA for Trusty. That worked fine: the other ZFS-related packages were kept at the versions from the stable ZoL Debian repository and had no trouble generating a functional bootloader configuration and initrd.

Yesterday I made backups of that ZFS pool by mirroring it to two partitions on an external drive, which were split off once resilvering had completed (I had a reason to make two backup pools at once). I did this while running from another external drive, not on ZFS; the target drive has a separate /boot partition for each of the backup pools. I then started experimenting with one of the two backup pools (with the source pool and its associated /boot offline): first I checked whether a Debian install could be converted to Ubuntu just by changing /etc/apt/sources.list, and finally I rsynced everything but /home from a Kubuntu install onto that pool. In short, initrds and grub.cfg must have been regenerated during the process, but in theory, and as far as I can tell, nothing on the internal disk was touched.

The issue: booting the 3.13 kernel from the internal disk/pool now fails. The boot starts as it should, but fails when mounting the pool, leaving me at the busybox prompt. The output looks somewhat different from what I've been able to read during a successful boot, but the problem appears to be that the pool is perceived as faulty. It is not actually flagged as faulty, though: I can export it (the only thing I can do at that point) and reimport it as if no error had occurred, after which the boot process completes successfully. I've tried this multiple times, and no trace of an error remains after booting through this way. Booting the previous kernel, a 3.12.6 built from sources patched by the Sabayon distribution, works fine without any manual intervention. Reinstalling the bootloader and regenerating the 3.13 initrd didn't solve the problem, and a scrub of the pool revealed no errors.

I've taken a few photos of the screen, but I'm not sure they give any extra information about why the pool is perceived as faulty, or why this kind of software-hardware interaction issue only started yesterday. Any ideas? Is it very far-fetched to suspect a timeout effect on an internal SATA disk that is already serving the initrd?
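To make the timeout idea concrete, here is a toy model in Python. It assumes, hypothetically, that the initrd makes a single import attempt before the disk is ready, whereas the manual import at the busybox prompt happens later and therefore succeeds. `import_with_retry` and `FakeDisk` are illustrative names of my own, not part of ZoL or its initramfs scripts, and the timings are arbitrary.

```python
import time
from typing import Callable


def import_with_retry(
    try_import: Callable[[], bool],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_import() until it returns True.

    Returns the 1-based number of the attempt that succeeded and
    raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if try_import():
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # give the (hypothetically slow) disk time to settle
    raise RuntimeError(f"pool import failed after {attempts} attempts")


class FakeDisk:
    """Hypothetical disk that only becomes importable after `ready_after` seconds."""

    def __init__(self, ready_after: float) -> None:
        self.ready_after = ready_after
        self.now = 0.0  # simulated clock, advanced only by sleep()

    def sleep(self, seconds: float) -> None:
        self.now += seconds

    def try_import(self) -> bool:
        return self.now >= self.ready_after


if __name__ == "__main__":
    disk = FakeDisk(ready_after=2.0)
    # Attempts at t=0 and t=1 fail; the attempt at t=2 succeeds.
    print(import_with_retry(disk.try_import, attempts=5, delay_s=1.0, sleep=disk.sleep))  # prints 3
```

With `attempts=1` (a single early attempt) the same call raises `RuntimeError`, which would mirror the failure at boot; whether the real cause is timing is exactly the open question.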