Closed Shellcat-Zero closed 1 year ago
After more testing, it seems the boot problem might have had something to do with the old root dataset not being mounted when update-grub was executed. After importing the old pool under a new name (zpool import -R /old -f <big_id_number> oldpool), then mounting the root dataset (zfs mount oldpool/ROOT/ubuntu), and running update-grub, the 1-2 minutes of errors before the grub menu have gone away. I think update-grub was previously run without issuing that dataset mount command. The last fix was that I needed to edit the old fstab file in 18.04 to reflect the new pool name, oldpool; otherwise the boot process failed because it could not find and mount /var/tmp and /var/log.
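The sequence described above could be sketched roughly as follows. This is only an illustration, not a tested procedure: the helper names and the APPLY switch are hypothetical, the pool ID is whatever zpool import reports for the old pool, and the fstab helper assumes the old pool was originally named rpool (the original name is not stated here). By default the commands are only printed, and the fstab rewrite goes to stdout so it can be reviewed before replacing anything.

```shell
#!/bin/sh
# Hypothetical helper: print a command, or execute it when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Import the old pool under the new name "oldpool" with an altroot of /old,
# mount its root dataset explicitly, then regenerate the grub menu.
# $1 is the numeric pool ID (a placeholder; take it from "zpool import").
import_mount_update_grub() {
    pool_id=$1
    run zpool import -R /old -f "$pool_id" oldpool
    run zfs mount oldpool/ROOT/ubuntu
    run update-grub
}

# Print a copy of an fstab file with the pool name at the start of each
# entry changed from $1 to $2. The input file ($3) is not modified.
rename_pool_in_fstab() {
    old=$1
    new=$2
    file=$3
    sed "s|^${old}/|${new}/|" "$file"
}
```

With the pool imported as above, the old fstab would presumably be found at /old/etc/fstab, so something like rename_pool_in_fstab rpool oldpool /old/etc/fstab would show the proposed result.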
I re-attempted that fix after noticing that running update-grub without the old pool imported, which omits the old pool from the grub menu, gave a completely normal boot. I could have sworn that I'd tried booting the HWE kernel with the old pool omitted from grub, but at this point I've tried so many things that I can't keep track of what I did. I have no idea why the HWE kernel caused such a bad hang-up, but thankfully switching to generic allowed me to proceed and eventually discover what appears to have been the likely issue (needing that explicit zfs mount prior to update-grub).
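If the old pool is meant to appear in the grub menu, a small guard like the one below could help avoid repeating the mistake. It is a sketch only: both function names are hypothetical, and it assumes the dataset name oldpool/ROOT/ubuntu used above. It relies on zfs get reporting the mounted property as yes or no.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the dataset reports mounted=yes.
dataset_is_mounted() {
    [ "$(zfs get -H -o value mounted "$1" 2>/dev/null)" = "yes" ]
}

# Hypothetical wrapper: refuse to regenerate the grub menu unless the old
# root dataset is mounted. The default dataset name is an assumption.
safe_update_grub() {
    dataset=${1:-oldpool/ROOT/ubuntu}
    if dataset_is_mounted "$dataset"; then
        update-grub
    else
        echo "refusing to run update-grub: $dataset is not mounted" >&2
        return 1
    fi
}
```

This check only makes sense while the old pool is imported; with the pool not imported at all, update-grub simply leaves it out of the menu.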
Closing as solved.
I had some disks fail in a RAID-10 system and decided to take the opportunity to rebuild the pool with the latest LTS, one stripe at a time. The old system was 18.04. I was anticipating also doing some incremental hardware updates (aside from disk replacement) so I opted for the HWE kernel. This caused the system to hang at boot (prior to a grub menu) with messages like:
It might have booted eventually, but I never let it run for more than 5 minutes. Nothing I tried with regard to modifying the bpool features worked, such as a solution I mentioned on a very similar previous issue. I had also tried removing zpool_checkpoint and livelist since they generate warnings at pool creation, but that resulted in a boot which failed into the grub prompt. After conceding that nothing else could probably be done with bpool features, I tried going with the generic kernel instead of HWE, which has resulted in success. Now, those error messages appear and keep repeating for 1-2 minutes before eventually giving me a grub menu, whereas previously the system had spat out those messages and then stopped (showing no further errors or any other sign of boot progress). Those messages were absent in the previous 18.04 startup. It would be nice to circumvent that 1-2 minutes of errors if anyone has suggestions.
The system uses an X58 chipset with an Intel processor, and does not support UEFI. I guess the lesson here is definitely DO NOT use HWE with older hardware, but I just wanted to mention it here because I haven't seen any other warning/documentation for this kind of issue, and troubleshooting it took an enormous amount of time.
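For anyone comparing their own setup against this one, a quick way to confirm the firmware boot mode is to look for /sys/firmware/efi, which exists only when the system was booted through UEFI. The function name and the optional sysfs-root argument below are hypothetical conveniences, not part of any tool.

```shell
#!/bin/sh
# Report the firmware boot mode. The optional argument is a sysfs root
# (default /sys) so the function can be tried against a scratch directory.
boot_mode() {
    sysfs=${1:-/sys}
    if [ -d "$sysfs/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}
```

Running boot_mode followed by uname -r shows the boot mode and the running kernel; on 22.04 the generic (GA) kernel is the 5.15 series, while HWE kernels are newer.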
Unfortunately, dual-boot between both pools does not work. Only the 22.04 system is bootable, but I can still mount the old pool for data-copying purposes before I eventually assemble the RAID-10 back together. The 18.04 system had been upgraded from 16.04, which did not have a bpool, and therefore 18.04 still had no bpool. Running update-grub while the old pool is mounted within 22.04 results in it being added to the boot menu, but attempting to boot to 18.04 results in this boot failure, which gives me a busybox prompt and then an initramfs prompt before finally yielding a kernel panic:
I tried manually mounting the old rpool to no avail, and mounting the new bpool also did not work. I'm inclined to believe this is happening because of 18.04 lacking a bpool, but I have no idea. I would like to believe that this is the only issue, because future upgrades with this process would be ideal, splitting the RAID-10 in two and maintaining two systems temporarily until it can be fully reassembled back into the RAID-10 pool (while still leveraging external backups).
One other (superfluous) thing to note for others is that the options GRUB_DEFAULT=saved, GRUB_SAVEDEFAULT=true result in a pre-boot error message, GRUB error: sparse files not supported, which I believe is caused by the fact that the filesystem is read-only for grub in the case of the bpool, but this does not prevent startup from happening.
I've done the 22.04 install now on 3 different systems for troubleshooting, and I would also note that the rpool cannot be exported prior to reboot, for whatever reason. Any export attempt fails with pool is busy messages, which then means that on the first startup the pool has to be imported manually to continue booting.
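Regarding the GRUB_DEFAULT=saved note above: if the saved-default behaviour is not needed, one possible way to avoid the sparse files message is to stop asking grub to save the default entry. The sketch below is an assumption about what would help, not something verified on this system. The function name is hypothetical, and it only prints a modified copy of the config file it is given, which would normally be /etc/default/grub.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a grub defaults file with the
# saved-default options disabled. The input file ($1) is not modified.
drop_savedefault() {
    sed -e 's/^GRUB_DEFAULT=saved$/GRUB_DEFAULT=0/' \
        -e 's/^GRUB_SAVEDEFAULT=true$/#&/' "$1"
}
```

After reviewing the output and writing it back to /etc/default/grub, update-grub would need to be run again for the change to take effect.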