Closed Ognian closed 11 months ago
same for kairos-standard-opensuse-tumbleweed-amd64-generic-v2.4.0-k3sv1.26.6+k3s1.iso
umm, this could be related to the gfx mode set by grub. You may need to set it lower manually, as we now set the gfxterm terminal to auto and it tries to get the highest mode available.
Maybe you can check with different gfxmode values?
Seems like elementary also hit this at one point, which seems to confirm that this is a gfx issue: a really high gfx setting is chosen but the framebuffer is not big enough to display it: https://github.com/elementary/installer/issues/542
Trying to change gfxmode from auto to 640x480, but it is weird:
set gfxmode=640x480
-> doesn't help. It indeed changes something, but where exactly should the change go? Or is it needed in multiple places?
And actually why does it work from the usb stick and not after installing? I thought that the grub config is identical...
@Itxaka any news on this, any chance to be fixed in 2.4.1?
@Ognian unfortunately no. As this requires a change to grub default values, we needed to push 2.4.1 to fix some issues before getting to work on this, as it requires extensive testing to find a good default.
Tested with 2.4.1 same issue! Noticed the following:
Wait, so this means you are able to boot by manually setting the gfxmode, right? But then on reboot it ignores it unless you set it manually?
Seems like we need to look for a safe default for the resolution
Those are just warnings being exposed. It happened before but we were not logging them properly. It should not affect that much, it is just nicer to have those fonts bundled :)
I'll describe the process from the beginning:
1. I'm downloading `kairos-standard-opensuse-leap-amd64-generic-v2.4.1-k3sv1.26.6+k3s1.iso` and burning it to a USB stick.
2. I'm inserting the stick and booting from it (latte panda delta 3 -> x86_64 with built-in eMMC). The stick boots and I get the QR code.
3. I'm using the webui (ip:8080) to install on the built-in eMMC (/dev/mmcblk1), pasting my cloud_config and checking reboot.
4. When it restarts, I remove the USB stick so it tries to boot from the eMMC (sd card). Here the out of memory error of grub appears.

The grub.cfg on the USB stick is much shorter than the one written by the installer on the eMMC (= sd card). The grub configuration on the USB stick always works; the one on the sd card never does.
I tried to modify the one on the sd card by inserting `set gfxmode=640x480` at different places. It changes the behavior, BUT none of the attempts led to booting kairos...
Yep, this makes sense. Our grub.cfg for the livecd does not set gfxmode, so in livecd/USB/live mode you do not hit this. Only once you restart into the installed system do you hit the issue, as there we set `set gfxmode=auto`.
Let me test this somehow. Maybe I can make virtualbox reproduce it by setting the video card to a very low amount of ram or something similar....
Also faced `out of memory` (OOM) issues when trying to install on an old Acer Aspire1 laptop (4G RAM & MMC). Bisected to the `loopback` command that causes the OOM. Suggestions on SO are to copy kernel & initrd from image to disk. Don't have progress as I am still learning grub...

Enabling debugging with `set debug=all` let me pass through `loopback` - different error (not OOM). In debug I noticed that the tpm module is used, so I turned off TPM in BIOS and kairos started successfully. Although I am unblocked, it is not clear what the root cause was. If it was indeed the lack of memory and TPM use just crossed a bar, then reducing the memory footprint makes sense: use text mode by default, test with large images, etc.
Very weird, 4GB of RAM should be more than enough for everything to load with no issues. After all, the kernel and initrd can't be more than 200MB in any of the flavors....
Wondering if it's due to the modules or the gfx stuff in your case as well....
So I disabled TPM from BIOS (Thanks @AndreyNikiforov !) I did a clean install of 2.4.1 from USB. On first boot of the internal eMMC:
Pressed a key, booting continues.
The above errors don't look scary to see... After this it looks like it works...
Some comments found going through the grub bugtracker:
Finally I found a comment regarding the screen size and GRUB. Apparently the 4k graphics size eats half the available 200MB RAM from GRUBs allotment. Thus any initrd.img larger than 100MB won't load.
Looks like TPM module is indeed involved! https://github.com/rhboot/grub2/pull/102
So https://github.com/rhboot/grub2/commit/635f85b016839b9aaecdecee69a2ee98edb3e0ab was supposed to allow initrds to be allocated over 4GB. However, initrds are also being verified by the verifiers framework, or rather the tpm "verifier" measures them this way.
This causes the verifiers framework to read the entire file into memory first using standard memory allocation to verify it and then release it again before our allocator gets a chance to load the size and allocate it. This is um bad.
So it makes sense that disabling tpm makes it work, as grub then doesn't try to fully load the initrd into memory to measure it.
So it seems to be a mix of several things:
Have to think about this and check further in upstream grub to see if this has been fixed somewhere, but good catch folks.
Thanks @Ognian for reporting this and @AndreyNikiforov for the hint with the TPM. This would have been a nightmare to track down otherwise!
Our kernel on core images is around 13MB.
Our initrd on core images is around 92/96MB.
It kind of makes sense that we go over that mentioned 100MB by setting the gfx mode to auto if it chooses a very high resolution....
Moving to compressing the initramfs with zstd would gain us 4 extra MB, which is not much, but it's good enough to breathe I guess.
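As a rough sanity check of the numbers above, the budget can be sketched in shell. This assumes the figures quoted in this thread (a GRUB allotment of about 200 MB, roughly half of it consumed by a 4k graphics mode, an initrd of up to 96 MB); the function name `initrd_headroom_mb` is hypothetical, and the model ignores the kernel and any extra copy made by the TPM verifier.

```shell
# Rough model, all values in MB. The inputs are the figures quoted in this
# thread, not measured values; initrd_headroom_mb is a hypothetical helper.
initrd_headroom_mb() {
  budget_mb="$1"   # memory GRUB can allocate (assumed ~200)
  gfx_mb="$2"      # share taken by the graphics mode (assumed ~100 at 4k)
  initrd_mb="$3"   # size of the initrd to load
  echo $(( budget_mb - gfx_mb - initrd_mb ))
}

initrd_headroom_mb 200 100 96   # initrd at the upper figure quoted above
initrd_headroom_mb 200 100 92   # the same initrd if it were 4 MB smaller
```

A result close to zero or negative would mean the initrd no longer fits once the high-resolution mode is active, which is consistent with the small margin discussed here.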
@Ognian does this happen with a non-k3s build? If it also happens, are you able to build a custom image with the --zstd flag on initrd creation to see if it alleviates the issue?
The patch is as follows, it's just 1 line:
diff --git a/Earthfile b/Earthfile
index b22b8c8..61eb545 100644
--- a/Earthfile
+++ b/Earthfile
@@ -441,7 +441,7 @@ base-image:
IF [ -e "/usr/bin/dracut" ]
# Regenerate initrd if necessary
RUN --no-cache kernel=$(ls /lib/modules | head -n1) && depmod -a "${kernel}"
- RUN --no-cache kernel=$(ls /lib/modules | head -n1) && dracut -f "/boot/initrd-${kernel}" "${kernel}" && ln -sf "initrd-${kernel}" /boot/initrd
+ RUN --no-cache kernel=$(ls /lib/modules | head -n1) && dracut --zstd -f "/boot/initrd-${kernel}" "${kernel}" && ln -sf "initrd-${kernel}" /boot/initrd
END
END
And then simply run `earthly +iso --FLAVOR=opensuse-leap --VARIANT=standard --K3S_VERSION=v1.26.6` to generate an iso under build.
umm, booting from master in 4k doesn't result in the issue being reproduced, even with tpm. I'm wondering if it's a tpm implementation issue rather than a grub one. We don't ship the tpm module with grub as a module, so not sure if it's integrated into grub directly.
I think we need to rework the grub.cfg to not load gfxterm for now unless it's needed, as it's giving us a lot of headaches.
We dropped gfxterm here: https://github.com/kairos-io/packages/pull/473. Please give it a try, and if the problem still occurs feel free to re-open.
I'm running into the same problem using `kairos-standard-ubuntu-22-lts-amd64-generic-v2.4.1-k3sv1.27.3+k3s1.iso`. I also built from master, thinking that would pull in the changes from https://github.com/kairos-io/packages/pull/473 (and I think it did because my grub.cfg is now missing all the gfx stuff), but have the same result. I didn't have success disabling TPM either.
Edit: Disabling TPM and reinstalling gave me the same results as @Ognian (can't find regexp, boots after pressing a key). Anyway I'd definitely like to see this issue resolved (ideally without disabling TPM) so let me know if there's anything I can do to help.
Just as another data point, I'm testing on a Microsoft Surface Pro 7+ and getting this on the latest ubuntu-20.04-v2.4.2 and still am seeing grub OOM. If there's anything I can help test, I'd be happy to!
Up to now it seems that to reproduce this issue one needs:
- gfxmode set to auto
- a 4k monitor (to make the above use a high resolution). Maybe 2k will also trigger it, not sure
- a TPM chip on the machine
- UEFI booting

We still seem to be missing something, because @Itxaka tried the above combination and couldn't reproduce it. His test was on qemu with virtual monitors though, so maybe that's the reason (but grub thought the resolution was 4k).
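Two of the conditions listed above (UEFI booting and a TPM chip) can be checked from any running Linux system on the affected machine, for example the Kairos live USB. The following is a minimal sketch, assuming the usual sysfs paths `/sys/firmware/efi` and `/sys/class/tpm/tpm0`; the function name and its optional root-prefix argument are hypothetical, and it does not check the gfxmode or the monitor resolution.

```shell
# Report whether the UEFI and TPM conditions hold on a running Linux system.
# The optional argument is a root prefix (hypothetical, only useful for testing).
check_repro_conditions() {
  root="${1:-}"
  # /sys/firmware/efi is only present when the kernel was booted via UEFI.
  if [ -d "$root/sys/firmware/efi" ]; then echo "uefi=yes"; else echo "uefi=no"; fi
  # tpm0 appears when the kernel has registered a TPM device.
  if [ -e "$root/sys/class/tpm/tpm0" ]; then echo "tpm=yes"; else echo "tpm=no"; fi
}

check_repro_conditions
```

If the TPM is disabled in the firmware setup, the second line would be expected to report `tpm=no`, although that depends on the kernel having a driver for the device.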
I've looked for a way to disable TPM on the Surface Pro, but I don't think that is an available setting in its boot menu. What's the best way to test setting the gfxmode to a lower resolution in Kairos?
I would try this (warning: not tested):
- Press `c` to get to the grub console
- Run `videoinfo` to find supported resolutions
- Set `gfxmode` to the desired resolution in the Kairos config, as per the kairos docs: https://kairos.io/docs/reference/configuration/#grub-options

Hopefully that should set the gfxmode on the installed system's grub. You can of course check after installation by editing the grub menu again and looking for that option.
I know you said to use the live CD, but I rebooted a node and tried running `videoinfo` in the GRUB prompt; it said the command was not found. I tried different combinations of `set gfxmode=` and `set gfxpayload=` in the custom one-time GRUB options and none of them prevented the error. It also seemed like none of them changed the video. For what it's worth, here's my config
I noticed that `videoinfo` wasn't on the Kairos grub menu as well, but I downloaded the Ubuntu Server 22.04 ISO and that seemed to do the trick.
Unfortunately, lowering the resolution didn't work for me either =/
@santhoshdaivajna sent me on Slack that they are seeing the same issue on Intel NUC with 8 cpu/32G mem/>500G disk . We may be able to get access to a NUC to debug.
this reminds me of https://bugs.launchpad.net/oem-priority/+bug/1842320/comments/125 - did we try setting gfxmode to 640x480?
maybe it's just the GRUB version causing issues here? @Ognian is that new to 2.4? we could cross check the GRUB versions to see if that's causing it
We think that the tumbleweed grub efi binary is responsible for this, and have reverted the change to use the leap one in https://github.com/kairos-io/packages/pull/553
@Itxaka Thanks for looking into this! Will this also help the ubuntu flavors, or is this specific to opensuse?
Should be for all, as we use the same grub artifacts for all of them
Yes this was new with
maybe it's just the GRUB version causing issues here? @Ognian is that new to 2.4? we could cross check the GRUB versions to see if that's causing it
yes, this was newly introduced with 2.4. I just tested with 2.4.1 and upgraded to 2.4.2, and the result with 2.4.2 is the same as with 2.4.1 and 2.4.0:
- with TPM -> out of memory error
- without TPM -> boots OK

The last version I tested where it worked was v2.2.1. The version I have now is `KAIROS_PRETTY_NAME="kairos-standard-opensuse-leap-15.5 v2.4.2-k3sv1.28.2+k3s1"`, and `sudo grub2-install --version` reports `grub2-install (GRUB2) 2.06`.
Hope this helps. Ognian
Unfortunately, the Surface Pro 7+ doesn't allow disabling TPM 😕 Is my next option to switch dracut to `hostonly=yes`, maybe? @Ognian are you running `grub2-install` inside the new container there?
Thanks!
Yes
this reminds me of https://bugs.launchpad.net/oem-priority/+bug/1842320/comments/125 - did we try setting gfxmode to 640x480?
@mudler I've tried gfxmode=640x480x32 and gfxpayload=640x480x32, but unfortunately it didn't alleviate the OOM errors. I've also tried building from source with @Itxaka's recommendation of zstd, which apparently wasn't enough either. However, my builds from source + Auroraboot do not seem to change the resolution when I adjust grub settings via cloud_init, the way official Kairos images do. So, maybe a combination will work if I can get the source builds working 🤔
Just tested @alexander-bauer's workaround of `rmmod tpm` on ubuntu-20.04 and it does indeed allow my system to boot, so it seems to be related to TPM for me as well.
@alexander-bauer I found an option that is a bit more robust to remove the tpm module from the `grub.cfg`.
Pick your favorite Kairos image (e.g., ubuntu:20.04).
FROM quay.io/kairos/ubuntu:20.04-standard-amd64-generic-v2.4.2-k3sv1.28.2-k3s1
RUN sed -i '/insmod regexp/a rmmod tpm' /etc/cos/grub.cfg
docker build -t tpm2workaround -f Dockerfile .
For example, generate an ISO:
docker run --rm -ti \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/config.yaml:/config.yaml \
-v $(pwd)/build:/tmp/auroraboot \
quay.io/kairos/auroraboot \
--set "container_image=docker://tpm2workaround" \
--set "disable_http_server=true" \
--set "disable_netboot=true" \
--set "state_dir=/tmp/auroraboot" \
--cloud-config /config.yaml
@Itxaka or @mudler might know of an easier way to override this using one of the cloud-init stages. I tried `after-install-chroot` and `before-install`, but neither of those seemed to work.
Hope that helps until we get a more permanent fix!
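Before baking the `sed` line from the Dockerfile above into an image, it may be worth trying it on a throwaway copy of a grub.cfg. The sketch below wraps the same `sed` expression in a hypothetical helper, `add_rmmod_tpm`, which is not part of Kairos; it assumes GNU sed and a config that contains an `insmod regexp` line.

```shell
# Insert "rmmod tpm" after the "insmod regexp" line of a grub.cfg copy.
# Relies on GNU sed (one-line "a" form and -i), as the Dockerfile above does.
add_rmmod_tpm() {
  cfg="$1"
  # Skip the edit if the line is already there, so repeated runs are harmless.
  if ! grep -q '^rmmod tpm$' "$cfg"; then
    sed -i '/insmod regexp/a rmmod tpm' "$cfg"
  fi
  # Return non-zero if "insmod regexp" was missing and nothing got inserted.
  grep -q '^rmmod tpm$' "$cfg"
}

# Try it on a throwaway file rather than the real /etc/cos/grub.cfg.
sample="$(mktemp)"
printf 'insmod regexp\nset timeout=5\n' > "$sample"
add_rmmod_tpm "$sample" && cat "$sample"
```

The final `grep` is there because `sed` exits successfully even when the pattern never matches, so a config without `insmod regexp` would otherwise be left unchanged without any warning.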
Could also try the rc3 that we released yesterday to see if it fixes it, as we reverted the grub.efi to a different one which used to work!
Here I can reproduce it as well with rc3 and VirtualBox (ubuntu image: kairos-ubuntu-22.04-standard-amd64-generic-v2.4.3-rc3-k3sv1.28.2+k3s1.iso)
seems it was just me - recreating the VM with more RAM did the trick
Could also try the rc3 that we released yesterday to see if it fixes it, as we reverted the grub.efi to a different one which used to work!
I tested with `quay.io/kairos/ubuntu:20.04-standard-amd64-generic-v2.4.3-rc3-k3s1.28.2-1`, and that worked for the Surface Pro 7+! Many thanks @Itxaka!
After installing from `kairos-standard-opensuse-leap-amd64-generic-v2.4.0-k3sv1.26.6+k3s1.iso` on `/dev/mmcblk1` on an x86_64 (latte panda 3 d), I immediately get the following grub error: