linuxboot / heads

A minimal Linux that runs as a coreboot or LinuxBoot ROM payload to provide a secure, flexible boot environment for laptops, workstations and servers.
https://osresearch.net/
GNU General Public License v2.0
1.42k stars, 186 forks

First boot after maximized ROM being flashed doesn't succeed (known: MRC cache not constructed, requires reboot) #1213

Closed: githubuseravailable closed this issue 2 years ago

githubuseravailable commented 2 years ago

Board: x230
Heads installer: self-installed
PGP key: Yubikey PGP key for TOTP
1st time flash: external flashing
Downloaded: Maximized ROM (from CircleCI)
IFD unlocked: don't know

@tlaurion this ticket continues from the ticket "1st time heads configuration". I have configured Heads, but now there are 3 issues that happen in a cycle:

After heads' 1st time configuration:

  1. It starts the new kernel, but it takes a long time. After 1 hour with no progress, I restart the laptop; after the restart, I can boot into the OS. When I restart again from the OS, the next issue appears:
  2. The same issue as the ticket "unable to read HW clock": it shows nonstop "gpg: waiting for lock" until I restart again; after the restart, I can boot into the OS.

These 2 issues occur in a cycle: 1st issue, then a successful boot into the OS, then the 2nd issue, then the 1st issue, then a successful boot, and so on.

  3. There is also a 3rd issue: the generated TOTP number never matches the number in my FreeOTP app, where I scanned the QR code.
tlaurion commented 2 years ago

@githubuseravailable

It starts the new kernel, but it takes a long time. After 1 hour with no progress, I restart the laptop

I would like some insight on where to put this so that it reaches all users. As of today, https://osresearch.net/x230-maximized-flashing/ states:

Two reboots are sometimes needed after flash. Force power off by holding the power button for 10 seconds. Since the memory training data was wiped by the content of the full flashed ROM, this is normal.

Then

The same issue as ticket https://github.com/osresearch/heads/issues/1021: it shows nonstop "gpg: waiting for lock" until I restart again; after the restart, I can boot into the OS.

As stated under #1021:

the hint here is that the RTC clock is skewed, and if the laptop has trouble keeping time, the RTC battery (the coin-cell battery under the keyboard) should probably be replaced.

If the OS syncs the clock successfully when it boots, but the time is lost on reboot, this is a clear sign that the RTC battery is dead. As said under #1021, I was only able to partly replicate the issue: it happened to me only once, on a test laptop that had no battery connected for a really long while. Once it was connected to AC, it kept track of time and I did not have the issue anymore.

To access the early recovery console, press the 'r' key. I will answer your other question under #1021.


Another hypothesis: you may have run the factory reset on a computer whose clock was skewed. Even though the factory reset script doesn't set a public key expiry date, the keys would then have been created at a date in the future, and I'm not sure how gpg handles keys that are only valid in the future. That may be why you experience the "gpg: waiting for lock" error here.

1- You could verify that from the Heads recovery shell by checking the output of gpg --card-status. This requires running usb-scan or mount-usb first and dismissing any warning: we only want to load the USB kernel modules so that gpg can communicate with your Yubikey through the USB controllers.

2- The output of date would also give direct insight here. Keep in mind that under Heads, the RTC clock time is in UTC (GMT+0, Greenwich time).
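A minimal sketch of the comparison suggested in points 1 and 2, written in Python for illustration only. It treats the RTC reading as UTC (as Heads does), measures how far it is from a trusted reference time (for example the phone's clock converted to UTC), and flags a key whose creation time is later than that reference. The 5-minute tolerance, the function names and the dates are hypothetical; this does not reproduce what gpg itself does with such keys, which is left open above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance before the RTC is considered skewed.
MAX_TOLERATED_SKEW = timedelta(minutes=5)


def rtc_skew(rtc_utc: datetime, reference_utc: datetime) -> timedelta:
    """Positive: the RTC is ahead of the reference; negative: it is behind."""
    return rtc_utc - reference_utc


def is_skewed(rtc_utc: datetime, reference_utc: datetime) -> bool:
    """True when the RTC drifts beyond MAX_TOLERATED_SKEW in either direction."""
    return abs(rtc_skew(rtc_utc, reference_utc)) > MAX_TOLERATED_SKEW


def key_from_the_future(key_created_utc: datetime, reference_utc: datetime) -> bool:
    """True when a key claims to be created after the real (reference) time,
    e.g. because it was generated while the RTC was set years ahead."""
    return key_created_utc > reference_utc


if __name__ == "__main__":
    real_now = datetime(2022, 9, 1, 12, 0, tzinfo=timezone.utc)   # illustrative
    rtc_now = datetime(2037, 9, 1, 12, 0, tzinfo=timezone.utc)    # RTC years ahead
    print(rtc_skew(rtc_now, real_now).days)                       # 5479 days ahead
    print(is_skewed(rtc_now, real_now))                           # True
    # A key generated during a factory reset while the RTC read rtc_now:
    print(key_from_the_future(rtc_now, real_now))                 # True
```

Comparing against an external reference is the point: a clock that is years ahead looks perfectly normal from inside Heads until it is checked against another source.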

githubuseravailable commented 2 years ago

@tlaurion

Two reboots are sometimes needed after flash. Force power off by holding the power button for 10 seconds. Since the memory training data was wiped by the content of the full flashed ROM, this is normal.

Yes, the 1st issue is gone now, thanks.

I will need an insight on where to put that better so that it reaches all users.

Maybe it's a good idea to repeat it on the "configuring keys" page, under the last screenshot: that's the same screen where I waited a long time for the new kernel to start.

The 2nd issue has also disappeared now, but yes, I think I need to replace the RTC battery, because the date shows a year far in the future. I will discuss the 2nd issue in the RTC battery ticket.

I wonder: if there is an issue and Heads cannot boot the OS, is it possible to remove the hard disk and boot it from another laptop?

Then, after we fix the RTC clock, will the TOTP automatically show the same code as the one in the FreeOTP app?

tlaurion commented 2 years ago

@githubuseravailable

Maybe it's a good idea to repeat it on the "configuring keys" page, under the last screenshot: that's the same screen where I waited a long time for the new kernel to start.

I wonder: if there is an issue and Heads cannot boot the OS, is it possible to remove the hard disk and boot it from another laptop?


Things are getting confusing here, and this should probably go in a completely different issue. The issue you are now describing actually matches the original title of this issue: "Starting new kernel takes a long time". Before that, it was about "gpg: waiting for lock" and "First boot after flashing maximized rom doesn't succeed (MRC cache for trained memory doesn't exist)", if I understand correctly. We are combining 3 issues under a single open issue, and none of the exposed problems are linked together.

Would you rename this issue to something clearer? Issues cannot be used as a sliding diagnostic/resolution mechanism for changing problems. Each issue needs to cover a single problem, so it can be reused and pointed to as a duplicate when needed.

So could you please rename this issue, then open a new issue with a screenshot of the kexec call that you consider to take too long, along with your expectations compared to another use case on the same hardware?

Saying "Starting new kernel takes a long time" is not specific enough: it doesn't include details about the final booted OS, what "long time" means, or what justifies the expectation. On my current x230 i7, a kexec call takes roughly 3-5 seconds before the console is replaced with the final OS kernel's output. In the case of Qubes, that call actually loads Xen, then the dom0 initrd (the drivers and software needed, plus the disk unlock key as an additional cpio passed to the OS) and the kernel. That wait might create anxiety, but it is pretty normal (explanations below). OSes will vary depending on the compression algorithms used to pack the initrd and kernel, on what happens before the framebuffer is actually initialized and by which component, and on memory/SSD speed, etc., normally with low variation.

These things are technical. In the case of the x230, "starting new kernel takes a long time" comes down to what happens behind the scenes before the final initrd/kernel initializes the graphics card (on x230: the i915 DRM driver and DRM helpers) and a console is actually available. On some other platforms, including the new qemu-swtpm board config, the output between the kexec call and QEMU showing a console (framebuffer or not) goes to the host console until the emulated graphics card and the terminal are initialized; only then is the console shown there. That takes a while, since plymouth initializes the display well after the kernel is launched, and in some circumstances it requires some hacking around, because the actual LUKS unlock prompt appears on a console that may not be initialized yet...

In short and factually: on the x230, there is no coreboot graphics initialization. The Heads kernel is the one providing framebuffer output, which gives access to the graphical GUI. When kexec'ing, the same thing happens as described above: the console is refreshed only when the i915 driver is reinitialized by the kexec'ed kernel.

So the question here is: how long is the "long time" in "Starting new kernel takes a long time", and compared to what? That duration depends on the speed of the SSD/HDD, memory speed and the chosen OS (and whether plymouth is used and which hooks it uses). This is not Heads-related.

If we look at https://archive.org/details/Heads-Security-Components-Reownership?start=2894 (video at 48:14), we see that between the kexec call and the Qubes dom0 console text showing dom0 kernel timestamps, 5 seconds of kernel boot time have already elapsed (my recording adapter takes that time to refresh and output the screen), so "booting" into the kernel (kexec) actually took 5 seconds. On most OSes, plymouth hooks are responsible for taking control of the console and setting up the framebuffer, so in those cases you might not see any console output before that hook has run.

tlaurion commented 2 years ago

I took the liberty of renaming the original issue to reflect its actual cause.

tlaurion commented 2 years ago

The second issue was troubleshot and confirmed to be caused by a critical time skew: the date in the Real Time Clock (RTC) was many years in the future. This is already covered under https://github.com/osresearch/heads-wiki/issues/103, which was improved with additional content.

tlaurion commented 2 years ago

Then, after we fix https://github.com/osresearch/heads-wiki/issues/103, will the TOTP automatically show the same code as the one in the FreeOTP app?

This is a fourth issue. Yes, TOTP codes should match on both devices if the RTC time under Heads is set in the UTC/GMT timezone, meaning Greenwich time (which a network-recovery-init call will fix automatically).

Phones handle this conversion automatically: the TOTP application generates codes from UTC (GMT+0) time, because the phone knows its configured local timezone and its offset (+/-) from UTC.
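A minimal sketch of why only UTC matters, assuming the common RFC 6238 defaults used by authenticator apps such as FreeOTP (HMAC-SHA1, 30-second step, 6 digits); whether Heads' own TOTP tooling uses exactly these parameters is an assumption here. The code is derived from Unix time (seconds since 1970-01-01 UTC), so the same instant expressed in any timezone gives the same code, while an RTC holding local time as if it were UTC shifts the counter and yields a different code. The `totp_at` helper name is hypothetical.

```python
import hashlib
import hmac
import struct
from datetime import datetime, timedelta, timezone


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1, the usual authenticator-app default."""
    counter = unix_time // step                       # 30 s steps since the Unix epoch (UTC)
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
    mac = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                           # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp_at(secret: bytes, moment: datetime, **kwargs) -> str:
    """Hypothetical helper: TOTP for a timezone-aware datetime.
    Only the absolute instant matters, not the timezone it is expressed in."""
    return totp(secret, int(moment.timestamp()), **kwargs)


if __name__ == "__main__":
    secret = b"12345678901234567890"                  # RFC 6238 SHA-1 test secret
    print(totp(secret, 59, digits=8))                 # 94287082 (RFC 6238 test vector)

    utc_moment = datetime(2009, 2, 13, 23, 31, 30, tzinfo=timezone.utc)
    phone_local = utc_moment.astimezone(timezone(timedelta(hours=-5)))
    # Same instant, different timezone: the phone's local time gives the same code.
    print(totp_at(secret, utc_moment) == totp_at(secret, phone_local))  # True

    # RTC mistakenly holding local wall-clock time, read back as if it were UTC:
    skewed_rtc = phone_local.replace(tzinfo=timezone.utc)
    print(int(skewed_rtc.timestamp()) - int(utc_moment.timestamp()))   # -18000 (5 hours)
```

A 5-hour offset moves the counter by 600 steps, so the two devices compute codes for different time windows; once the RTC holds correct UTC, both sides land on the same counter.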

githubuseravailable commented 2 years ago

We are combining 3 issues under a single opened issue where none of the exposed problems are linked together.

@tlaurion Sorry, I thought the 3 issues were linked because they happened in a cycle, but now we know they are not.

Thanks for renaming the title; I think it's better and more specific now. Before, I didn't know that it only happens on the 1st boot.

Okay, in summary:

1st issue: the 1st boot after external flashing doesn't succeed. Solution: it succeeds after 2-3 reboots.

2nd issue: after booting, "gpg: waiting for lock" repeats. Cause: unable to read HW clock. Solution: manual/automatic RTC clock correction.

3rd issue: TOTP mismatch. Solution:

Yes, TOTP codes should match on both devices if the RTC time under Heads is set in the UTC/GMT timezone, meaning Greenwich time (which a network-recovery-init call will fix automatically). Phones handle this conversion automatically: the TOTP application generates codes from UTC (GMT+0) time, because the phone knows its configured local timezone and its offset (+/-) from UTC.

Thanks for the solutions; the ticket can be closed.