linuxboot / heads

A minimal Linux that runs as a coreboot or LinuxBoot ROM payload to provide a secure, flexible boot environment for laptops, workstations and servers.
https://osresearch.net/
GNU General Public License v2.0

librem_14 ROM malfunction, no display or freeze, then CPU hard LOCKUP #1712

Closed: aluciani closed this issue 2 months ago

aluciani commented 3 months ago

Please identify some basic details to help process the report

A. Provide Hardware Details

  1. What board are you using? (Choose from the list of boards here) librem_14

  2. Does your computer have a dGPU or is it iGPU-only?

    • [X] iGPU-only (Internal GPU, normally Intel GPU)
  3. Who installed Heads on this computer?

    • [X] Self-installed
  4. What PGP key is being used?

    • [X] Nitrokey 3 NFC
  5. Are you using the PGP key to provide HOTP verification?

    • [X] Yes

B. Identify how the board was flashed

  1. Is this problem related to updating heads or flashing it for the first time?

    • [X] Updating heads
  2. If the problem is related to an update, how did you attempt to apply the update?

    • [X] Using the Heads menus
    • [X] External flashing (ch341a_spi)
  3. How was Heads initially flashed?

    • [X] Don't know (purism)
  4. Was the board flashed with a maximized or non-maximized/legacy rom?

    • [X] I don't know (purism)
  5. If Heads was externally flashed, was IFD unlocked?

    • [X] Don't know (purism the first time, make BOARD=librem_14 for the update)

C. Identify the rom related to this bug report

  1. Did you download or build the rom at issue in this bug report?

    • [X] I built it
  2. If you built your rom, which repository:branch did you use?

    • [X] Heads:Master heads-librem_14-v0.2.0-2206-gfb9c558.rom
  3. What version of coreboot did you use in building? { You can find this information from the github commit ID or, once flashed, by giving the complete version from System Information under the Options --> menu } coreboot-purism

  4. In building the rom, where did you get the blobs?

    • [X] Extracted from the online bios using the automated tools provided in Heads

Please describe the problem

Describe the bug I wanted to update Heads on my librem_14 from Purism. So I cloned the git repo, built the ROM and put it on a USB key. I updated the ROM via the GUI (keeping settings), rebooted, and then got a black screen. I thought I had removed the USB key too quickly, or that the zip file was corrupted, so I externally flashed the ROM with a ch341a_spi programmer. The result was the same: a black screen. Then the PC rebooted and showed a screen that was backlit but displayed nothing. After that nothing moves; the PC either freezes or displays nothing.

To Reproduce Steps to reproduce the behavior:

  1. get a Debian 12 system
  2. update the system: sudo apt update && sudo apt upgrade
  3. clone the heads repo: git clone https://github.com/linuxboot/heads
  4. install docker (https://docs.docker.com/engine/install/debian/)
  5. install nix
    [ -d /nix ] || sh <(curl -L https://nixos.org/nix/install) --no-daemon
    . /home/user/.nix-profile/etc/profile.d/nix.sh
  6. build the docker image
    nix build .#dockerImage && docker load < result
  7. jump into the image
    docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env
  8. produce the ROM
    make BOARD=librem_14
  9. copy the ROM to a USB key
  10. update the librem_14 from the GUI; see error

THEN

  1. flash with the ch341a_spi (see the sketch after this list)
    sudo flashrom -p ch341a_spi -w /home/user/heads/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.zip
    -c <the name of the chip, which I don't remember right now>

    see error
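
For reference, flashrom writes a raw .rom image rather than the zip itself; a minimal sketch of the external flash, assuming the built .rom sits at the top level of the zip and with a placeholder chip name:

    # hedged sketch: extract the built .rom and flash it explicitly
    # (the -c argument is a placeholder; use the chip flashrom actually detects)
    unzip heads-librem_14-v0.2.0-2206-gfb9c558.zip
    sudo flashrom -p ch341a_spi -c "<detected chip>" -w heads-librem_14-v0.2.0-2206-gfb9c558.rom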

// I also tried the old build method on a debian 11 system

  1. boot debian 11 system
  2. clone the repo
  3. make BOARD=librem_14
  4. flash with flashrom and ch341a_spi

Expected behavior: the laptop should display the bootsplash and then the Heads menu.

Screenshots: I can post pictures of the black screen if needed...

Here is the zip file produced by the build : heads-librem_14-v0.2.0-2206-gfb9c558.zip

tlaurion commented 3 months ago

@123ahaha https://github.com/linuxboot/heads/blob/master/README.md#building-heads was followed?

aluciani commented 3 months ago

@123ahaha https://github.com/linuxboot/heads/blob/master/README.md#building-heads was followed?

Yes indeed, the first attempts (the GUI update and the first ch341a_spi flash) were with the nix build. Then I tried booting an old Debian 11 and just running make BOARD=librem_14. That didn't change the result: still the black screen, a reboot, then a black screen with some backlight. I also noticed that when I turn off the librem_14 there is a flash on the screen, like a bright white screen; I don't know if that helps.

tlaurion commented 3 months ago

@JonathonHall-Purism diffoscope fails on romstage and most of the rom is different?

tlaurion commented 3 months ago

produce the ROM
make BOARD=librem_14

These are not the upstream instructions. By doing this, you are using your host build system, not the isolated build system inside the nix-built docker image.

https://github.com/linuxboot/heads/blob/fb9c558ba4ed4d6a581b05d7e47b883e0f79c04a/README.md

E.g.: docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=nitropad-nv41

Or again: https://github.com/linuxboot/heads/blob/fb9c558ba4ed4d6a581b05d7e47b883e0f79c04a/README.md#pull-docker-hub-image-to-prepare-reproducible-roms-as-circleci-in-one-call

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) tlaurion/heads-dev-env:latest -- make BOARD=x230-hotp-maximized
docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) tlaurion/heads-dev-env:latest -- make BOARD=nitropad-nv41

Which would translate to this for your librem_14 build case:

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) tlaurion/heads-dev-env:latest -- make BOARD=librem_14

Since you are building the latest commit (which incidentally corresponds to the "latest" docker image; otherwise the doc specifies to check the CircleCI config to get the docker image version used by that Heads commit, so the reproducible build output matches).


To complete your build with the self-built, nix-created docker image, as you intended to do for that specific Heads commit (seen at the end of your rom name, -gXXXXXXXX.zip):

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=librem_14

@123ahaha : makes sense?


Please suggest changes you would like to see in README.md that would clarify what was missing, so others don't run into the same problem. Thanks!

aluciani commented 3 months ago

produce the ROM
make BOARD=librem_14

These are not the upstream instructions.

I know. I first did the build via nix+docker. The only difference I have is:

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env

I thought this put me inside the docker image with the correct build system. Then I could do

make BOARD=librem_14

from inside the image. Am I wrong?

user@debian$ docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env
bash-5.2# make BOARD=librem_14
----------------------------------------------------------------------
!!!!!! BUILD SYSTEM INFO !!!!!!
System CPUS: 8
System Available Memory: 13909 GB
System Load Average: 0.34
----------------------------------------------------------------------
Used **CPUS**: 8
Used **LOADAVG**: 12
Used **AVAILABLE_MEM_GB**: 13909 GB
----------------------------------------------------------------------
**MAKE_JOBS**: -j8 --load-average=12 

Variables available for override (use 'make VAR_NAME=value'):
**CPUS** (default: number of processors, e.g., 'make CPUS=4')
**LOADAVG** (default: 1.5 times CPUS, e.g., 'make LOADAVG=54')
**AVAILABLE_MEM_GB** (default: memory available on the system in GB, e.g., 'make AVAILABLE_MEM_GB=4')
**MEM_PER_JOB_GB** (default: 1GB per job, e.g., 'make MEM_PER_JOB_GB=2')
----------------------------------------------------------------------
!!!!!! Build starts !!!!!!

... removed the build log

16777216:/home/user/heads/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.rom
bash-5.2# 

Are you sure the way I did it is not the same as

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=librem_14

?

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=librem_14

@123ahaha : makes sense?

I just tried it this way and flashed via ch341a_spi; still got a black screen with backlight.

Here is the ROM: heads-librem_14-v0.2.0-2206-gfb9c558.zip

aluciani commented 3 months ago

Update: when I leave the librem_14 ON long enough, I actually get a message:

[249.0789298] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

The hard LOCKUP is also on CPUs 2, 3, 4, 5, 6, 7, 9, 10, 11.

tlaurion commented 3 months ago

from inside the image, am I wrong ?

Correct. If the previous commands ran to generate output, and that output was used to construct the docker image you then ran interactively, your docker image should be reproducible and produce a reproducible ROM. (I didn't run diffoscope on your latest one; I'll leave that for after you confirm whether the expected ROM works or not, before investing more unpaid time into troubleshooting this further, going down a more straightforward troubleshooting path. Hope you get it right.)

@123ahaha Out of curiosity, if you flash this ROM externally (the one inside the zip), does it get rid of your issue?

https://output.circle-artifacts.com/output/job/b70503c8-c52f-4ff7-b77d-e166f624bd0d/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.zip

tlaurion commented 3 months ago

update : When i let the librem_14 ON long enough i actually have a message :

[249.0789298] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

The hard LOCKUP also is on CPU 2,3,4,5,6,7,9,10,11

Oh. So that might be a kernel<->coreboot issue then. Considering coreboot was recently updated, if the latest CircleCI ROM produces the same behavior, I would recommend flashing a ROM from prior to the last coreboot version bump here?

Do you have the version info of the Heads build you were using before the internal upgrade?

aluciani commented 3 months ago

I tried the https://output.circle-artifacts.com/output/job/b70503c8-c52f-4ff7-b77d-e166f624bd0d/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.zip ROM; still the same issue.

I'll try to get an earlier commit. I don't really remember which commit it was, but I'm sure it was before commit 80284ff246aa8eeea7f0440381c88f585cc76aa9.
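
A minimal sketch of how one could rebuild from a pre-bump commit locally (reusing the docker invocation from earlier in this thread; the exact commit to check out is whichever one predates the coreboot bump):

    # fd98c8d is the pre-bump build linked above; any commit before 80284ff2 should do
    git checkout fd98c8d
    # note: for exact reproducibility the docker image version should match what
    # that commit's CircleCI config uses
    docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) \
        linuxboot/heads:dev-env -- make BOARD=librem_14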

tlaurion commented 3 months ago

My hypothesis then is that you're suffering from the last coreboot bump: https://github.com/linuxboot/heads/pull/1703

So if my hypothesis is right, this should boot: https://output.circle-artifacts.com/output/job/14741774-df0a-4b31-a384-512abcef62a8/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2008-gfd98c8d.zip

You can of course use PureBoot releases as well, which is the supported path from Purism support. Heads is a rolling release; an OEM decides which Heads upstream commit to support and rebrands it (with Purism maintaining their coreboot branch).

I doubt the librem_14 wouldn't boot from their releases. I'm not sure which Heads master commit they use, but it should be written in their Bill Of Materials (BOM) on their release page.

aluciani commented 3 months ago

My hypothesis then is that you're suffering from the last coreboot bump #1703

So if my hypothesis is right, this should boot: https://output.circle-artifacts.com/output/job/14741774-df0a-4b31-a384-512abcef62a8/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2008-gfd98c8d.zip

Booting back to normal with this commit. Thanks!

Should I let you close the issue?

tlaurion commented 3 months ago

My hypothesis then is that you're suffering from the last coreboot bump #1703

So if my hypothesis is right, this should boot: https://output.circle-artifacts.com/output/job/14741774-df0a-4b31-a384-512abcef62a8/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2008-gfd98c8d.zip

Booting back to normal with this commit. Thanks!

@JonathonHall-Purism something is wrong after commit fd98c8d for the librem 14, most probably coreboot / config stuff, where the soft lockup watchdog would probably kick in later after enough waiting, but it is definitely a regression.

Please pin the issue. AFK.

JonathonHall-Purism commented 3 months ago

Quick update: I'm able to reproduce this and am checking it out. Thanks for reporting.

tlaurion commented 3 months ago

@JonathonHall-Purism depending on the timeline for a fix, I propose we revert #1703 as per the #1713 PR.

tlaurion commented 3 months ago

Please pin the issue. AFK.

Done, to ensure visibility.

JonathonHall-Purism commented 3 months ago

Agree; commented over there too - I'll test that ROM as soon as it's available from CI. If it boots and I haven't found the actual fix yet, we'll merge it.

aluciani commented 3 months ago

Even though you more or less know where the problem comes from, I'd like to add that the ROM produced for a nitropad-nv41 is working (heads-nitropad-nv41-v0.2.0-2206-gfb9c558.zip).

tlaurion commented 3 months ago

Even though you more or less know where the problem comes from, I'd like to add that the ROM produced for a nitropad-nv41 is working (heads-nitropad-nv41-v0.2.0-2206-gfb9c558.zip).

@123ahaha: not related, but thanks for the report (I tested the nv41 myself; I cannot test Librems). The changeset (reverted changes) can be seen under https://github.com/linuxboot/heads/pull/1713, which is building under CircleCI at https://app.circleci.com/pipelines/github/tlaurion/heads/2636/workflows/f0bfe047-0fa5-40ab-b636-25b63472794d

tlaurion commented 3 months ago

@JonathonHall-Purism @aluciani : https://app.circleci.com/pipelines/github/tlaurion/heads/2636/workflows/f0bfe047-0fa5-40ab-b636-25b63472794d/jobs/48156 finished.

The internal flashing zip is downloadable from https://output.circle-artifacts.com/output/job/bd451c3d-9c9e-4f8f-a2c1-ddc402dacca6/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2207-gb20cde8.zip for 30 days starting now.

aluciani commented 3 months ago

The internal flashing zip is downloadable from https://output.circle-artifacts.com/output/job/bd451c3d-9c9e-4f8f-a2c1-ddc402dacca6/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2207-gb20cde8.zip for 30 days starting now.

Working on my librem 14.

JonathonHall-Purism commented 3 months ago

Bisecting the few commits we had downstream on Release 30 has led me here, to the commit switching to Purism bootsplashes:

https://source.puri.sm/firmware/pureboot/-/commit/7f912babf2aca7af73473b1cd41ca586ebdcc3df

The 24.02.01-Purism-1 change works with this commit, but not on any prior commit. Not sure why yet; working on it. I would not have expected a bootsplash to cause CPU lockups; I would have thought it would either show the bootsplash or fail and go on without it.

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it! It doesn't look like any other boards use 24.02.01 yet.

aluciani commented 3 months ago

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it!

I'm willing to test it on a t430, even on a nitropad-nv41...

It doesn't look like any other boards use 24.02.01 yet.

... but I cannot make the ROM for a t430 with coreboot v24.02.01; I don't really have time to dig deep and make the hack work.

tlaurion commented 2 months ago

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it!

I'm willing to test it on a t430, even on a nitropad-nv41...

It doesn't look like any other boards use 24.02.01 yet.

... but I cannot make the ROM for a t430 with coreboot v24.02.01; I don't really have time to dig deep and make the hack work.

I could start version bumping to coreboot 24.02.01 for the xx30's (t430+x230) in a PR to see if it bricks my x230, which I can test myself as a start, and then extend per family this time (Ivy Bridge, then Sandy Bridge, then Haswell). This targets testing of the coreboot version bump, which in the past takes a lot of time to test per board owner, so I will make smaller changes this time, I guess.

But the nv41 depends on Dasharo's fork, which is not upstream in coreboot, so that will depend on the version base of the next Dasharo release.

tlaurion commented 2 months ago

@aluciani :

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it!

I'm willing to test it on a t430, even on a nitropad-nv41...

It doesn't look like any other boards use 24.02.01 yet.

... but I cannot make the ROM for a t430 with coreboot v24.02.01; I don't really have time to dig deep and make the hack work.

The "hack work" is under https://github.com/linuxboot/heads/pull/1715, which is basically changing version strings and hashes of downloaded github artifacts, and regenerating oldconfigs as per the https://github.com/linuxboot/heads/pull/1715/commits/3a93e441f336231e7195fed083c2944b1707add1 comment. If the ROM artifacts don't boot, there is a regression on the coreboot side between 4.20.01 and 24.02.01.

AFAIK, the patch that was under patches/coreboot-4.22.01/0001-x230-fhd-variant.patch, the eDP patch for the eDP/FHD x230 board variant, was merged upstream and is no longer needed. So no coreboot patches should need to be maintained downstream under Heads anymore, which is where the actual "hack work" was needed before. Let's see.
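
A minimal sketch of how to inspect that changeset locally (assuming the GitHub remote is named origin; the local branch name is arbitrary):

    # fetch PR #1715 and diff it against master to see what the version bump touches
    git fetch origin pull/1715/head:coreboot-24.02.01-bump
    git diff master...coreboot-24.02.01-bump --stat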

aluciani commented 2 months ago

The "hack work" is under #1715, which is basically changing version strings and hashes of downloaded github artifacts, and regenerating oldconfigs as per the 3a93e44 comment. If the ROM artifacts don't boot, there is a regression on the coreboot side between 4.20.01 and 24.02.01.

I'll just wait for the CI to build the ROM and try it on the nitropad-nv41.

tlaurion commented 2 months ago

The "hack work" is under #1715, which is basically changing version strings and hashes of downloaded github artifacts, and regenerating oldconfigs as per the 3a93e44 comment. If the ROM artifacts don't boot, there is a regression on the coreboot side between 4.20.01 and 24.02.01.

I'll just wait for the CI to build the ROM and try it on the nitropad-nv41.

Won't change anything for nv41.

This affects the platforms in the title of #1715: xx30, xx20, xx40, xx41 and the qemu q35 coreboot test platforms.

JonathonHall-Purism commented 2 months ago

Here's what happened. After the switch to the Wuffs JPEG decoder in 24.02.01, the JPEG decoder now needs a "work area" allocated from the heap roughly proportional to the image size.

The Heads bootsplash is much larger than the PureBoot bootsplashes (1024x768 vs. ~672x112). So the PureBoot bootsplashes were fine, but the Heads bootsplashes exceeded the available heap space.
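
Rough numbers for scale (pixel counts only; the exact per-pixel cost of the decoder's work area isn't stated here):

    1024 x 768 = 786,432 pixels   (Heads bootsplash)
     672 x 112 =  75,264 pixels   (PureBoot bootsplash, roughly 10x smaller)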

Then, the coreboot allocator left the heap "full" after failing to fulfill a request that exceeded the heap size. This caused boot to fail entirely after the bootsplash failed to load.

Upstream fixes:

I'm preparing a branch to bump Librems again with the heap size fix.

(We don't need to cherry-pick the malloc fix; as long as the heap size is increased, it won't apply.)
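
For illustration only, the kind of knob involved is coreboot's HEAP_SIZE Kconfig option; the path and value below are assumptions for a board-level override, not the actual upstream fix:

    # config/coreboot-librem_14.config (illustrative; the real fix may instead
    # raise the Kconfig default upstream)
    CONFIG_HEAP_SIZE=0x100000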

tlaurion commented 2 months ago

@JonathonHall-Purism fixed in master, should we close?

It was https://github.com/linuxboot/heads/commit/ebd9fbadb63ae9f43e8497a2d0aebbed169f1767

JonathonHall-Purism commented 2 months ago

Yes, thank you, closing.