system76 / firmware-open

System76 Open Firmware

Boot entries in NVRAM are frequently lost #437

Closed marcin-rzeznicki closed 12 months ago

marcin-rzeznicki commented 1 year ago

Boot entries in NVRAM are lost at random. Setting them manually via efibootmgr works only intermittently and sometimes requires multiple reboots.

Steps to reproduce

Expected behavior

I always boot into what I want to boot

Actual behavior

I spend more time in Arch live-disk and/or PopOS recovery something than I would like to

Additional info

This started happening after I upgraded to 2023-06-08_36c78ea; nothing of the sort had happened before. I disabled firmware security because I thought that was the problem (the boot process hung when it tried to display the screen for entering the digits every time I plugged in any external monitor, and worked fine otherwise), but apparently it was not. From my point of view the issue is random: sometimes it works for days, sometimes I have to restore the entries four times a day.
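
For anyone who wants to notice a lost entry before the next reboot, here is a minimal Python sketch that parses text in the general shape printed by `efibootmgr` and reports expected labels that are missing, plus `BootOrder` numbers that point at no entry. The sample text, the labels in it, and the helper names are made up for illustration. The sketch also assumes that any device path follows the label after a tab, which may not hold for every efibootmgr version.

```python
import re

# Illustrative sample only; real output comes from running `efibootmgr`.
SAMPLE = """\
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0003
Boot0001* Pop!_OS
Boot0002* Arch Linux
"""

def parse_efibootmgr(text):
    """Return (boot_order, entries) parsed from efibootmgr-style text."""
    order = []
    entries = {}
    for line in text.splitlines():
        m = re.match(r"BootOrder:\s*(.*)", line)
        if m:
            order = [n.strip().upper() for n in m.group(1).split(",") if n.strip()]
            continue
        m = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)", line)
        if m:
            # Assumption: a device path, if present, follows the label after a tab.
            label = m.group(2).split("\t")[0].strip()
            entries[m.group(1).upper()] = label
    return order, entries

def missing_entries(text, expected_labels):
    """Expected labels that no Boot#### entry carries."""
    _, entries = parse_efibootmgr(text)
    present = set(entries.values())
    return [label for label in expected_labels if label not in present]

def dangling_order(text):
    """BootOrder numbers that do not correspond to any listed entry."""
    order, entries = parse_efibootmgr(text)
    return [n for n in order if n not in entries]
```

In practice the text would come from something like `subprocess.run(["efibootmgr"], capture_output=True, text=True).stdout`, which typically needs root and an EFI-booted system.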

crawfxrd commented 1 year ago

Likely either an issue with FaultTolerantWrite or variables being detected as corrupted when written. Probably need to cherry-pick https://github.com/MrChromebox/edk2/commit/d91161196b0e2fb4a11fdadc1c39e2457f617428.

Would require edk2 debug output to check what is causing SMMSTORE to be erased, but it's a fucking nightmare.

crawfxrd commented 1 year ago

Steps to reproduce:

I'm seeing it happen on a dev unit anywhere between 1 and 3 reboots when booting on battery power.

crawfxrd commented 1 year ago

oryp6 (CML-H) and oryp8 (TGL-H) appear to be unaffected.

marcin-rzeznicki commented 1 year ago

Thanks for finally looking into it. Looking forward to a solution, 'cause it's unnerving

crawfxrd commented 1 year ago

Setting up SMI buffer failed:

[DEBUG]  FMAP: area SMMSTORE found @ 1010000 (262144 bytes)
[DEBUG]  smm store: 4 # blocks with size 0x10000
[INFO ]  SMMSTORE: Setting up SMI handler
[ERROR]  SMMSTORE: Failed to install com buffer
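
As a sanity check on the numbers in this log, the following Python sketch parses the two DEBUG lines and confirms that the block count times the block size equals the reported region size (4 × 0x10000 = 262144 bytes). The function name is hypothetical, and reading the FMAP offset as hexadecimal is an assumption about how the value is printed.

```python
import re

# The two DEBUG lines quoted in the log above.
LOG = """\
[DEBUG]  FMAP: area SMMSTORE found @ 1010000 (262144 bytes)
[DEBUG]  smm store: 4 # blocks with size 0x10000
"""

def smmstore_geometry(log):
    """Extract the SMMSTORE region and block layout from the log text."""
    area = re.search(r"area SMMSTORE found @ ([0-9a-fA-F]+) \((\d+) bytes\)", log)
    blocks = re.search(r"smm store: (\d+) # blocks with size (0x[0-9a-fA-F]+)", log)
    if area is None or blocks is None:
        raise ValueError("expected SMMSTORE lines not found")
    return {
        "offset": int(area.group(1), 16),       # assumption: hex without a 0x prefix
        "size_bytes": int(area.group(2)),       # decimal, as labelled "bytes"
        "num_blocks": int(blocks.group(1)),
        "block_size": int(blocks.group(2), 16),
    }

geometry = smmstore_geometry(LOG)
# The region should be exactly covered by its blocks.
consistent = geometry["num_blocks"] * geometry["block_size"] == geometry["size_bytes"]
```

The layout itself is consistent, so the log points at the SMI communication buffer setup rather than at the region geometry.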

crawfxrd commented 1 year ago

The APM SMI is failing with 0x4ed; I don't know what that means. Retrying it seems sufficient to get it to work though.
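
The retry idea can be sketched in Python as below. This is only an illustration of a bounded retry loop, not the actual firmware change (the real code is firmware C). `call_with_retry` and `make_flaky_smi` are hypothetical names, and treating 0 as the success status is an assumption.

```python
def call_with_retry(trigger, attempts=3, ok=0):
    """Call trigger() until it returns `ok` or the attempts run out.

    Returns (last_status, number_of_calls_made).
    """
    status = None
    for tries in range(1, attempts + 1):
        status = trigger()
        if status == ok:
            return status, tries
    return status, attempts

def make_flaky_smi(failures, error=0x4ED):
    """Build a stand-in for the SMI call that fails `failures` times, then succeeds."""
    remaining = [failures]

    def trigger():
        if remaining[0] > 0:
            remaining[0] -= 1
            return error  # the status value reported above
        return 0          # assumption: 0 means success

    return trigger
```

Bounding the number of attempts keeps a persistent failure visible instead of hanging the boot in an endless loop.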

ilikenwf commented 1 year ago

Interestingly, I think it may be related to setup mode. Once I reran sbctl, it showed that Secure Boot was off and in setup mode. It seems to be sticking this time after re-enrolling yet again. Not sure what was different.

crawfxrd commented 12 months ago

2023-09-08_42bf7a6 staged for release.

marcin-rzeznicki commented 12 months ago

Thanks, while we're at it - could you please tell your support not to convince people to ship their machines halfway around the world if they encounter similar problems, based on "hardware issues"? It would help others. Thank you in advance.

ilikenwf commented 12 months ago

I don't work for sys76. I'm just a customer.

On September 9, 2023 5:59:04 PM CDT, "Marcin Rzeźnicki" wrote:

> Thanks, while we're at it - could you please tell your support not to convince people to ship their machines halfway around the world if they encounter similar problems, based on "hardware issues"? It would help others. Thank you in advance.

marcin-rzeznicki commented 12 months ago

> I don't work for sys76. I'm just a customer.

Sorry, that message was meant for @crawfxrd

thomas-zimmerman commented 12 months ago

@marcin-rzeznicki Support team just spoke about this bug today. We also don't like to ship machines halfway around the world for something that can be fixed in place.

ilikenwf commented 12 months ago

I'm not really part of this particular issue but maybe you can train and provide programming tools to certified techs in a few other nations as contractors to fix such issues instead?

On September 11, 2023 10:38:11 AM CDT, thomas-zimmerman wrote:

> @marcin-rzeznicki Support team just spoke about this bug today. We also don't like to ship machines halfway around the world for something that can be fixed in place.

techsy730 commented 11 months ago

Just reporting that I have been encountering the same thing on the 2023 model Bonobo (bonw15) with firmware version 2023-07-10_0e4a64a, in case this combination of model and firmware version was not already known to be affected.
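
The version strings in this thread look like a build date followed by a short commit hash. Assuming that `YYYY-MM-DD_hash` shape, here is a small Python sketch for checking whether an installed version predates another one, such as the 2023-09-08_42bf7a6 build mentioned above. The helper names are hypothetical. The comparison only looks at build dates; it cannot tell whether a given model is affected.

```python
from datetime import date

def parse_version(version):
    """Split 'YYYY-MM-DD_hash' into (build_date, short_hash)."""
    day, _, commit = version.partition("_")
    return date.fromisoformat(day), commit

def predates(installed, reference):
    """True if `installed` was built before `reference` (by date only)."""
    return parse_version(installed)[0] < parse_version(reference)[0]
```

For example, `predates("2023-07-10_0e4a64a", "2023-09-08_42bf7a6")` evaluates to `True`.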

marcin-rzeznicki commented 11 months ago

> 2023-09-08_42bf7a6 staged for release.

Any idea when it's going to be actually released?

marcin-rzeznicki commented 9 months ago

Ping @crawfxrd - it's been a while

crawfxrd commented 9 months ago

https://github.com/system76/firmware-open/issues/340

oryp9 and oryp10 have not been released yet.