system76 / ec

System76 Open Source Embedded Controller
GNU General Public License v3.0

darp5: Not early-loading system76_acpi and/or using most system76_ectool commands crashes the EC #472

Status: Open. Thulinma opened this issue 2 weeks ago.

Thulinma commented 2 weeks ago

Hey there! I know this is technically an unsupported model, and an old one at that, but I figured I'd at least report this here. Ever since installing the open EC on my darp5, most system76_ectool commands have either not worked or crashed the EC, which then needs the battery disconnected to restart (I found no other way to do so). I always figured the tool was just using "something" unsupported on my older laptop model and didn't think much of it; I just didn't use the tool.

These days I'm running EC commit fc3bad29a2a31555bccaaacb6af6d20b7bb1b7f6, and in the last few Linux kernels (at least 6.9.7) something seems to have changed that now causes the non-System76 ACPI kernel modules (I'm not sure exactly which ones, nor how I could find out) to do... something that also crashes the EC in the same way. After some prodding, it looks like loading system76_acpi in early boot (by including it in the initrd image) prevents the crash, likely because it keeps the other ACPI drivers from probing and/or loading?

Thankfully, I have a workaround (loading the system76_acpi kernel module early) that keeps things stable, but I kinda feel like it shouldn't be possible to crash the EC (or at least not this easily)... So I'm reporting it here. ^^

If there is anything I can do to help debug this, do let me know! I'm honestly a little confused that this (apparently?) doesn't affect other models, or that nobody else has noticed or reported the problem. On my system, at least, simply running e.g. system76_ectool security with any argument already crashes the EC.

Just to make clear what I mean by "EC crash", the symptoms are:

Again, happy to help any way I can! That said, since I've had to speedrun opening my laptop several times now, if there are any ways to force an EC restart that are faster or easier than disconnecting the battery, I'd love to hear about them, independent of a potential fix for this issue 😅.

crawfxrd commented 2 weeks ago

> Holding down the power button (even for several minutes uninterrupted) does not reboot the EC or shut down the laptop at all.

The power-switch watchdog timer (PWRSW WDT 2) was enabled on boards using the IT5570E in #315.

We could try enabling it on the IT8587E (#473).

```c
// PWRSW WDT 2 Enable
GCR8 = BIT(4);
```

Bit 5 of GCR9 (PWSW2EN2 on the IT5570E) is marked reserved in the IT8587E datasheet, but based on our experience with these chips, it might exist and be required.
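For readers less used to this register notation: assuming `BIT(n)` expands to `1 << n`, the line `GCR8 = BIT(4);` writes the whole register, leaving bit 4 set and every other bit cleared, whereas a read-modify-write (`|=`) sets bit 4 and preserves the rest. The sketch below models GCR8 and GCR9 as plain host-side bytes (hypothetical stand-ins, not the real memory-mapped registers or the ec repo's headers) to show both forms, plus the possibly-required GCR9 bit 5 discussed above. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for the EC's GCR8/GCR9 registers.
 * On real hardware these are memory-mapped; here they are plain bytes. */
static uint8_t gcr8 = 0x00;
static uint8_t gcr9 = 0x00;

/* Single-bit mask for an 8-bit register (bit must be 0..7). */
static uint8_t bit8(unsigned bit) {
    assert(bit < 8);
    return (uint8_t)(1u << bit);
}

/* Assignment form, as in `GCR8 = BIT(4);`: all other bits end up cleared. */
static void pwrsw_wdt2_enable_assign(void) {
    gcr8 = bit8(4);
}

/* Read-modify-write form: set bit 4 and keep the remaining bits as they were. */
static void pwrsw_wdt2_enable_rmw(void) {
    gcr8 |= bit8(4);
}

/* GCR9 bit 5 (PWSW2EN2 on IT5570E). Whether it exists on IT8587E is
 * unconfirmed; the datasheet marks it reserved. */
static void pwsw2en2_enable(void) {
    gcr9 |= bit8(5);
}
```

With GCR8 starting at `0x03`, the read-modify-write version leaves it at `0x13`, while the assignment version leaves it at `0x10`. Which form is appropriate depends on whether other GCR8 bits are already configured when the write happens; the datasheet is the authority on that.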

crawfxrd commented 1 week ago

Do you happen to have a known working EC version?

Thulinma commented 1 week ago

Define "known working"? I have backups of every EC version I've ever kept installed for more than a few hours:

I have not yet had a chance to compile the modified version and see if it indeed lets me restart the EC by holding power, but I'm planning to do so sometime in the next few days. 🤞 Also happy to try other things if it could help get to the bottom of this! (For now, at least, my laptop seems to run stable as long as I don't use ectool... So it's usable day to day, which is the most important thing!)

crawfxrd commented 1 week ago

> Define "known working"?

A version that doesn't crash.

ACPI interactions and EC commands should never trigger a crash. At worst, they should time out, but the EC should otherwise continue to operate normally.
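To illustrate the "time out rather than hang" behaviour from the host side: the ACPI specification's EC interface exposes a status register in which bit 1 (IBF) stays set while the EC has not yet consumed the last byte written to it, and bit 0 (OBF) is set when a byte is ready for the host. The sketch below polls that status with a bounded number of reads and reports a timeout instead of spinning forever. It is a generic illustration, not how system76_ectool or the kernel driver is implemented; `status_reader`, `ec_wait_ibf_clear`, `ec_wait_obf_set` and the simulated `fake_ec` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* ACPI EC status register bits (per the ACPI specification). */
#define EC_STS_OBF 0x01u /* output buffer full: EC has data for the host */
#define EC_STS_IBF 0x02u /* input buffer full: EC has not consumed the last write */

typedef enum { EC_OK = 0, EC_TIMEOUT = -1 } ec_result;

/* Hypothetical stand-in for reading the EC status port. */
typedef uint8_t (*status_reader)(void *ctx);

/* Wait until IBF clears, giving up after max_polls status reads. */
static ec_result ec_wait_ibf_clear(status_reader read_status, void *ctx,
                                   unsigned max_polls) {
    assert(read_status);
    for (unsigned i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & EC_STS_IBF) == 0)
            return EC_OK;
        /* A real driver would delay or sleep between polls here. */
    }
    return EC_TIMEOUT;
}

/* Wait until OBF is set, giving up after max_polls status reads. */
static ec_result ec_wait_obf_set(status_reader read_status, void *ctx,
                                 unsigned max_polls) {
    assert(read_status);
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_status(ctx) & EC_STS_OBF)
            return EC_OK;
    }
    return EC_TIMEOUT;
}

/* Simulated EC for illustration: IBF stays set for the first busy_reads
 * status reads, after which the EC reports OBF instead. */
struct fake_ec {
    unsigned reads;      /* status reads performed so far */
    unsigned busy_reads; /* how many reads IBF stays set for */
};

static uint8_t fake_status(void *ctx) {
    struct fake_ec *ec = ctx;
    uint8_t status = (ec->reads < ec->busy_reads) ? EC_STS_IBF : EC_STS_OBF;
    ec->reads++;
    return status;
}
```

A responsive simulated EC (`{0, 3}`) lets `ec_wait_ibf_clear` return `EC_OK` on the fourth read, while one that never clears IBF makes it return `EC_TIMEOUT` after exactly `max_polls` reads. The host gives up cleanly either way; keeping the EC itself from wedging is the firmware's side of the contract.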

Only six of the ported boards use the IT8587E. The darp6 is the only one of them with an actual release of System76 EC (latest: 2021-07-20_93c2809), and even there it's optional.

> I have not yet had a chance to compile the modified version and see if it indeed lets me restart the EC by holding power

@leviport tested the WDT change on a darp5 and reported that it was constantly being triggered. After looking at the schematics, it clearly won't work: Clevo didn't connect the power switch to the dedicated power-switch pin.

leviport commented 1 week ago

Last night I got my darp5 set up for external reflashing, and while I had the external flasher connected, I hopped between commits to find where the break happened. I found that https://github.com/system76/ec/commit/0f2ff7e54020069d9393453169cbbf2693d56c76 is the first one where it broke. The commit before it, https://github.com/system76/ec/commit/546458e3688a32723b0086fae066a1897c6c9c3a, seems to work fine.
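A systematic way to do this kind of commit hopping is bisection: given a known-good and a known-bad commit, each test halves the range still to be searched, so locating the first bad commit among n candidates takes about log2(n) flashes (`git bisect` automates the bookkeeping). The sketch below shows the search over a hypothetical array of commits ordered oldest to newest, with a hypothetical `is_bad` callback standing in for "flash this build and check whether the EC crashes". It assumes that once the breakage is introduced, every later commit is also bad.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: returns nonzero if the EC built from
 * commit `index` crashes (in practice: flash it and try ectool). */
typedef int (*commit_test)(size_t index, void *ctx);

/* Return the index of the first bad commit in [0, n), assuming every commit
 * before it is good and every commit from it onward is bad.
 * Requires commit n - 1 to be known bad. */
static size_t first_bad_commit(size_t n, commit_test is_bad, void *ctx) {
    assert(n > 0);
    size_t lo = 0;     /* lowest index that may still be the first bad one */
    size_t hi = n - 1; /* known bad */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (is_bad(mid, ctx))
            hi = mid;     /* first bad commit is at mid or earlier */
        else
            lo = mid + 1; /* first bad commit is after mid */
    }
    return lo;
}

/* Simulated history for illustration: breakage starts at `first_bad`,
 * and `tests` counts how many builds had to be flashed. */
struct fake_history {
    size_t first_bad;
    unsigned tests;
};

static int fake_is_bad(size_t index, void *ctx) {
    struct fake_history *h = ctx;
    h->tests++;
    return index >= h->first_bad;
}
```

With 100 simulated commits and the breakage starting at index 37, `first_bad_commit` returns 37 after at most 7 simulated flashes, compared with up to 99 when stepping through commits one by one. That difference matters when every step means reconnecting an external flasher.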