librecore-org / librecore

GNU General Public License v2.0
88 stars 10 forks source link

[x4x] Reboot requires a global reset, which should never happen #2

Closed ghost closed 7 years ago

ghost commented 7 years ago

On the ga-g41m-es2l, reboot requires a hard reset. This drops the PSU's 5V rail, while storage devices expect it to remain on. This is a bug in raminit code for x4x.

SSD controllers may lie about cache flushes and corrupt data. Hard disks will do emergency retracts, which may damage them eventually.

I have tested this with many different storage devices (SSD's and HDD's), and data corruption occurs only on some, but not all. Kingston SSDNow V300's fail the most.

ghost commented 7 years ago

Ideally, the 5V rail should never be dropped upon reboot. But currently raminit fails when we try this, even though it no longer needs a reset when cold booting.

Alternatively, disks should be spun down / parked upon reboot some way. When doing a shutdown, the kernel prints this: sd 0:0:0:0: [sda] Stopping disk If coreboot could mimic that, it would work around the issue, but not solve it.

ghost commented 7 years ago

The current workaround is adding a 1500ms delay prior to the reset in raminit, and hoping this is enough time for the disk to sync. There's not really any way of being sure because it all happens in proprietary disk firmware.

This is a dirty hack, but the alternative is huge filesystem errors.

ghost commented 7 years ago

Adding blocker label. We should at least add the 1500ms dirty hack to our coreboot tree, because data loss and file system wreckage is not something we want users to encounter.

zamaudio commented 7 years ago

Just to clarify, the exact issue is that currently, raminit issues a global reset on every boot to handle a bug in raminit that prevents regular boot from warm reset state. Currently, if an x4x board issues a warm reset, such as pressing the reset button on the case, or "reboot" in OS, upon the next coreboot boot, it hits the global reset which works around a bug that prevents ram initializing. If the global reset command is removed for warm resets, it fails to pass romstage. But the consequence of doing a global reset on every boot means that 5V rail is deasserted momentarily which hard resets hard disks. Bad for reboot...

ghost commented 7 years ago

@zamaudio Have you documented the current workaround somewhere? It would be nice if others could also implement it until we find a real fix.

For instance, I'd like to test the it8718f fixes on this board, but I don't want to lose the 1500ms delay.

ghost commented 7 years ago

With coreboot master, doing a cold boot triggers a power cycle. The system turns on, quickly turns off, and then turns on again.

With the 'g41m-coreboot-rebootfix2.rom', this behaviour is not present. The system boots normally at the first attempt, without shutting down in between.

I'm not sure what you changed, but that was already an improvement regardless of the reboot situation.

ArthurHeymans commented 7 years ago

Maybe early_reset.c from gm45 can be reused for this?

ghost commented 7 years ago

Abandoned, but related: https://review.coreboot.org/#/c/17624/

ArthurHeymans commented 7 years ago

Raminit still requires to start from cold boot to succeed, but with this fix: https://review.coreboot.org/#/c/17659/ it only resets when needed. It also adds a delay so that it does not corrupt disks

ghost commented 7 years ago

Properly fixed now: https://review.coreboot.org/#/c/18009/

zamaudio commented 7 years ago

Awesome!