tpwrules / nixos-apple-silicon

Resources to install NixOS bare metal on Apple Silicon Macs
MIT License
748 stars 74 forks

`nixos-install` crashes and results in boot looping #62

Closed chrisseto closed 1 year ago

chrisseto commented 1 year ago

First of all, thanks so much for this project. It's wonderful!

I had a system up and running on an M1 mini, but something went wrong, resulting in boot loops and the need to reset the device with Apple Configurator. I've since updated to the latest build and have been trying to reinstall, but every time I run `nixos-install`, the machine freezes and reboots. After the initial attempt, subsequent attempts seem to fail earlier, eventually leading to the installer freezing and rebooting without any additional action.

I'm curious to know if you've encountered similar issues or have any idea where to start debugging this issue? The only explanation I can think of is that somewhere in the installer, the firmware bundle gets corrupted.

I will say, I've built the installer in a bit of a strange way on macOS, so perhaps that's the issue? I would expect not to be able to get as far as `nixos-install` if that were the problem, though. I made use of the `darwin.builder` module in nixpkgs, which runs an aarch64-linux VM in QEMU to run Linux builds on macOS. I just had to run the build command as `nix build .#packages.aarch64-linux.installer-bootstrap -o installer` instead of `nix build .#installer-bootstrap`.

Because the reboots happen unpredictably and in the installer, I haven't been able to pull logs just yet. I'll see if I can enable SSH in the installer and then capture logs as it's running. Other than that, I would really appreciate any guidance on where to look for potential issues.

tpwrules commented 1 year ago

I have not heard of this issue before. If bootlooping started before the reinstall, then maybe the Mac is just faulty? But if it managed to survive a kernel compilation then that is a data point in the opposite direction. Maybe there is a difference with the newly reconfigured system firmware.

Please try the previous release (release-2023-02-23) and see if that helps.

You shouldn't need to build the ISO yourself, by the way. I take care to ensure they are reproducible (at least the cross-compiled ones). So you can just download the release ISO from GitHub and save some time.

chrisseto commented 1 year ago

> I have not heard of this issue before. If bootlooping started before the reinstall, then maybe the Mac is just faulty? But if it managed to survive a kernel compilation then that is a data point in the opposite direction. Maybe there is a difference with the newly reconfigured system firmware.

That's certainly in the realm of possibility; the mini I have is a refurb. The issues don't seem to pop up when it's running macOS, though they do seem to be non-deterministic, which would be another point in the "faulty Mac" category. If I keep hitting bedrock, I'll swing by an Apple Store to see how long it takes for them to kick me out.

I'll be a bit more thorough with documenting my observations on my next iterations. It seems like everything hums along just fine until I try to run `nixos-install` the first time. That is to say, the installer can be left running indefinitely without trouble; the problems only start once I run `nixos-install`.

> Please try the previous release (release-2023-02-23) and see if that helps.

I've tried both actually. The first iteration was on 2023-02-23 and the most recent one was on 2023-03-21.

> You shouldn't need to build the ISO yourself, by the way. I take care to ensure they are reproducible (at least the cross-compiled ones). So you can just download the release ISO from GitHub and save some time.

Oh! I didn't realize that. I'll give it a shot with both the newest and second most recent releases and report back. Hopefully, enabling SSH and tailing `journalctl` will provide a bit more insight.
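A minimal sketch of that log capture, run from a second machine so the log survives when the installer reboots. It assumes SSH is already enabled in the installer; the host `root@installer.local`, the file name `installer.log`, and the helper `stream_to_file` are hypothetical and not part of this project. Each line is flushed to disk as soon as it arrives, so the last lines before a freeze are kept.

```python
import subprocess


def stream_to_file(command: list[str], log_path: str) -> int:
    """Run `command` and append each output line to `log_path` immediately.

    Returns the number of lines captured before the command ended.
    """
    lines = 0
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep SSH errors in the same log
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            log.write(line)
            log.flush()  # do not buffer: the remote side may vanish at any time
            lines += 1
        proc.wait()
    return lines


# Hypothetical usage (placeholder host name, adjust to your network):
# stream_to_file(["ssh", "root@installer.local", "journalctl", "-f"], "installer.log")
```

When the machine reboots, the SSH connection drops, the loop ends, and the tail of `installer.log` shows whatever the journal printed last.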

tpwrules commented 1 year ago

I didn't address this in my previous reply but the firmware is only used for Wi-Fi. Are you installing over Wi-Fi?

Running `nixos-install` is where everything spins up to actually get the installation done. Network becomes active, disk becomes active, CPU becomes active, and all that. If there's a problem, of course that is where it is going to happen. Maybe you can try to find benchmarks for each of these and run them on macOS and on some Linux system and see if anything happens. Ideally, run them both individually and all together.
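As a rough illustration of the "all together" idea, here is a hypothetical sketch that loads CPU, memory, and disk at the same time and checks the results for corruption. It is not an established benchmark and does not exercise the network; all function names and the default sizes are assumptions chosen for illustration. It uses threads because CPython's `hashlib` releases the GIL while hashing large buffers, so the CPU workers can run in parallel.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

PATTERN = bytes(range(256)) * 4096  # exactly 1 MiB of a known repeating pattern


def cpu_worker(rounds: int) -> str:
    """Hash the 1 MiB pattern `rounds` times; the result is deterministic."""
    h = hashlib.sha256()
    for _ in range(rounds):
        h.update(PATTERN)
    return h.hexdigest()


def memory_check(size_mib: int) -> int:
    """Fill a buffer with the pattern and count 1 MiB chunks that read back wrong."""
    buf = bytearray(PATTERN) * size_mib
    bad_chunks = 0
    for i in range(size_mib):
        if buf[i * len(PATTERN):(i + 1) * len(PATTERN)] != PATTERN:
            bad_chunks += 1
    return bad_chunks


def disk_check(size_mib: int) -> bool:
    """Write random data to a temp file, read it back, and compare checksums.

    The read may be served from the page cache rather than the disk itself.
    """
    data = os.urandom(size_mib * 1024 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
        f.seek(0)
        return hashlib.sha256(f.read()).digest() == hashlib.sha256(data).digest()


def run_all(workers: int, rounds: int, size_mib: int) -> dict:
    """Run the memory and disk checks while the CPU workers are busy."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cpu_jobs = [pool.submit(cpu_worker, rounds) for _ in range(workers)]
        memory_errors = memory_check(size_mib)
        disk_ok = disk_check(size_mib)
        digests = {job.result() for job in cpu_jobs}
    # Every worker does identical work, so differing digests hint at a fault.
    return {
        "cpu_consistent": len(digests) == 1,
        "memory_errors": memory_errors,
        "disk_ok": disk_ok,
    }


if __name__ == "__main__":
    # Small defaults for a quick smoke run; raise them for a real stress test.
    print(run_all(workers=os.cpu_count() or 1, rounds=500, size_mib=64))
```

Each check can also be called on its own to test one subsystem at a time. On healthy hardware every check should pass; a crash or a failed check under load points toward hardware rather than the installer.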

chrisseto commented 1 year ago

> I didn't address this in my previous reply but the firmware is only used for Wi-Fi. Are you installing over Wi-Fi?

Nope, this is all over Ethernet! At this point I'm fairly convinced this mini is a lemon. I spent the last few hours trying to restore or revive the mini, and even that was running into weird issues. We'll see what Apple says. Closing this until I get my hands on another mini to break ;) Thanks for the help!

chrisseto commented 1 year ago

Closure! It was a hardware issue. Memory diagnostics would crash the machine whenever they were run, even after a successful restore at the Apple Store. It seems the machine never experienced enough load when running macOS to trigger the failure.