litex-hub / linux-on-litex-rocket

Run 64-bit Linux on LiteX + RocketChip
BSD 2-Clause "Simplified" License

Debug system instabilities #23

Closed troibe closed 1 month ago

troibe commented 2 years ago

I want to use this project as a basis for running an application with Linux dependencies on Rocket.

First I tried to get the https://github.com/tongchen126/Boot-Debian-On-Litex-Rocket project running. I managed to boot to the login prompt and even logged in successfully a couple of times with both systemd and sysvinit, but most of the time the system hangs nondeterministically at random points during or after boot (in the kernel, in systemd/sysvinit, or after login). Hangs after the kernel has started are far more common. When that happens, the kernel is usually still active (it prints a message if the sdcard is removed); only the application (bash, systemd, sysvinit, or other binaries) seems to become unresponsive.

Second, I tried using the /bin/bash shell instead of systemd or sysvinit, as they don't seem strictly necessary to run my application. Here I encounter the same issue: sometimes my application finishes successfully, and sometimes it just randomly hangs (while the kernel is still active). On long program runs in particular, the probability that the application will hang is very high.

I ran into this issue last year as well, when I was using an older version of the system for benchmarking purposes, but I didn't have the time to investigate it further (so I would just restart until I got a successful run).
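The restart-until-success workaround described above can be sketched as a small harness that also records how often a run hangs, which may help quantify the instability. This is only an illustrative sketch: it assumes Python is available on the target root filesystem, the function name `run_until_success` is hypothetical, and a process stuck in uninterruptible kernel sleep may not respond to the kill that the timeout triggers.

```python
import subprocess


def run_until_success(cmd, timeout_s, max_attempts):
    """Run cmd repeatedly until it exits with status 0.

    Returns one outcome string per attempt: "ok", "failed" (non-zero
    exit status), or "hang" (no exit within timeout_s seconds).
    """
    outcomes = []
    for _ in range(max_attempts):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it has not finished after timeout_s seconds.
            result = subprocess.run(cmd, timeout=timeout_s)
            outcome = "ok" if result.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            outcome = "hang"
        outcomes.append(outcome)
        if outcome == "ok":
            break
    return outcomes
```

Calling it with the application's command line, a generous timeout, and an attempt limit returns a list such as one or more "hang" entries followed by "ok"; counting the "hang" entries over many invocations gives a rough hang rate to compare across boards or gateware builds.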

So first of all, I was wondering if you, @gsomlo, have encountered instabilities on your Rocket systems. Second, I was wondering if you have any pointers to things that I could look into. Online I mostly found advice for deterministic hangs in Linux.

For reference, I'm using the Arty board and the latest version of LiteX. I already tried different sdcards, so I don't think that's the issue. I also tried different power supplies, and I'll try another FPGA when I get my hands on one. Overall, I'm leaning towards a software/gateware issue, since BBL and the kernel rarely hang.

gsomlo commented 2 years ago

On Tue, Mar 29, 2022 at 08:10:49AM -0700, developandplay wrote:

> So first of all I was wondering if you @gsomlo encountered instabilities on your Rocket systems.

I haven't tried to load much beyond a busybox-based ram disk to test things like the ethernet and sd card drivers with the most recent linux kernel sources. I also ran a bunch of benchmarks (http://mirror.ini.cmu.edu/litex/benchrv64.tar) but still only in the busybox initrd.

I am planning on trying to boot Fedora on LiteX at some point in the future, which should be a bit less impossible now that both the ethernet and sdcard drivers are upstream in the kernel.

Under busybox, litex+linux+rocket has so far been rock solid for me. I tested on ecpix5 (85k) and trellisboard (yosys+trellis+nextpnr), and on the nexys4ddr, nexys-video, and genesys2 (vivado).

I don't have an Arty, so I can't say anything useful about whether that board model has anything to do with the issue you're observing.

It may be entirely possible that loading a "real" OS distro pushes the limits of the hardware much further, exposing issues that wouldn't come up under just busybox.

Another potential issue is memory: loading a real distro might run into OOM situations depending on how much "stuff" is started on boot.

But I can only speculate at this point...
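The out-of-memory hypothesis above could be checked roughly as follows. This is a hedged sketch rather than an established procedure: the helper names are hypothetical, and the matched strings ("invoked oom-killer", "Out of memory") are the ones commonly printed by the Linux OOM killer, though the exact wording can vary between kernel versions.

```python
import re

# Phrases typically found in kernel log lines written by the OOM killer.
# Assumption: the exact wording may differ between kernel versions.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory", re.IGNORECASE)


def find_oom_lines(kernel_log_text):
    """Return the kernel log lines that look like OOM-killer activity."""
    return [line for line in kernel_log_text.splitlines()
            if OOM_PATTERN.search(line)]


def mem_available_kb(meminfo_text):
    """Return the MemAvailable value in kB from /proc/meminfo text.

    Returns None if the field is missing.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None
```

Feeding `find_oom_lines` the captured serial console or `dmesg` output, and `mem_available_kb` the contents of `/proc/meminfo` sampled shortly before a hang, would show whether memory pressure coincides with the unresponsive applications. An empty result does not rule out other causes.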

gsomlo commented 1 month ago

Instructions on how to boot Fedora with linux-on-litex-rocket have been added as part of commit b9b92a5bf79334f100ad808df7827f1f326307c1. Closing; please follow up or re-open if any problems remain unaddressed.