The Rocket core is a few months old, but I don't think that would have anything to do with the ability to support Fedora or Debian. Its freshness can be assessed by looking at the commit log of the https://github.com/litex-hub/pythondata-cpu-rocket repo (specifically, commits mentioning "update to chipsalliance/rocket-chip commit #xxx").
So far, the way I've been going about trying to boot Fedora is as follows (EDIT: added more detailed instructions and downloadable links):
root and user account, both using password riscv, as shown on the console right before the login prompt):
var/lib/libvirt/images/
/var/lib/libvirt/images/fv64gc/fv64gc.xml using virsh and/or virt-manager
- /var/lib/libvirt/images/fv64gc/bbl-5.1.0-0.rc1.git0.1.1.riscv64.fc31.riscv64 is the provided BBL-wrapped kernel image
- /var/lib/libvirt/images/fv64gc/initramfs-5.2.0-0.rc7.git0.1.0.riscv64.fc31.riscv64.img is the initrd filesystem (also available on the "hard drive" image)
- /var/lib/libvirt/images/fv64gc/Fedora-Developer-Rawhide-20190703.n.0-sda.raw is the HDD image, containing the / and /boot partitions.

Build the provided initrd image into our LiteX-specific Linux kernel (litex-rebase branch):
cp /var/lib/libvirt/images/fv64gc/initramfs*.img linux/initramfs.cpio.gz
# build the kernel:
pushd linux
gunzip initramfs.cpio.gz
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- \
litex_rocket_defconfig litex_rocket_initramfs.config
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- -j3
popd
# modify DTS bootargs line and set `bootargs` to the string shown below:
vi conf/nexys4ddr.dts
bootargs = "earlycon=sbi console=hvc0 swiotlb=noforce ro root=/dev/mmcblk0p2 rootfstype=ext2 fsck.mode=skip ignore_loglevel plymouth.enable=no systemd.log_level=debug";
# build BBL:
pushd riscv-pk/build
../configure --host=riscv64-unknown-linux-gnu \
--with-arch=rv64imac \
--with-payload=../../linux/vmlinux \
--with-dts=../../conf/nexys4ddr.dts \
--enable-logo
make bbl
riscv64-unknown-linux-gnu-objcopy -O binary bbl ~/boot.bin
popd
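The `bootargs` edit in the steps above can also be scripted instead of done by hand in vi. A minimal Python sketch, assuming the property sits on a single line of the DTS (as it does in conf/nexys4ddr.dts); `set_bootargs` and `patch_dts_file` are hypothetical helpers, not part of LiteX or linux-on-litex-rocket:

```python
import re

# Matches a single-line property such as:  bootargs = "console=hvc0";
# group 1 keeps the "bootargs = " prefix, group 2 keeps the trailing ";".
BOOTARGS_RE = re.compile(r'(\bbootargs\s*=\s*)"[^"]*"(\s*;)')


def set_bootargs(dts_text: str, bootargs: str) -> str:
    """Return dts_text with the bootargs string replaced (hypothetical helper)."""
    new_text, count = BOOTARGS_RE.subn(
        # a function replacement avoids re-interpreting backslashes in bootargs
        lambda m: f'{m.group(1)}"{bootargs}"{m.group(2)}',
        dts_text,
    )
    if count == 0:
        raise ValueError("no single-line bootargs property found")
    return new_text


def patch_dts_file(path: str, bootargs: str) -> None:
    """Rewrite the DTS file at `path` in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_bootargs(text, bootargs))
```

For example, `patch_dts_file("conf/nexys4ddr.dts", "<bootargs string from above>")` before re-running the BBL build.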
initramfs.cpio.gz; boot.bin (for the Digilent nexys4ddr board).

"Rip" the rv64gc Fedora root filesystem:
# make a copy of the raw image (it will be modified in the process):
cp /var/lib/libvirt/images/fv64gc/*.raw .
# mount the second partition from the image via loopback
# (--show prints the loop device that -f picked, e.g. /dev/loop0):
LOOP=$(losetup -f -P --show Fedora-Developer-Rawhide-20190703.n.0-sda.raw)
mount "${LOOP}p2" /mnt
# modify existing fstab entries' device, FS type, and timeout values as shown below:
vi /mnt/etc/fstab
/dev/mmcblk0p2 / ext2 defaults,noatime,x-systemd.device-timeout=0,x-systemd.mount-timeout=0 0 0
/dev/mmcblk0p1 /boot msdos defaults,noatime,x-systemd.device-timeout=0,x-systemd.mount-timeout=0 0 0
# archive all files:
(cd /mnt; find . | cpio -H newc -o | xz > ~/mmcblk0p2.ext2.cpio.xz)
# unmount and disconnect image from loopback:
umount /mnt
losetup -d "$LOOP"
mmcblk0p2.ext2.cpio.xz
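The manual fstab edit above can also be sketched in Python, e.g. to make the step repeatable on a fresh copy of the raw image. This is a sketch under assumptions: it only rewrites lines whose mount point is / or /boot, uses the values from the listing above, and leaves every other line (comments, swap, etc.) untouched. `rewrite_fstab` is a hypothetical helper:

```python
# Values copied from the fstab listing above.
OPTS = "defaults,noatime,x-systemd.device-timeout=0,x-systemd.mount-timeout=0"
NEW_ENTRIES = {
    # mount point: (device, fs type, options, dump, pass)
    "/": ("/dev/mmcblk0p2", "ext2", OPTS, "0", "0"),
    "/boot": ("/dev/mmcblk0p1", "msdos", OPTS, "0", "0"),
}


def rewrite_fstab(text: str, entries: dict = NEW_ENTRIES) -> str:
    """Replace device/FS type/options for matching mount points; keep other lines."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not fields[0].startswith("#")
        if is_entry and fields[1] in entries:
            device, fstype, options, dump, passno = entries[fields[1]]
            line = " ".join([device, fields[1], fstype, options, dump, passno])
        out.append(line)
    return "\n".join(out) + "\n"
```

It would be run against /mnt/etc/fstab while the image is still loop-mounted, before the cpio archive step.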
Prepare the microSD card (e.g., /dev/sdd):
# fdisk /dev/sdd
Welcome to fdisk (util-linux 2.35.2).
Changes will remain in memory, until you decide to write them.
Be careful before using the write command.
Command (m for help): p
Disk /dev/sdd: 29.74 GiB, 31914983424 bytes, 62333952 sectors
Disk model: SD/MMC
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x67f480f9
Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdd1          2048  2099199  2097152    1G  6 FAT16
/dev/sdd2       2099200 62333951 60234752 28.7G 83 Linux
# mkdosfs /dev/sdd1
# mount /dev/sdd1 /mnt
# cp boot.bin /mnt/
# umount /mnt
# mkfs.ext2 /dev/sdd2
# mount /dev/sdd2 /mnt
# (cd /mnt; xzcat ~/mmcblk0p2.ext2.cpio.xz | cpio -id)
# umount /mnt
sdcardboot
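As a sanity check on the fdisk listing above, the partition sizes follow directly from the sector counts (fdisk's End column is inclusive, and sectors are 512 bytes). A small Python sketch that recomputes the sizes and checks 1 MiB alignment and contiguity of the two partitions; the numbers are copied from the listing, and `check_layout` is a hypothetical helper:

```python
SECTOR_BYTES = 512          # "Units: sectors of 1 * 512 = 512 bytes"
DISK_SECTORS = 62333952     # "Disk /dev/sdd: ... 62333952 sectors"
ALIGN = 2048                # 2048 sectors * 512 bytes = 1 MiB

# (name, start, end) from the fdisk listing; End is inclusive
PARTITIONS = [
    ("/dev/sdd1", 2048, 2099199),
    ("/dev/sdd2", 2099200, 62333951),
]


def check_layout(parts, disk_sectors=DISK_SECTORS, align=ALIGN):
    """Return {name: size_in_bytes}; raise ValueError on misalignment, gaps or overflow."""
    sizes = {}
    prev_end = None
    for name, start, end in parts:
        if start % align:
            raise ValueError(f"{name} does not start on a {align}-sector boundary")
        if prev_end is not None and start != prev_end + 1:
            raise ValueError(f"gap or overlap before {name}")
        if end >= disk_sectors:
            raise ValueError(f"{name} runs past the end of the disk")
        sizes[name] = (end - start + 1) * SECTOR_BYTES
        prev_end = end
    return sizes


for name, size in check_layout(PARTITIONS).items():
    print(f"{name}: {size / 2**30:.2f} GiB")
# /dev/sdd1: 1.00 GiB
# /dev/sdd2: 28.72 GiB
```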
Right now, it gets stuck here for me:
So, any ideas on how to get past "a start job is waiting for /dev/mmcblk0p2" would be super helpful.
LMK what you think.
I don't know what task it could be waiting for, but I have a few ideas for debugging or fixing the problem in a few different directions, so here they are in no particular order:
- Rocket appears to support debugging over JTAG, so we might be able to get a better idea of what's going on when it hangs if we enable debugging, resynthesize Rocket, and figure out where execution is when the boot hangs.
- It seems like newer versions of Fedora use OpenSBI, which from this webpage about requirements (https://review.coreboot.org/plugins/gitiles/opensbi/+/HEAD/docs/platform_requirements.md) appears to support trapping and handling floating point. It may be that whatever issue is happening was solved in a newer release, so the answer might be as simple as getting set up to use OpenSBI and adding support for newer versions of Fedora. I'm not sure exactly all that would go into switching from BBL, though, so maybe this isn't feasible.
- We could try simulating the LiteX to see if the bug still occurs. It might take a few days to get to the same point in the boot process, but if the bug doesn't occur when simulating, it would point to this being a hardware problem, which would narrow down the places we have to search for bugs significantly.
What do you think? I'm still learning a lot about many of these projects, so I'm sure if any of these were as simple as "just do X" and it's fixed, it would be working by now.
- Rocket appears to support debugging over JTAG, so we might be able to get a better idea of what's going on when it hangs if we enable debugging, resynthesize Rocket, and figure out where execution is when the boot hangs.
Rigging up a LiteX+Rocket system to a debugger and stepping through CPU instructions would be interesting and useful in general, though not sure it's the right level of detail for this particular problem. Somehow bumping the systemd debug output level to where it would tell us what it's really waiting for might be faster in this particular case.
- It seems like newer versions of Fedora use OpenSBI, which from this webpage about requirements (https://review.coreboot.org/plugins/gitiles/opensbi/+/HEAD/docs/platform_requirements.md) appears to support trapping and handling floating point.
Actually, I thought the opposite was true: https://github.com/riscv/opensbi/issues/148#issuecomment-568916676 BBL does handle FPU emulation, which is one reason why I was trying to stick with it (for now). Adopting opensbi would either rule out the "smaller" FPGAs (e.g., ecp5) or would require someone to get its maintainers to revisit their decision w.r.t. FPU emulation.
- We could try simulating the LiteX to see if the bug still occurs. It might take a few days to get to the same point in the boot process, but if the bug doesn't occur when simulating, it would point to this being a hardware problem, which would narrow down the places we have to search for bugs significantly
I don't know the status of LiteSDCard simulation -- could one simulate a SDCard populated with 16GB worth of Fedora filesystems? I do remember that last time I tried (a few years ago) it took about 9 hours to boot a 64-bit kernel on rocket in Verilator :)
I'm a bit short on spare time ATM, but I'll try to sanitize and publish my rv64gc Fedora VM disk image, so you should be able to at least catch up to where it gets stuck for me. Then, maybe one of us can dive into the systemd voodoo and figure out what's really getting stuck...
Adding opensbi support for LiteX (in general, and LiteX+Rocket in particular) is another worthy long term project, although not on my personal todo list. But it is indeed the way Fedora is headed, so it would have to get sorted out eventually in any case...
- Rocket appears to support debugging over JTAG, so we might be able to get a better idea of what's going on when it hangs if we enable debugging, resynthesize Rocket, and figure out where execution is when the boot hangs.
Rigging up a LiteX+Rocket system to a debugger and stepping through CPU instructions would be interesting and useful in general, though not sure it's the right level of detail for this particular problem. Somehow bumping the systemd debug output level to where it would tell us what it's really waiting for might be faster in this particular case.
I see. I agree, it could be helpful in general, but it makes sense that it might be a little too low-level of a solution.
Actually, I thought the opposite was true: riscv/opensbi#148 (comment) BBL does handle FPU emulation, which is one reason why I was trying to stick with it (for now). Adopting opensbi would either rule out the "smaller" FPGAs (e.g., ecp5) or would require someone to get its maintainers to revisit their decision w.r.t. FPU emulation.
The comment there is specifically referring to emulating the atomics, which OpenSBI needs to operate correctly, so it can't emulate that extension. I got this link directly from the OpenSBI GitHub (https://review.coreboot.org/plugins/gitiles/opensbi/+/HEAD/docs/platform_requirements.md), and it says: "The base RISC-V platform requirements for OpenSBI are [...] At least rv32ima or rv64ima required on all HARTs [...] The RISC-V extensions not covered by rv32ima or rv64ima are optional for OpenSBI. Although, OpenSBI will detect and handle some of these optional RISC-V extensions at runtime.
The optional RISC-V extensions handled by OpenSBI at runtime are:
- D-extension: Double precision floating point
- F-extension: Single precision floating point
- H-extension: Hypervisor"
This seems to suggest that the F and D extensions would be automatically handled by OpenSBI.
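To make the "rv32ima/rv64ima base plus optional F/D/H" requirement quoted above concrete, here is a small Python sketch that splits an ISA string into single-letter extensions and reports what that base requirement would see. Note that it only describes what the hart implements in hardware; it says nothing about whether OpenSBI would emulate a missing F/D, which is the open question here. `opensbi_report` is a hypothetical helper, and multi-letter extensions after `_` are ignored for simplicity:

```python
from __future__ import annotations

BASE = set("ima")  # "At least rv32ima or rv64ima required on all HARTs"
OPTIONAL = {"d": "double precision FP", "f": "single precision FP", "h": "hypervisor"}


def isa_extensions(isa: str) -> tuple[int, set[str]]:
    """Split e.g. 'rv64gc' into (64, {'i', 'm', 'a', 'f', 'd', 'c'})."""
    isa = isa.lower()
    if not isa.startswith(("rv32", "rv64")):
        raise ValueError(f"unexpected ISA string: {isa!r}")
    xlen = int(isa[2:4])
    letters = isa[4:].split("_", 1)[0]  # drop multi-letter extensions like _zicsr
    exts = set(letters)
    if "g" in exts:  # G is shorthand for IMAFD (plus Zicsr/Zifencei, ignored here)
        exts.discard("g")
        exts |= set("imafd")
    return xlen, exts


def opensbi_report(isa: str) -> dict:
    """Report missing base extensions and which optional F/D/H are present."""
    xlen, exts = isa_extensions(isa)
    return {
        "xlen": xlen,
        "missing_base": sorted(BASE - exts),
        "optional_present": sorted(e for e in OPTIONAL if e in exts),
    }
```

For example, `opensbi_report("rv64imac")` (the arch BBL is configured for above) shows no missing base extensions and no F/D, while `opensbi_report("rv64gc")` lists D and F as present.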
- We could try simulating the LiteX to see if the bug still occurs. It might take a few days to get to the same point in the boot process, but if the bug doesn't occur when simulating, it would point to this being a hardware problem, which would narrow down the places we have to search for bugs significantly
I don't know the status of LiteSDCard simulation -- could one simulate a SDCard populated with 16GB worth of Fedora filesystems? I do remember that last time I tried (a few years ago) it took about 9 hours to boot a 64-bit kernel on rocket in Verilator :)
I'm a bit short on spare time ATM, but I'll try to sanitize and publish my rv64gc Fedora VM disk image, so you should be able to at least catch up to where it gets stuck for me. Then, maybe one of us can dive into the systemd voodoo and figure out what's really getting stuck...
Sounds good. There's no rush, my semester finishes up over the next 2-3 weeks, so my spare time is partially constrained until then as well.
Adding opensbi support for LiteX (in general, and LiteX+Rocket in particular) is another worthy long term project, although not on my personal todo list. But it is indeed the way Fedora is headed, so it would have to get sorted out eventually in any case...
I think OpenSBI is already supported for Linux-on-LiteX-VexRiscv: they have prebuilt bitstreams of it, and when I run Linux-on-LiteX-VexRiscv on my OrangeCrab board, it displays this before booting:
--============= Liftoff! ===============--
OpenSBI v0.8-1-gecf7701
[OpenSBI ASCII-art logo]
Platform Name       : LiteX / VexRiscv-SMP
Platform Features   : timer,mfdeleg
Platform HART Count : 8
Boot HART ID        : 0
Boot HART ISA       : rv32imas
BOOT HART Features  : time
BOOT HART PMP Count : 0
Firmware Base       : 0x40f00000
Firmware Size       : 124 KB
Runtime SBI Version : 0.2
So, any ideas on how to get past "a start job is waiting for /dev/mmcblk0p2" would be super helpful.
I've seen issues similar to this caused by kernel configuration incompatible with udev/systemd. Comparing https://github.com/litex-hub/linux/blob/litex-rebase/arch/riscv/configs/litex_rocket_defconfig with https://github.com/systemd/systemd/blob/main/README I see:
CONFIG_SYSFS_DEPRECATED is y but should be n.
The rest looks fine at first glance.
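The same comparison can be repeated on a generated kernel .config with a small Python check instead of eyeballing the defconfig. The option list below is an assumption: only the options that come up in this thread (CONFIG_SYSFS_DEPRECATED, CONFIG_FHANDLE, and CONFIG_DEVTMPFS, since systemd mounts devtmpfs on /dev), not the full requirement list from systemd's README. `check_config` is a hypothetical helper:

```python
# ASSUMPTION: partial list, only the options that come up in this thread.
REQUIRED = {
    "CONFIG_SYSFS_DEPRECATED": "n",  # systemd README: must be disabled
    "CONFIG_FHANDLE": "y",           # udev/systemd need name_to_handle_at()
    "CONFIG_DEVTMPFS": "y",          # systemd mounts devtmpfs on /dev at startup
}
IS_NOT_SET = " is not set"


def parse_config(text: str) -> dict:
    """Map CONFIG_* names to their values; '# CONFIG_X is not set' becomes 'n'."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(IS_NOT_SET):
            values[line[2:-len(IS_NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            values[key] = value
    return values


def check_config(text: str, required: dict = REQUIRED) -> list:
    """Return 'NAME=have (want X)' strings for every mismatching option."""
    values = parse_config(text)
    problems = []
    for name, want in required.items():
        have = values.get(name, "n")  # options absent from .config are effectively off
        if have != want:
            problems.append(f"{name}={have} (want {want})")
    return problems
```

For example, `check_config(open("linux/.config").read())` after the `make ... defconfig` step; an empty list means the listed options match.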
Thanks @jluebbe -- I tried turning that off and it's still mostly the same behavior. Now there's a time limit on waiting for /dev/mmcblk0p2 (there sometimes was one before; not sure how it stochastically picks between no limit and 1min 30s), and the boot process ends up like this:
...
[ OK ] Started Journal Service.
Starting Create Volatile Files and Directories...
[ OK ] Started udev Kernel Device Manager.
Starting udev Coldplug all Devices...
[ 133.493104] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[ *** ] (2 of 3) A start job is running for /dev/mmcblk0p2 (21s / 1min 30s)
[ TIME ] Timed out waiting for device /dev/mmcblk0p2.
[DEPEND] Dependency failed for Initrd Root Device.
[DEPEND] Dependency failed for /sysroot.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Relo…figuration from the Real Root.
[ OK ] Reached target Initrd File Systems.
[ 213.090818] systemd-journald[40]: Sent WATCHDOG=1 notification.
Starting Setup Virtual Console...
[ OK ] Reached target Paths.
[ OK ] Reached target Remote File Systems (Pre).
[ OK ] Reached target Remote File Systems.
[ 214.653112] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[ OK ] Started Setup Virtual Console.
[ OK ] Started Emergency Shell.
[ OK ] Reached target Emergency Mode.
[ 311.781778] systemd-journald[40]: Sent WATCHDOG=1 notification.
[ 315.693123] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[ 321.113122] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[ 322.911197] systemd-journald[40]: Data hash table of /run/log/journal/3800c492b882414397814a5b17d8b631/system.journal has a fill level at 75.1 (1537 of 2047 items, 745472 file size, 485 bytes per hash table item), suggesting rotation.
[ 323.033213] systemd-journald[40]: /run/log/journal/3800c492b882414397814a5b17d8b631/system.journal: Journal header limits reached or header out-of-date, rotating.
[ 323.040566] systemd-journald[40]: Rotating...
[ 323.106898] systemd-journald[40]: Journal effective settings seal=no compress=yes compress_threshold_bytes=512B
[ 323.384536] systemd-journald[40]: Reserving 2047 entries in hash table.
[ 323.392181] systemd-journald[40]: Vacuuming...
[ 323.466821] systemd-journald[40]: Vacuuming done, freed 0B of archived journals from /run/log/journal/3800c492b882414397814a5b17d8b631.
[ 393.123104] systemd-journald[40]: Sent WATCHDOG=1 notification.
[ 491.781609] systemd-journald[40]: Sent WATCHDOG=1 notification.
[ 578.973101] systemd-journald[40]: Sent WATCHDOG=1 notification.
[ 671.781650] systemd-journald[40]: Sent WATCHDOG=1 notification.
...
with endless Sent WATCHDOG output, and no emergency shell.
I tried manually creating /dev/mmcblk0[p1|p2] nodes in the initrd cpio before embedding it into the kernel and BBL, but that didn't seem to help either. Not sure what it's "waiting" for, really... Systemd is not the most intuitive thing to debug, once one leaves the "beaten path" :)
Creating device nodes in the initrd will not help. Systemd will mount devtmpfs to /dev right at the beginning. And waiting for devices does not mean waiting for the device node in /dev. It means waiting for the uevent. If the device is already there when udev starts then coldplug should trigger that uevent.
And it looks like the uevents never show up. In the past, this was often caused by a missing CONFIG_FHANDLE=y in the kernel config. But that's enabled by default in recent kernels.
The other odd thing is that the Emergency Shell is starting, but apparently not on the tty that you're using. Are there multiple console=.. arguments on the kernel command line? I think the shell will only start on one of them.
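Since the wait is for the uevent rather than the /dev node, one way to see whether the kernel ever announces mmcblk0p2 is to listen for kernel uevents directly. `udevadm monitor --kernel` does this if it's available; the Python sketch below shows the same idea over a NETLINK_KOBJECT_UEVENT socket, e.g. for comparing against the working QEMU VM. Assumptions: Linux only, Python 3 on the target, and the protocol number 15 is hardcoded from <linux/netlink.h> in case the socket module doesn't export it. `parse_uevent` and `watch_block_uevents` are hypothetical helpers:

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # value from <linux/netlink.h>
KERNEL_UEVENT_GROUP = 1      # multicast group used for kernel (not udev) events


def parse_uevent(msg: bytes) -> dict:
    """Parse a raw kernel uevent: 'ACTION@DEVPATH\\0KEY=VALUE\\0KEY=VALUE\\0...'."""
    header, *fields = msg.split(b"\0")
    env = {}
    for field in fields:
        key, sep, value = field.partition(b"=")
        if sep:  # skips the empty string after the trailing NUL
            env[key.decode()] = value.decode(errors="replace")
    # fall back to the header if the ACTION/DEVPATH keys are missing
    action, _, devpath = header.decode(errors="replace").partition("@")
    env.setdefault("ACTION", action)
    env.setdefault("DEVPATH", devpath)
    return env


def watch_block_uevents() -> None:
    """Print ACTION and DEVNAME for every block-device uevent (runs until interrupted)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)
    sock.bind((0, KERNEL_UEVENT_GROUP))  # pid 0: let the kernel pick the port id
    while True:
        env = parse_uevent(sock.recv(65536))
        if env.get("SUBSYSTEM") == "block":
            print(env["ACTION"], env.get("DEVNAME"), flush=True)
```

If an "add mmcblk0p2" line never shows up while coldplug runs, that would be consistent with the uevents never arriving.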
And it looks like the uevents never show up. In the past, this was often caused by a missing CONFIG_FHANDLE=y in the kernel config. But that's enabled by default in recent kernels.
Yeah, I can find CONFIG_FHANDLE=y in my kernel .config ...
The other odd thing is, that the Emergency Shell is starting, but apparently not in the tty that you're using. Are there multiple console=.. arguments in the kernel command-line? I think the shell will only start on one of them.
My bootargs look like this:
bootargs = "earlycon=sbi console=hvc0 swiotlb=noforce root=/dev/mmcblk0p2 rootfstype=ext2 fsck.mode=skip ignore_loglevel plymouth.enable=no systemd.log_level=debug";
Interaction with the uart currently occurs over ecalls into machine mode, where BBL takes care of it. I know there's a LiteX uart driver in linux proper, not sure if I actually need to figure out how to use it directly for this to work...
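To answer the multiple-console question mechanically, here is a tiny Python sketch that splits a bootargs string into parameters and lists the console= entries. It assumes plain whitespace-separated parameters (no quoted values containing spaces). Per the kernel's serial-console documentation, when several console= arguments are given, /dev/console ends up on the last one, which as far as I know is where systemd starts its emergency shell. `parse_cmdline` and `console_args` are hypothetical helpers:

```python
from __future__ import annotations


def parse_cmdline(cmdline: str) -> list[tuple[str, str | None]]:
    """Split a kernel command line into (key, value) pairs; value is None for flags."""
    params = []
    for token in cmdline.split():  # assumes no quoted values containing spaces
        key, sep, value = token.partition("=")
        params.append((key, value if sep else None))
    return params


def console_args(cmdline: str) -> list[str]:
    """Return every console= value in order; the last one backs /dev/console."""
    return [value for key, value in parse_cmdline(cmdline) if key == "console" and value]


BOOTARGS = ("earlycon=sbi console=hvc0 swiotlb=noforce root=/dev/mmcblk0p2 "
            "rootfstype=ext2 fsck.mode=skip ignore_loglevel plymouth.enable=no "
            "systemd.log_level=debug")
print(console_args(BOOTARGS))  # ['hvc0']
```

With the bootargs above there is only one console= (earlycon= is a separate parameter), so multiple consoles don't seem to explain the missing shell.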
For anyone interested, I edited https://github.com/litex-hub/linux-on-litex-rocket/issues/10#issuecomment-825276485 with a comprehensive list of steps going from a functional, standard-issue QEMU rv64gc VM to wherever I got stuck trying to get it booted on LiteX. I think any further progress from this point is down to systemd voodoo... :)
I've had no end of trouble running a basic Yocto image on LiteX/VexRiscv, whereas Buildroot works fine with the same (Yocto-compiled) kernel, even after adding a lot of locally compiled packages. Indeed, there could be some bad interactions between the LiteX devices and systemd...
For OpenSBI, I had managed to compile a 64-bit variant of the LiteX/VexRiscv port to use with Rocket, but it didn't work. I'm attaching the patch (based on the LiteX OpenSBI) in case it could be useful as a starting point: opensbi_litexrocket.txt
It might be worthwhile to check out https://github.com/firesim/FireMarshal; it appears to generate working RISC-V Linux distributions.
Hello, I have tested Debian on Rocket using a Nexys Video board, and a rootfs, which uses sysvinit instead of systemd, is uploaded to the release page.
The link is here.
I'm also interested in your Fedora rootfs; I will give it a try in the future.
Thank you @tongchen126 this is really impressive! How long did it take you to figure everything out? Especially the part of using sysvinit instead of systemd... @gsomlo Maybe we can integrate it in this repository if you can reproduce it.
@developandplay Yeah, it took me quite some time to make it work. Anyway, I hope the RCU stalls and kernel oops when using systemd can be fixed in the future.
I think I have some progress on this. Please check my steps here
https://github.com/roryt12/Riscv64-Debian-qmtech-wukong-FPGA
On Fri, Sep 16, 2022 at 03:58:36AM -0700, Ioannis Ioannou wrote:
I think I have some progress on this. Please check my steps here
https://github.com/roryt12/Riscv64-Debian-qmtech-wukong-FPGA
Looking at it right now. For step 2 (picking the width of your rocket chip to match the memory bus): see
https://github.com/enjoy-digital/litex/blob/master/litex/soc/integration/soc.py#L1512
If you print out the value of port.data_width there, you can then pick a rocket variant that matches. There used to be a message emitted by the litex builder stating whether that was a match or whether alternatively a width conversion was needed; looks like it was edited out since last time I needed to know :)
HTH, --Gabriel
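Following the suggestion above, once port.data_width is known the choice can be written down as a small lookup. This is a sketch only: the suffix convention in WIDTH_SUFFIX (no suffix = 64-bit, d = 128, q = 256, o = 512) and the full/linux base names are an assumption about how the LiteX Rocket variant names encode memory port width and FPU presence; check the variant table in LiteX's Rocket CPU wrapper before relying on it. `rocket_variant` is a hypothetical helper:

```python
# ASSUMPTION: suffix convention for the memory port width; verify against
# the variant table in LiteX's Rocket CPU wrapper (litex/soc/cores/cpu/rocket).
WIDTH_SUFFIX = {64: "", 128: "d", 256: "q", 512: "o"}


def rocket_variant(data_width: int, fpu: bool = True) -> str:
    """Pick a variant whose memory port matches port.data_width (no width converter)."""
    try:
        suffix = WIDTH_SUFFIX[data_width]
    except KeyError:
        raise ValueError(f"no Rocket variant assumed for a {data_width}-bit port") from None
    base = "full" if fpu else "linux"  # assumed: "full" has an FPU, "linux" does not
    return base + suffix
```

With the 128-bit port reported below, this would suggest `rocket_variant(128)`, i.e. "fulld" under the assumed naming.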
Gabriel, that was very helpful indeed: port.data_width gave me 128! Thank you. Maybe it would be helpful for others to add an INFO message there for this feature?
On Fri, Sep 16, 2022 at 05:04:50AM -0700, Ioannis Ioannou wrote:
Gabriel that was very helpful, indeed, port.data_width gave me 128 ! Thank you. Maybe will be helpful for others to add an INFO message there for the feature ?
There used to be one there, it's what I'm talking about in the linux-on-litex-rocket README :)
It must have gotten patched out during some code refactoring while I wasn't paying attention...
Bummer! After one day of flawless operation, I got RCU stalls again when I enabled sshd and logged in with 2 additional sessions. I used rcupdate.rcu_cpu_stall_suppress=1 in the boot args, and now it runs smoothly.
On Fri, Sep 16, 2022 at 06:55:05AM -0700, Ioannis Ioannou wrote:
Bummer! After one day of flawless operation, I got RCU stalls again when I enabled sshd and logged in with 2 additional sessions. I used rcupdate.rcu_cpu_stall_suppress=1 in the boot args, and now it runs smoothly.
huh... I'm not exactly sure what crashes during my attempt to boot fedora, but this rcu business might be worth a shot in the dark.
I haven't had spare cycles to actually do a systematic troubleshooting session -- just "try random crap, fire-and-forget, and focus on $DAYJOB firefighting while it's trying to boot in the background" :)
Looking forward to things settling down a bit so maybe I can pay a bit of actual attention to it... :)
Cheers, --Gabriel
After some hacking on the Linux drivers/tty/serial/liteuart.c driver, I got it to go from crashing before/during login on Fedora to actually working:
At this point, I believe building LiteX + Rocket on any board that can accommodate a "full" variant (i.e., one which implements an FPU in gateware) should support running Fedora or any other rv64gc distro once the liteuart Linux driver is fixed upstream and the fix makes its way into a distro kernel. I'm currently working on that (slowly).
Hello,
I'd like to contribute, and I'm wondering what is currently preventing the RISC-V Fedora or Debian ports from working on Rocket. Are there any known issues that prevent it, or does it just fail at some point in the boot process? How recent is the Rocket core being used? Is there anything that needs to be done that I can help with?