litex-hub / linux-on-litex-rocket

Run 64-bit Linux on LiteX + RocketChip
BSD 2-Clause "Simplified" License

Unable to boot Linux - userspace cannot write to console, no busybox shell #29

Closed n-kremeris closed 1 year ago

n-kremeris commented 1 year ago

I'm building a LiteX SoC with Rocket on a Digilent Nexys Video board, which is listed as one of the supported platforms here:

./litex-boards/litex_boards/targets/digilent_nexys_video.py --sys-clk-freq 50e6 --with-ethernet --with-sdcard --cpu-type rocket --cpu-variant fulld --csr-json ./newdts.json --build

So far I have built Linux from both the litex-rebase and the master branches of the litex-hub/linux repository. My kernel config requires the CONFIG_RISCV_SBI_V01=y option; otherwise the boot hangs, as described here, around the "loop: module loaded" debug message (long before init starts).

With CONFIG_RISCV_SBI_V01 enabled, the kernel boots until it prints the following:

[    0.093898] Run /init as init process
[    0.093923]   with arguments:
[    0.093953]     /init
[    0.093975]   with environment:
[    0.094006]     HOME=/
[    0.094030]     TERM=linux

This is with the pre-built initramfs.cpio from the releases here.

I am sure that init actually starts running; I tested this by creating a custom init executable that deliberately segfaults:

#include <stdlib.h>
#include <stdio.h>

#define ERROR(x) { int *tmp = (int*)x; *tmp = 0xDEADBEEF; }

int main() {
    printf("--------------------------- Hello, world!-------------------- \n");
    system("echo \"$$$$$$ HELLO WORLD FROM system()\" >/dev/kmsg");

    ERROR(0xffff0000ffff0000);
}

[    0.093893] Run /init as init process
[    0.093919]   with arguments:
[    0.093948]     /init
[    0.093970]   with environment:
[    0.094002]     HOME=/
[    0.094025]     TERM=linux
[    0.094406] init[1]: unhandled signal 11 code 0x1 at 0xffffff80ffff0000 in init[10000+62000]
[    0.094484] CPU: 0 PID: 1 Comm: init Not tainted 6.3.0-rc3-g640cc8df93a6 #55
[    0.094554] epc : 0000000000010664 ra : 000000000001064a sp : 0000003fcacb7c10
[    0.094627]  gp : 0000000000078bb8 tp : 000000000007e7e0 t0 : 0000000000000002
[    0.094699]  t1 : 0000000000009000 t2 : 000000006fffff41 s0 : 0000003fcacb7c30
[    0.094771]  s1 : 0000000000000001 a0 : 0000000000007f00 a1 : 0000003fcacb78d0
[    0.094843]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
[    0.094915]  a5 : ffff0000ffff0000 a6 : 0000000000074e98 a7 : 0000000000000087
[    0.094987]  s2 : 0000003fcacb7dd8 s3 : 0000000000000001 s4 : 0000003fcacb7de8
[    0.095059]  s5 : 000000000001062a s6 : 0000000000000001 s7 : 0000000000000001
[    0.095131]  s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[    0.095203]  s11: 0000000000000000 t3 : 0000003f836b3000 t4 : 0000000000000011
[    0.095275]  t5 : 0000000000000000 t6 : 0000000000072d80
[    0.095328] status: 8000000200006020 badaddr: ffffff80ffff0000 cause: 000000000000000f
[    0.095412] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.095484] CPU: 0 PID: 1 Comm: init Not tainted 6.3.0-rc3-g640cc8df93a6 #55
[    0.095554] Call Trace:
[    0.095578] [<ffffffff800051b8>] dump_backtrace+0x1c/0x24
[    0.095632] [<ffffffff804c86bc>] show_stack+0x2c/0x38
[    0.095683] [<ffffffff804d0104>] dump_stack_lvl+0x3c/0x54
[    0.095736] [<ffffffff804d0130>] dump_stack+0x14/0x1c
[    0.095787] [<ffffffff804c87c4>] panic+0xfc/0x284
[    0.095834] [<ffffffff80010b4a>] do_exit+0x704/0x70a
[    0.095883] [<ffffffff80010cbe>] do_group_exit+0x24/0x6c
[    0.095936] [<ffffffff8001ba76>] get_signal+0x7e4/0x814
[    0.095988] [<ffffffff80004486>] do_work_pending+0xfa/0x422
[    0.096044] [<ffffffff800032c2>] resume_userspace_slow+0x8/0xa
[    0.096103] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

I have tried both the device trees from the linux-on-litex-rocket repo and my own, generated with the --csr-json argument and then json2dts. I have attached the dts I'm currently using to the post.

I have also tried this flow with both linuxd + bbl with rv64imac, and fulld + bbl with rv64imafdc, observing the same issues.

I have additionally tried various permutations of the console= parameter passed to the kernel, but only console=liteuart0 seems to work correctly. Other options such as liteuart, sbi, or ttyLXU0 result either in "warning: unable to open initial console" errors or in a warning, very early in the boot, that the console argument is malformed.

I have attached the full kernel log too. Thanks

log.txt mydts.txt

gsomlo commented 1 year ago

On Wed, Mar 22, 2023 at 08:45:41AM -0700, Norbert Kremeris wrote:

I'm building LiteX SoC with Rocket on a digilent nexys video board, which is listed as one of the supported platforms here.

If you're using the pre-built binaries for software, chances are that MMIO registers placement has changed in more recent gateware builds.

If you're putting together your own software, make sure your devicetree .dts matches the way your gateware was built (use --csr-csv ./csr-csv to ensure that information is captured during build).
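One way to see which addresses the devicetree has to match is to list the CSR base addresses recorded in the CSV that the build wrote out. The sketch below assumes the usual LiteX CSV layout, in which each peripheral gets a row of the form csr_base,<name>,<address>,, and the function name csr_bases is made up for illustration.

```shell
#!/bin/sh
# List peripheral names and CSR base addresses from a LiteX-generated CSV.
# Assumption: peripheral rows look like "csr_base,<name>,<hex address>,,".
# Rows of any other type (csr_register, constant, memory_region, comments)
# are skipped.
csr_bases() {
    awk -F, '$1 == "csr_base" { printf "%s %s\n", $2, $3 }' "$1"
}
```

Running csr_bases ./csr.csv prints one "name address" pair per line, which can then be compared by eye against the reg = <...> entries of the matching nodes in the .dts.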

HTH, --Gabriel

PS. please be warned that cpu variant naming is about to be changed pretty significantly, which will make the "linux-on-litex-rocket/README.md" (even more) obsolete, and in need of an update (next on my list, I promise :) )

n-kremeris commented 1 year ago

Hi Gabriel! Sorry, I accidentally posted the issue without writing all the info and then frantically started editing it. Didn't expect such a quick reply!

Please take a look again at the updated info when you have the time. I am indeed using the generated (but slightly modified) dts.

I am using my own build of linux, bbl, and tried a custom build of busybox/initramfs.cpio as well as the prebuilt one.

roryt12 commented 1 year ago

Sounds like the issue I had with the liteuart IRQ. Take a look at https://github.com/roryt12/qmtech_wukong_debian_on_litex_rocket . It hung at exactly this point when I enabled IRQ instead of polling.

Also try an alternative, i.e. the hvc0 driver. In my attempt with NaxRiscv ( https://github.com/roryt12/qmtech_wukong_debian_on_litex_rocket ) it seems to work fine (it needs more options in the kernel config and in the dts).

gsomlo commented 1 year ago

I haven't built anything using BBL in a very long time, having been discouraged from doing so by upstream, who now favor opensbi. The latter doesn't support emulating the FPU in M-mode, so it only works on boards that can accommodate at least one FPU-enabled Rocket core (85k ecp5, so no versa board; nexys* boards and better xilinx-equipped ones should still work).

I need to update the README to reflect this, but I'm also trying to get upstream LiteX to generate the full *.dts for a rocket-based system automatically, so I don't have to screw around with sample device tree files anymore :)

In the meantime, if you have a nexys_video, I'd like to propose that you try the following and let me know how it goes:

On the very latest upstream litex, add #define CONFIG_BIOS_NO_BOOT to the top of litex/soc/software/bios/main.c to inhibit it from automatically trying to boot from sdcard (you can still manually issue whatever boot command you want from the LiteX bios prompt). Then, build for the nexys_video:

litex-boards/litex_boards/targets/digilent_nexys_video.py --build \
    --cpu-type rocket --cpu-variant full --cpu-num-cores 2 --cpu-mem-width 2 --sys-clk-freq 50e6 \
    --with-ethernet --with-sdcard \
    --with-sata --sata-gen 1 --with-sata-pll-refclk \
    --csr-csv ./csr.csv

It's OK if you don't have a SATA adapter, but building it in anyway will make the MMIO registers line up properly with the DTS sample I'm about to link you to :)

Build the litex-rebase branch of the litex-hub kernel tree:

make clean
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu-  litex_rocket_defconfig
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- -j8

This gets you an arch/riscv/boot/Image kernel file.

Unpack nexys_video_binaries.zip. There will be a copy of Image already in there, pre-built. Feel free to use it or ignore it if you built your own. There's also a satatest4imafdc.dts sample in there. Compile it with:

dtc -O dtb satatest4imafdc.dts -o satatest4imafdc.dtb

Then build fw_jump.bin from source (or use the included one), like so:

make CROSS_COMPILE=riscv64-unknown-linux-gnu- PLATFORM=generic \
    FW_FDT_PATH=~/...path/to/.../satatest2imafdcbkph.dtb FW_JUMP_FDT_ADDR=0x82400000

Use the included initrd_bb (which should not be significantly different from previous sets of instructions).

Place the included boot.json, initrd_bb, Image, and fw_jump.bin into a tftp server directory (or on your FAT formatted sdcard), and issue the appropriate boot command (netboot or sdcardboot) from the LiteX bios.
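Before issuing the boot command it can save a round trip to confirm that all four files actually landed in the served directory. This is only a sketch: the function name check_boot_files is hypothetical, and the file names are the ones listed above.

```shell
#!/bin/sh
# Report which of the four boot files are missing from a directory.
# Prints one "missing: <file>" line per absent file and returns non-zero
# if at least one is absent.
check_boot_files() {
    dir=$1
    status=0
    for f in boot.json initrd_bb Image fw_jump.bin; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

Call it with the tftp server directory or the sdcard mount point as its only argument. It prints nothing and returns 0 when everything is in place.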

LMK if this works for you (it does for me, and it's how things will be done from now on -- no more BBL).

All of this is also perfectly capable to boot e.g. Fedora (with an appropriately modified DTS and some doctoring of the official raw image to get it installed to the sdcard :) ).

LMK how it goes -- good luck!

gsomlo commented 1 year ago

Note the edited --cpu-variant linux --cpu-num-cores 4 --cpu-mem-width 2 command line. If you go for full (now with [H]ypervisor support), there's only room for 2 such cores, and I gave you the wrong DTS for that :)

gsomlo commented 1 year ago

Final edit: use --cpu-type rocket --cpu-variant full --cpu-num-cores 2 to get H support, and be able to use the enclosed binary opensbi blob (updated the source DTS to reflect the two larger h-capable full cores vs. the previous 4 lighter linux ones). The updated/edited comment containing build instructions is now correct and should reference the correct zip file.

n-kremeris commented 1 year ago

Hi @gsomlo!

Thanks very much for the detailed updated instructions. I have successfully managed to boot linux on the digilent nexys video by following your comments, with slight modifications to suit my needs!

I have had some weird issues with my litex_setup.py: running it with "--update" failed to pull the latest pythondata-cpu-rocket version, so the required configuration did not exist! To remedy this, I had to delete everything related to litex, clone a fresh copy of litex, and re-run litex_setup.py --install --size full --user.

I have built the bitstream using ./litex-boards/litex_boards/targets/digilent_nexys_video.py --build --cpu-type rocket --cpu-variant full --cpu-num-cores 1 --cpu-mem-width 2 --sys-clk-freq 50e6 --with-ethernet --with-sdcard --with-sata --sata-gen 1 --with-sata-pll-refclk --csr-csv ./csr.csv

I have additionally modified the provided dts to have only 1 cpu core, and updated the initramfs end address to fit my new, bigger busybox image (initrd end = initrd_start + sizeof(initramfs file) + 12814):

linux,initrd-end = <0x8220320E>; /* end initrd.gz + 12814 (?) bytes */

Otherwise, the boot gets stuck at Waiting for root device /dev/ram0...
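The end-address arithmetic can be scripted so that it does not have to be redone by hand whenever the image changes size. The sketch below assumes nothing beyond the formula quoted above. The function name initrd_end is made up, and the default padding of 12814 bytes is simply the value reported to work here, not a documented constant.

```shell
#!/bin/sh
# Compute linux,initrd-end = start + size + padding and print it in hex.
# Arguments: start address, image size in bytes, optional padding.
# The default padding (12814) is the empirical value from this thread.
initrd_end() {
    start=$1
    size=$2
    pad=${3:-12814}
    printf '0x%X\n' $(( $start + $size + $pad ))
}
```

For example, initrd_end <initrd-start> "$(wc -c < initramfs.cpio)" prints the value to put into linux,initrd-end, where <initrd-start> stands for whatever linux,initrd-start is set to in your dts.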

I have built busybox at tag 1_36_0. For anyone else looking to rebuild the initramfs/initrd, here's how I did it (mostly matches the original instructions):

#!/bin/bash
rm -rf initramfs
rm -rf initramfs.cpio
mkdir initramfs
pushd initramfs
mkdir -p bin sbin lib etc dev home proc sys tmp mnt nfs root \
          usr/bin usr/sbin usr/lib
sudo mknod -m 622 dev/console c 5 1
sudo mknod -m 622 dev/tty0 c 4 0
cp ../busybox_git/busybox bin/
ln -s bin/busybox ./init
cat > etc/inittab <<- "EOT"
::sysinit:/bin/busybox mount -t proc proc /proc
::sysinit:/bin/busybox mount -t devtmpfs devtmpfs /dev
::sysinit:/bin/busybox mount -t tmpfs tmpfs /tmp
::sysinit:/bin/busybox mount -t sysfs sysfs /sys
::sysinit:/bin/busybox --install -s
/dev/console::sysinit:-/bin/ash
EOT
fakeroot <<- "EOT"
find . | cpio -H newc -o > ../initramfs.cpio
EOT
popd

Then copy the initramfs.cpio to tftp dir as initrd_bb.

Some extra details in case anyone is looking to replicate this:

riscv64-linux-gnu-gcc (Ubuntu 12.1.0-2ubuntu1~22.04) 12.1.0
riscv64-unknown-elf-gcc (g1ea978e3066) 12.1.0
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
opensbi @ c6a092cd80112529cb2e92e180767ff5341b22a3
litex-hub/linux (litex-rebase) @ b73e060b3b04cba84983a3786845a6d16c77bf1f

EDIT: perhaps you'd like me to try to update the README to match the current status?