litex-hub / linux-on-litex-rocket

Run 64-bit Linux on LiteX + RocketChip
BSD 2-Clause "Simplified" License

Unable to boot into linux on simulated SoC - Kernel panic #31

Closed OrkunAliOzkan closed 1 year ago

OrkunAliOzkan commented 1 year ago

Hello there!

I'm building a LiteX SoC with a single Rocket core on litex_sim, using dependencies I built myself.

The steps taken to build my dependencies are the following:

  1. Generate the CSR file using litex_sim's support
  2. Convert the CSR file into a device tree source (DTS) file
  3. Convert the DTS file into a device tree blob (DTB) file
  4. Build busybox
  5. Create the kernel's root RAM filesystem (initramfs)
  6. Build the kernel
  7. Build fw_jump
  8. Run the simulator (preloading these dependencies)

Please find the console log attached below, so you can see how far into the boot I get: consolelogkernelpanic.txt

The notable lines, however, are the following:

[    5.581115] Freeing unused kernel image (initmem) memory: 2816K
[    5.583210] Run /init as init process
[    5.583693]   with arguments:
[    5.583965]     /init
[    5.584206]   with environment:
[    5.584455]     HOME=/
[    5.584687]     TERM=linux
[    5.594537] Failed to execute /init (error -2)
[    5.595162] Run /sbin/init as init process
[    5.595487]   with arguments:
[    5.595737]     /sbin/init
[    5.595986]   with environment:
[    5.596214]     HOME=/
[    5.596461]     TERM=linux
[    5.602636] Run /etc/init as init process
[    5.603198]   with arguments:
[    5.603467]     /etc/init
[    5.603698]   with environment:
[    5.603945]     HOME=/
[    5.604176]     TERM=linux
[    5.609673] Run /bin/init as init process
[    5.610292]   with arguments:
[    5.610574]     /bin/init
[    5.610827]   with environment:
[    5.612462]     HOME=/
[    5.612899]     TERM=linux
[    5.618560] Run /bin/sh as init process
[    5.619160]   with arguments:
[    5.619467]     /bin/sh
[    5.619713]   with environment:
[    5.619959]     HOME=/
[    5.620187]     TERM=linux
[    5.627298] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
[    5.628017] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-gff903dea5e37 #6
[    5.628478] Call Trace:
[    5.628762] [<ffffffff8000502e>] dump_backtrace+0x1c/0x24
[    5.629415] [<ffffffff804b21c6>] show_stack+0x2c/0x38
[    5.630052] [<ffffffff804b9b1c>] dump_stack_lvl+0x3c/0x54
[    5.630696] [<ffffffff804b9b48>] dump_stack+0x14/0x1c
[    5.631265] [<ffffffff804b235e>] panic+0xf8/0x284
[    5.631833] [<ffffffff804ba194>] __irq_alloc_descs+0x0/0x1fa
[    5.632385] [<ffffffff80003234>] ret_from_exception+0x0/0x16
[    5.632991] ---[ end Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. ]---

I did a bit of reading on this error and found that the following document discusses the issue quite well: https://docs.kernel.org/admin-guide/init.html

Any help would be much appreciated ^^

roryt12 commented 1 year ago

@OrkunAliOzkan error -2 means that the kernel can't find an init file to run in the root filesystem, which is what the kernel panic says: "No working init found. Try passing init= option to kernel". But from the information provided it is not clear how you load OpenSBI, the kernel and the initrd (i.e. the parameters for litex_term and "boot.json"), what the kernel boot parameters in the DTS file are, or what your initrd contains, so......
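
For reference, the -2 in "Failed to execute /init (error -2)" is a negated errno value. A minimal sketch for decoding it on a Linux host follows; the helper name decode_kernel_error is made up for illustration and is not part of any LiteX or kernel tooling.

```python
import errno
import os


def decode_kernel_error(code: int) -> str:
    """Map a negative kernel return code such as -2 to its errno name and text."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this host.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


print(decode_kernel_error(-2))
```

Note that when executing a program, ENOENT can be returned not only when the path itself is missing but also when the file exists and the dynamic loader it requests cannot be found in the root filesystem.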

OrkunAliOzkan commented 1 year ago

@roryt12 The command I invoke when running the sim is:

litex_sim --with-ethernet --with-sdram --cpu-type rocket --cpu-variant --sdram-init boot.json --threads 8

To be honest, I am still unsure how to load the boot images into the simulator... I just left them in the same directory...

Also, slightly unrelated, but I realised that the initramfs start address in my DTS file did not match the one I specified in boot.json. I am waiting for the simulator to finish booting at the moment. Still, to answer your question:

The boot arguments within the dts file were the following:

chosen {
  bootargs = "console=liteuart earlycon=liteuart,0x12003000 rootwait root=/dev/ram0";
  linux,initrd-start = <0x81000000>;
  linux,initrd-end   = <0x81800000>;
};

and the boot.json file is the following:

{
    "initramfs.cpio":   "0x82000000",
    "Image":       "0x80200000",
    "fw_jump.bin": "0x80000000"
}

I have now changed initrd-start and initrd-end to 0x82000000 and 0x82800000 respectively. So far, the result is that the console output no longer reports the initrd as disabled.
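
This kind of mismatch between boot.json and the DTS can be caught before a slow simulator run. The following is a rough sketch, assuming boot.json maps file names to hexadecimal load addresses as shown above and that the DTS contains a single linux,initrd-start property; the function name initrd_mismatch is hypothetical.

```python
import json
import re


def initrd_mismatch(boot_json_text: str, dts_text: str, image: str = "initramfs.cpio"):
    """Return (load_addr, dts_start) if the two addresses differ, else None."""
    # Address at which the simulator is told to place the initramfs.
    load_addr = int(json.loads(boot_json_text)[image], 16)
    # Address at which the kernel is told to look for it.
    match = re.search(r"linux,initrd-start\s*=\s*<\s*(0x[0-9a-fA-F]+)\s*>", dts_text)
    if match is None:
        raise ValueError("no linux,initrd-start property found")
    dts_start = int(match.group(1), 16)
    return None if load_addr == dts_start else (load_addr, dts_start)


boot_json = '{"initramfs.cpio": "0x82000000", "Image": "0x80200000", "fw_jump.bin": "0x80000000"}'
old_dts = "linux,initrd-start = <0x81000000>;"
new_dts = "linux,initrd-start = <0x82000000>;"
print(initrd_mismatch(boot_json, old_dts))  # reports the two differing addresses
print(initrd_mismatch(boot_json, new_dts))  # None once both agree
```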

I followed the discussion at https://github.com/litex-hub/linux-on-litex-rocket/issues/29 to build my ramfs.

~~I also realise I don't think I provided my ramfs when building the kernel...~~ (https://stackoverflow.com/questions/65868294/if-i-build-linux-kernel-from-source-does-it-contain-initramfs-inside-by-default) I checked and, in hindsight, I believe I did: I put the ramfs into the Linux kernel directory, but I am unsure whether it has to go in a specific directory.

roryt12 commented 1 year ago

If you loaded initramfs at 0x82000000 but you passed to the kernel that it is located at 0x81000000...........

OrkunAliOzkan commented 1 year ago

> If you loaded initramfs at 0x82000000 but you passed to the kernel that it is located at 0x81000000...........

Yeah, it was a silly mistake xD I'll give an update later, sim isn't too quick

OrkunAliOzkan commented 1 year ago

The boot has not failed yet, however I did receive a worrying message: Initramfs unpacking failed: invalid magic at start of compressed archive
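
One quick sanity check for this message is whether the archive on disk starts with a magic the kernel would recognise. Below is a minimal sketch that only distinguishes a newc-format cpio archive from a gzip stream; the kernel supports other compressors as well, and the function name initramfs_kind is an assumption made for illustration.

```python
def initramfs_kind(data: bytes) -> str:
    """Classify an initramfs image by its leading bytes."""
    # "070701" is the cpio newc magic, "070702" the newc-with-checksum variant.
    if data[:6] in (b"070701", b"070702"):
        return "cpio-newc"
    # 0x1f 0x8b is the gzip magic number.
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    return "unknown"


print(initramfs_kind(b"070701" + b"0" * 104))
print(initramfs_kind(b"\x1f\x8b\x08\x00"))
print(initramfs_kind(b"\x00\x01\x02\x03\x04\x05"))
```

To apply it to a real file, pass in the first few bytes of initramfs.cpio read in binary mode. If the file itself checks out, the message may instead come from whatever bytes follow the archive inside the declared initrd range.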

OrkunAliOzkan commented 1 year ago

> @OrkunAliOzkan error -2 means that the kernel can't find a init file to run in the root filesystem. And this is what it says in the Kernel panic, "No working init found. Try passing init= option to kernel". But from the information provided it is not clear how you load opensbi,kernel and initrd (ie the parameters for litex_term and "boot.json" ) , what are the boot parameters for the kernel in the DTS file, and what your initrd contains, so......

Hi again @roryt12, sadly I have yet to resolve this issue.

To start, how exactly do I load all the dependencies into the simulator? This has been a bit of a mystery to me, though not for lack of trying. I read https://github.com/litex-hub/linux-on-litex-vexriscv/blob/master/sim.py for inspiration and, as silly as it may sound, all I have done is place each dependency in the same directory and invoke the simulator. I am unsure whether one boots through simulated hardware such as SDRAM on the sim, or transfers the images over a simulated TFTP connection. If you can, please help clear up any misconceptions I have here. I do see the sim boot into OpenSBI, though, so I take it something is working ¯\_(ツ)_/¯

As well, for good measure, please find my DTS below, in case anything drastic stands out:

/dts-v1/;

/ {
        #address-cells = <1>;
        #size-cells    = <1>;

        chosen {
            bootargs = "console=liteuart earlycon=liteuart,0x12003000 rootwait root=/dev/ram0";
            linux,initrd-start = <0x82000000>;
            linux,initrd-end   = <0x82800000>;
        };

        cpus {
            #address-cells = <1>;
            #size-cells    = <0>;
            timebase-frequency = <1000000>;

            CPU0: cpu@0 {
                device_type = "cpu";
                compatible = "riscv";
                riscv,isa = "rv64i2p0_mafdc";
                mmu-type = "riscv,sv39";
                reg = <0>;
                clock-frequency = <1000000>;
                status = "okay";

                d-cache-size = <4096>;
                d-cache-sets = <2>;
                d-cache-block-size = <64>;

                i-cache-size = <4096>;
                i-cache-sets = <2>;
                i-cache-block-size = <64>;

                d-tlb-size = <4>;
                d-tlb-sets = <1>;

                i-tlb-size = <4>;
                i-tlb-sets = <1>;

                L0: interrupt-controller {
                    #interrupt-cells = <0x00000001>;
                    interrupt-controller;
                    compatible = "riscv,cpu-intc";
                };
            };

        };

        memory@80000000 {
            device_type = "memory";
            reg = <0x80000000 0x4000000>;
        };

        clocks {
            sys_clk: litex_sys_clk {
                #clock-cells = <0>;
                compatible = "fixed-clock";
                clock-frequency = <1000000>;
            };
        };

        soc {
            #address-cells = <1>;
            #size-cells    = <1>;
            compatible = "simple-bus";
            interrupt-parent = <&intc0>;
            ranges;

            soc_ctrl0: soc_controller@12000000 {
                compatible = "litex,soc-controller";
                reg = <0x12000000 0xc>;
                status = "okay";
            };

            lintc0: clint@2000000 {
                compatible = "riscv,clint0";
                interrupts-extended = <&L0 3 &L0 7>;
                reg = <0x2000000 0x10000>;
                reg-names = "control";
            };

            intc0: interrupt-controller@c000000 {
                compatible = "sifive,fu540-c000-plic", "sifive,plic-1.0.0";
                reg = <0xc000000 0x400000>;
                #address-cells = <0>;
                #interrupt-cells = <1>;
                interrupt-controller;
                interrupts-extended = <
                    &L0 11 &L0 9>;
                riscv,ndev = <32>;
            };

            liteuart0: serial@12003000 {
                compatible = "litex,liteuart";
                reg = <0x12003000 0x100>;
                interrupts = <0>;
                status = "okay";
            };

            mac0: mac@12000800 {
                compatible = "litex,liteeth";
                reg = <0x12000800 0x7c>,
                      <0x12001000 0x0a>,
                      <0x30000000 0x2000>;
                reg-names = "mac", "mdio", "buffer";
                litex,rx-slots = <2>;
                litex,tx-slots = <2>;
                litex,slot-size = <2048>;
                interrupts = <2>;
                status = "okay";
            };

        };

        aliases {

                serial0 = &liteuart0;
        };
};

As for creating my initramfs, I create a bare-bones root filesystem. Please find below the script I invoke to create it:

#!/bin/bash
rm -rf initramfs
rm -rf initramfs.cpio
mkdir initramfs
pushd initramfs
mkdir -p bin sbin lib etc dev home proc sys tmp mnt nfs root \
          usr/bin usr/sbin usr/lib
sudo mknod -m 622 dev/console c 5 1
sudo mknod -m 622 dev/tty0 c 4 0
cp ../busybox_git/busybox bin/
ln -s bin/busybox ./init
cat > etc/inittab <<- "EOT"
::sysinit:/bin/busybox mount -t proc proc /proc
::sysinit:/bin/busybox mount -t devtmpfs devtmpfs /dev
::sysinit:/bin/busybox mount -t tmpfs tmpfs /tmp
::sysinit:/bin/busybox mount -t sysfs sysfs /sys
::sysinit:/bin/busybox --install -s
/dev/console::sysinit:-/bin/ash
EOT
fakeroot <<- "EOT"
find . | cpio -H newc -o > ../initramfs.cpio
EOT
popd

and I generate my busybox executable by cross-compiling with the riscv64-linux-gnu-gcc toolchain, riscv64-unknown-elf.gcc-12.1.0 (https://github.com/stnolting/riscv-gcc-prebuilt/releases). From my understanding, the initramfs does get embedded into my kernel: I set the kernel config option CONFIG_INITRAMFS_SOURCE to my initramfs.cpio file.

roryt12 commented 1 year ago

@OrkunAliOzkan I'm just a simple enthusiastic user like you. If I had to guess, I would have focused on these lines from your original boot log:

[ 0.000000] INITRD: 0x81000000+0x00800000 overlaps in-use memory region
[ 0.000000] - disabling initrd
[ 0.000000] OF: reserved mem: OVERLAP DETECTED!

so I guess you have a memory overlap issue that cancels your initramfs? In that case, maybe you need a reserved-memory block in your DTS file, e.g.:

reserved-memory {
   #address-cells = <1>;
   #size-cells    = <1>;
   ranges;
   opensbi@80000000 {
      reg = <0x80000000 0x200000>;
   };
};

But mind that I'm just guessing how it has to be done. I prefer to use the litex_json2dts_linux script to create the initial DTS (and edit it afterwards). As for the simulator parameters, sorry, but I have never used it; I have tried LiteX only on FPGAs. Have a look at my steps, it may give you some hints.
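
To reason about which regions collide, the address ranges can be compared directly. This is only a sketch: the OpenSBI and initrd ranges are taken from the thread, while the 16 MiB kernel footprint is a hypothetical figure chosen to show how an overlap with 0x81000000 could arise; the real size would have to be read from the boot log or the Image file.

```python
def overlaps(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """Return True if the half-open ranges [start, start + size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


opensbi = (0x80000000, 0x00200000)  # reserved-memory range suggested above
kernel = (0x80200000, 0x01000000)   # load address from boot.json, size is hypothetical
initrd = (0x81000000, 0x00800000)   # range reported in the INITRD log line

print(overlaps(*opensbi, *kernel))  # adjacent ranges, so no overlap
print(overlaps(*initrd, *kernel))   # a 16 MiB kernel would reach past 0x81000000
```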

OrkunAliOzkan commented 1 year ago

Hello roryt12,

Yesterday, I generated the initial DTS and then modified it to remove the reserved memory on purpose, because on prior runs it caused memory overlap issues with OpenSBI and I worried that this might be causing problems. Please find attached the console log from before removal of the reserved-memory node (Console log A) and the console log from after (Console log B). A notable change in Console log B was the following output:

[    2.989615] Unpacking initramfs...
[    4.782309] Initramfs unpacking failed: invalid magic at start of compressed archive

While writing this message I read through https://github.com/litex-hub/linux-on-litex-vexriscv/issues/248 and I believe this new output appears because I am not setting initrd-end to the actual end of the initramfs, which wouldn't be an issue if I provided OpenSBI as a single binary with all dependencies inside. For anyone reading in the future who isn't acquainted with OpenSBI, please read https://github.com/riscv-software-src/opensbi/blob/master/docs/firmware/fw.md. I am going to try setting the initrd-end parameter to match the size of the initramfs and reattempt fw_jump, and then also attempt fw_payload. I'll update on how it goes.
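
The initrd-end value is simply the load address plus the size of the archive in bytes. Here is a small sketch that formats both properties for the chosen node; the function name initrd_props is made up, and the 0x20320E size is only an example value.

```python
def initrd_props(start: int, size: int) -> str:
    """Format the chosen-node initrd properties for an archive of `size` bytes."""
    end = start + size  # first address past the last byte of the archive
    return (
        f"linux,initrd-start = <0x{start:08x}>;\n"
        f"linux,initrd-end   = <0x{end:08x}>;"
    )


print(initrd_props(0x82000000, 0x20320E))
```

In practice the size would come from something like os.path.getsize("initramfs.cpio"), and the properties need regenerating whenever the archive is rebuilt.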

PS: I had a read of your Debian on LiteX docs a little while back. Eventually I do want to try booting Debian, and your documentation will be quite useful for that B^) thanks

OrkunAliOzkan commented 1 year ago

Here to give the update... both failed. Even when I embedded everything into one big payload and executed it, both 'fw_jump' and 'fw_payload' gave error -2 (No working init found) again. Please find attached my two console logs, Console Log C (fw_payload) and Console Log D (fw_jump).

Unlike before, I never got any issues unpacking the "initrd" memory, and after some reading it seems that reserved memory would not directly fix this problem, just prevent any pesky overwrites. For now I will just set the kernel boot parameter init to /init and enter debug mode in the hope of more relevant clues, and I'll make a rootfs with Buildroot... I'll update on how it goes.

Next week I will try to get my hands on a digilent_nexys_video fpga.

OrkunAliOzkan commented 1 year ago

Yeah, busybox wasn't statically linked...duh! Console Log E
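
A dynamically linked binary asks for a loader through a PT_INTERP program header, and if that loader is not present in the initramfs the kernel fails to execute the binary with error -2. The following rough sketch checks a little-endian ELF64 file for that header; it uses only field offsets from the ELF64 layout, and the function name elf64_interpreter is hypothetical.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf64_interpreter(data: bytes):
    """Return the requested loader path, or None if the ELF64 binary has none."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # e_phoff lives at offset 0x20; e_phentsize and e_phnum at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for index in range(e_phnum):
        entry = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, entry)
        if p_type == PT_INTERP:
            # p_offset is at +8 and p_filesz at +32 within a 64-bit program header.
            (p_offset,) = struct.unpack_from("<Q", data, entry + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, entry + 32)
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

Calling it on the bytes of bin/busybox from the initramfs should return None for a static build and a loader path otherwise. BusyBox has a CONFIG_STATIC build option for producing a static binary.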

Now my issue is quite similar to https://github.com/litex-hub/linux-on-litex-rocket/issues/29, except that that user was using bbl rather than OpenSBI and resolved the issue by switching to OpenSBI.

[    6.463240] Run /init as init process
[    6.463707]   with arguments:
[    6.463986]     /init
[    6.464259]   with environment:
[    6.464525]     HOME=/
[    6.464768]     TERM=linux

EDIT: I have been able to get the same ramfs to boot on Spike. The main difference is that on Spike I was running ISA imafdc, whereas on LiteX I am running imac, even though I specify ISA imafdc in the DTS/DTB and am using the full CPU variant.

OrkunAliOzkan commented 1 year ago

DTS generation for Rocket SoCs is incorrect; the interrupt label on the UART node was wrong.

These are the bugs one will face when trying to host busybox with an SoC configured the same as mine, in the form [node] -> [attribute] -> [old value] -> [new value]:

chosen -> initrd_start -> 0x81000000 -> 0x82000000
chosen -> initrd_end -> 0x81800000 -> 0x8220320E (this is relative, check size of your initrd!)
reserved-mem -> THERE -> NOT
sys_clk -> THERE -> NOT
lintc0 -> interrupts-extended -> L4 -> L0
liteuart0 -> interrupts -> 0 -> 1
serial address 12004800 12009000 (or 12003000 for soc)