openhwgroup / cva6-sdk

CVA6 SDK containing RISC-V tools and Buildroot

Linux Kernel Cannot Print where Bootrom and BBL can #39

Open Kendidi opened 3 years ago

Kendidi commented 3 years ago

Hi,

I tried to load and run Bootrom, BBL and Linux Kernel on FPGA.

Bootrom and BBL can print on the UART console without any issue, but the Linux kernel cannot, even though it passed through several places in setup.c and init.c where it should have printed something on the console. I think the ariane.dts in bootrom should be OK, otherwise BBL would probably not be able to print either.

I wonder: 1) What might be the issue, and where should I look? 2) How does the UART/console normally get initialized so that functions like pr_info() and printk() can print?

Thanks a lot in advance!

jctullos commented 3 years ago

@Kendidi

Did you ever find a fix? I'm hitting the same issue right now. BBL and the bootloader print just fine, but once control is handed over to Linux, nothing gets printed. I figured it has to do with the UART, but I'm still trying to work out how to fix it.

Kendidi commented 3 years ago

@jctullos

Not yet. I am still trying to figure it out. I initially thought early_init_dt_scan() takes care of UART initialization, but in a QEMU environment the Linux kernel can already print to the console before early_init_dt_scan() is called. So I think the UART is configured either before the MMU is enabled or before the Linux kernel is launched.

jctullos commented 3 years ago

If you pass earlycon through bootargs in the defconfig, it will set up an early console with the UART. Are you running the OpenPiton build or just regular CVA6/Ariane? What FPGA?

I'm running the Open Piton build, that uses ns16550, on a VCU118. I think the uart device isn't getting picked up correctly in the device tree. Right now I'm rebuilding the FPGA bitstream. My initial hunch is that the uart in the device tree passes "ns16550" as the compatible name, and maybe "ns16550a" is what the Linux driver is looking for, even though it should work fine with "ns16550". I'll let you know how it goes.

@zarubaf or @jbalkind Would either of you have seen this issue before?

Jbalkind commented 3 years ago

@jctullos please be sure you're building from the openpiton branch. Also, have you set earlycon=sbi?

jctullos commented 3 years ago

I didn't on this build, but I'll add it. I didn't see it in the linux defconfig on the openpiton branch, so it might need to be added there if it's required. I'll try it right now.

Jbalkind commented 3 years ago

At what point in the boot process are you not seeing output? With the existing defconfig built from the openpiton branch it should just print. Does the prebuilt bbl.bin work? https://www.princeton.edu/~cloud/openpiton/os_images/openpiton_ariane_linux_r12.tar.gz

jctullos commented 3 years ago

The prebuilt doesn't work either, I just tried again with the link you posted. Here's the output:

image

Jbalkind commented 3 years ago

Could you share the start of the output too? How did you prepare the SD?

jctullos commented 3 years ago

SD was prepared the way it shows on the openpiton repo. Here's the start:

image

jctullos commented 3 years ago

I am also working on another bbl build. It prints the device tree just fine but hangs at the point where Linux normally takes over. This happens in minit.c, after which nothing is output.

image

The hang is at the jump to the payload. I previously had a debug statement there, and it was the last thing printed before the jump.

Jbalkind commented 3 years ago

The output looks reasonable. Partition table is correct, bbl is clearly located in the right place, and system frequency is reported correctly which means the ariane.dts should also be correct. Since the partition table is correct, the SD reading clearly isn't totally broken... Timing was probably met too I'm going to guess?

Looks like you're building from openpiton-dev... Have you done a git submodule update --init --recursive piton/design/chip/tile/ariane by any chance?

I don't really trust bbl to be honest. It has a kernel memory corruption bug which has caused us some strife. You could try opensbi instead? I shared some instructions here previously: https://groups.google.com/g/openpiton/c/8J9IX1VLRHA/m/poid3PVbCgAJ

jctullos commented 3 years ago

I haven't done an update in a few days, I'll check it out.

Also, earlycon=sbi didn't change anything. And I'd like to move to opensbi, but my research is using Keystone and they still use bbl unfortunately.

I've seen this issue in the past, I just can't remember how I fixed it. I had issues with 64-bit memory addresses in the dts file not being parsed correctly. My next attempt would have been to change the address and size cells to 1 instead of 2, but the openpiton build uses high addresses, so I'm assuming that's out unless it can be easily changed.

Jbalkind commented 3 years ago

There shouldn't be any new ariane updates in terms of our submodule pointer, it was more in case you were going from the openpiton branch to openpiton-dev and hadn't got the accompanying changes in doing so.

That's a shame. I'll warn you that linux boot can be flaky with bbl. If you see random stalls and stack traces, it's likely to be bbl's fault. When we switched to opensbi, all of that went away. This was a problem both for standalone ariane and op+ariane. Others had reported the same for rocket I believe, too.

You can move the ariane peripherals to a different location by changing the base addresses in piton/design/xilinx/vcu118/devices_ariane.xml and see if that helps? But we were using bbl for quite a while and I don't remember seeing this issue. That does make me suspect that perhaps not all of the SD woes have passed. May be worth trying our sdctrl_test design on vcu118 and seeing if it passes?

Kendidi commented 3 years ago

@jctullos

I use a Xilinx FPGA, with bootrom.elf built from the CVA6 fpga tree and bbl built from ariane-sdk.

Kendidi commented 3 years ago

@Jbalkind and @jctullos

I initially thought bootrom.elf (from cva6/fpga/) + bbl (from ariane-sdk) should be enough for booting Linux on the Xilinx FPGA we built. Is bbl not good enough? What approach would you recommend? Thanks.

Kendidi commented 3 years ago

@jctullos

Please let me know if you have any success or have found anything with your bbl build. Thanks.

Kendidi commented 3 years ago

@jctullos

> So if you pass earlycon through bootargs in defconfig, it will set up an early console with the uart.

Can you please elaborate?

What I have in "\ariane-sdk\buildroot\output\build\linux-ariane-v0.7\arch\riscv\configs\defconfig":

So which routine is supposed to set up the console?

Moschn commented 3 years ago

> Also, earlycon=sbi didn't change anything. And I'd like to move to opensbi, but my research is using Keystone and they still use bbl unfortunately.

@jctullos I have also seen no UART output with Keystone on stock Ariane (without OpenPiton). If I remember correctly, I solved this by using a newer Linux kernel version, though I do not know what causes the problem. I would advise you to try a few different kernel versions (I believe I used 5.3).

Kendidi commented 3 years ago

@Moschn

Where in the Ariane SDK can we select which Linux kernel version to use?

Kendidi commented 3 years ago

According to https://github.com/pulp-platform/linux/, Linux 5.1-rc7 appears to be the latest.

Moschn commented 3 years ago

@Kendidi It is not supported to select this. You need to build it manually. I guess you could take a look at our build process and try to adapt it to an upstream kernel.

jctullos commented 3 years ago

Thanks @Moschn, although I'm already using 5.3 for Keystone, so there must be something more.

jctullos commented 3 years ago

@Jbalkind

I think you might be right regarding the SD issues. After some random resets, Linux will boot. It's not consistent. So I assume data isn't getting copied over all the time. It hasn't completed a boot to the login prompt yet, right now it's failing on the RPC module, but at least there's output.

Kendidi commented 3 years ago

@jctullos

So what changes did you make so it can print now?

Kendidi commented 3 years ago

@Moschn

Ok. Thanks. After downloading kernel 5.3, what should I modify before compiling it?

jctullos commented 3 years ago

@Kendidi I thought I replied yesterday, sorry!

I'm using a Linux 5.3 build and passing console=hvc and earlycon=sbi in bootargs. FYI, Linux boots maybe a third of the time, and it always lags by a few seconds before it starts printing.

Kendidi commented 3 years ago

Thank you @jctullos!! I will try to find a suitable copy of Linux 5.3 and try it.

Kendidi commented 3 years ago

Besides "console=hvc" and "earlycon=sbi", what else do we need to put into bootargs?

jctullos commented 3 years ago

@Jbalkind

I believe I've fixed the issue with Linux not showing the console. I remembered that I had also had issues with the memory still holding stale data that wasn't getting cleared. I wrote a small while loop in the bootloader that clears the memory where BBL is going, and only then does the sd_copy start. So far, this has fixed it. There are no more issues with the handover between BBL and Linux.
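The clear-then-copy idea can be modelled in a few lines. This is a Python model, not bootloader code: the actual loop was written in the bootloader and is not shown in this thread, and the names `clear_then_copy`, `dest_base` and `region_len` are hypothetical. The sketch only illustrates why zeroing can matter: bytes left over from a previous run survive wherever the copy does not overwrite them.

```python
def clear_then_copy(memory: bytearray, dest_base: int, region_len: int, image: bytes) -> None:
    """Zero the destination region, then copy the image into it.

    Models the workaround described above. All names are illustrative.
    """
    if len(image) > region_len:
        raise ValueError("image does not fit in the destination region")
    if dest_base + region_len > len(memory):
        raise ValueError("destination region is outside memory")

    # Step 1: clear the whole destination region byte by byte,
    # mirroring a simple while loop in a bootloader.
    offset = 0
    while offset < region_len:
        memory[dest_base + offset] = 0
        offset += 1

    # Step 2: copy the image (stands in for the SD copy).
    memory[dest_base:dest_base + len(image)] = image


# Memory that still holds stale data (0xAA) from a previous run.
ram = bytearray(b"\xAA" * 32)
clear_then_copy(ram, dest_base=8, region_len=16, image=b"BBL!")
```

After the call, the four image bytes are in place, the rest of the 16-byte region is zero, and memory outside the region is untouched.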

One question though: I still have issues with the Linux boot stopping halfway. Looking at the vmlinux system map, it seems to happen when the clock source switches over. Earlier, I had changed the timebase-frequency value in the DTS to half of the clock frequency, but only because the Ariane/CVA6 repo had a commit that fixed their DTS to half the system clock.

Does openpiton use a different RTC for the timebase-frequency? I know that in the DTS generation script it's Sysclock/128. I'm changing it back to the default openpiton setting to see if this fixes Linux booting, but I wanted to verify the frequency.
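For comparison, the two dividers mentioned here give very different timebase-frequency values. A small sketch, assuming a hypothetical 60 MHz system clock (the thread does not state the real frequency) and a hypothetical helper name `timebase_frequency`; it is not the project's DTS generation script.

```python
def timebase_frequency(sysclk_hz: int, divider: int) -> int:
    """Return the timer tick rate for a system clock divided by `divider`."""
    if divider <= 0:
        raise ValueError("divider must be positive")
    if sysclk_hz % divider:
        raise ValueError("system clock is not an integer multiple of the divider")
    return sysclk_hz // divider


SYSCLK_HZ = 60_000_000  # hypothetical value, for illustration only

half = timebase_frequency(SYSCLK_HZ, 2)      # divide-by-two RTC
div128 = timebase_frequency(SYSCLK_HZ, 128)  # Sysclock/128

print(half, div128, half // div128)
```

With these numbers the two values differ by a factor of 64. If the DTS advertised the sysclk/2 value while the hardware counter actually ran at sysclk/128, every timed wait would take 64 times longer than the kernel intended. Whether that is what happens here is not established in the thread.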

Kendidi commented 3 years ago

@jctullos

I am using kernel v5.3. I have added the following to ariane.dts, but it's still not printing once the kernel has taken over:

bootargs = "console=hvc earlycon=sbi";

Is there anything else I have missed?

Jbalkind commented 3 years ago

That's strange. We have a hardware memory zeroer that runs before reset is released for the chip and rest of chipset. Is that not being included for vcu118 maybe?

You should be able to leave the timebase to be generated as it is in our piton/tools/bin/riscvlib.py. Provided you're running on something newer than, I think, release 13 (preferably openpiton-dev), you should have the fix for the earlier timebase trouble.

jctullos commented 3 years ago

@Kendidi

I added those bootargs to the linux defconfig, not the dts.

Beyond that, I'm not sure what's different on your end. You could always try commenting out options in the Linux defconfig to get down to a bare system and see whether something is preventing it.

Kendidi commented 3 years ago

@jctullos

Oh, I see. Do you recall which parameter to set in linux defconfig? Thanks.

I am kind of expecting the function param_setup_earlycon() to be called somehow, but it currently is not.

jctullos commented 3 years ago

CONFIG_CMDLINE=

There's one in the defconfig for this repo I believe that you can add those args to.

Kendidi commented 3 years ago

Thanks @jctullos !

"earlyprintk" is already there.

jctullos commented 3 years ago

@Jbalkind Does openpiton use a different RTC than Ariane?

In ariane_xilinx.sv:

    // ---------------
    // CLINT
    // ---------------
    // divide clock by two
    always_ff @(posedge clk or negedge ndmreset_n) begin
      if (~ndmreset_n) begin
        rtc <= 0;
      end else begin
        rtc <= rtc ^ 1'b1;
      end
    end

So it divides the system clock by 2 for the RTC/timebase frequency.

Does openpiton overwrite that RTC with a new one? And where would it be located?

Thank you!

Jbalkind commented 3 years ago

I believe the one used in op+ariane is in system.v. The division is something like 2^7. The correct timebase value should be being used in the device tree already because I think we autogenerate from the frequency used in the block.list file. The file that generates the $ARIANE_ROOT/openpiton/bootrom/linux/ariane.dts is piton/tools/bin/riscvlib.py and it's called by code in the top of piton/design/chip/tile/rtl/tile.v.pyv

Does the invalid access LED turn on when the execution freezes by any chance?

jctullos commented 3 years ago

Ahh, yeah I see the RTC now in system.v. Thank you!

The invalid access LED doesn't turn on. It's extremely weird, but each time it stops around the clocksource switch. Maybe the uart frequency is somehow getting changed when it switches? Right now scratching my head on this one.

jctullos commented 3 years ago

@Kendidi Did you ever get yours working? What do you have so far?

Kendidi commented 3 years ago

@jctullos

Thank you for asking! Appreciate it.

Nope. Not much progress here. I am already using kernel 5.3. I have modified the makefile so that it downloads and uses kernel 5.3 from the public repository instead. I have tried setting console to ttyS0, ttyS1, hvc, hvc0, etc., but still no luck. By the way, our 8GB DDR starts at 0x200000000. I have updated the files I know of in Bootrom (including ariane.dts) and BBL to use 0x200000000 instead of the default address, 0x80000000.
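As a cross-check for the DTS change, here is a sketch of how a base of 0x200000000 and a size of 8GB would be split into 32-bit cells for a device tree reg property. The helper names `to_cells` and `reg_property` are hypothetical, and the sketch assumes #address-cells and #size-cells are both 2, which the thread does not confirm for this particular ariane.dts.

```python
def to_cells(value: int, n_cells: int) -> list:
    """Split an integer into n_cells 32-bit cells, most significant first."""
    if value < 0 or value >> (32 * n_cells):
        raise ValueError(f"{value:#x} does not fit in {n_cells} cell(s)")
    return [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(n_cells))]


def reg_property(base: int, size: int, address_cells: int = 2, size_cells: int = 2) -> str:
    """Format a reg property from a base address and a size."""
    cells = to_cells(base, address_cells) + to_cells(size, size_cells)
    return "reg = <" + " ".join(f"{c:#x}" for c in cells) + ">;"


DDR_BASE = 0x2_0000_0000   # from the comment above
DDR_SIZE = 8 * 1024**3     # 8GB, which is also 0x2_0000_0000

print(reg_property(DDR_BASE, DDR_SIZE))  # reg = <0x2 0x0 0x2 0x0>;
```

Calling `to_cells(DDR_BASE, 1)` raises a ValueError, because an address above 4GB cannot be expressed in a single 32-bit cell. That matches the earlier remark that reducing the address and size cells to 1 is not an option with high addressing.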

And I am using ttyUSB1 on Linux host to read message sent from Bootrom and BBL from FPGA.

jctullos commented 3 years ago

@Kendidi

Just touching base: I tried a different GitHub project, ESP, which uses Ariane, and I was just now able to boot into Linux.

Here's their site: https://www.esp.cs.columbia.edu/docs/ that shows all the documentation and tutorials for getting it up and running.

Kendidi commented 3 years ago

@jctullos

Wow... Very cool! I will check it out. Thanks!

Kendidi commented 3 years ago

@jctullos

I loaded the bbl built from ESP and got a failure similar to the one I have been getting:

  1. No print out from Kernel after "bbl loader".
  2. Kernel stuck at "wfi" instruction in .

Hmm.. I think there may be a problem with the bootrom code we built.

jctullos commented 3 years ago

@Kendidi

Yeah, I'm trying to figure it out too. It was working fine without PMP, but when I enable PMP in the build and try to run, it stalls.

jctullos commented 3 years ago

@Kendidi

Alright, I have output from Linux now. For the ESP build, add these config options to the ariane defconfig and let me know if it works for you:

CONFIG_HVC_DRIVER=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_CMDLINE="earlyprintk console=hvc earlycon=sbi"

There might have been one other, but I can't remember offhand. But now I have output from Linux again.
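A quick way to confirm that a defconfig actually contains these three settings is a small text check. This is a sketch only: the function names are hypothetical, it inspects the defconfig text rather than the final .config that Kconfig produces, and it assumes one option per line.

```python
REQUIRED_OPTIONS = ("CONFIG_HVC_DRIVER", "CONFIG_SERIAL_EARLYCON_RISCV_SBI")
REQUIRED_CMDLINE_ARGS = ("earlyprintk", "console=hvc", "earlycon=sbi")


def parse_defconfig(text: str) -> dict:
    """Return {option: value} for each NAME=value line, skipping comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value.strip().strip('"')
    return options


def missing_console_settings(text: str) -> list:
    """List the settings from the comment above that the defconfig lacks."""
    options = parse_defconfig(text)
    missing = [opt for opt in REQUIRED_OPTIONS if options.get(opt) != "y"]
    cmdline_args = options.get("CONFIG_CMDLINE", "").split()
    missing += [f"CONFIG_CMDLINE: {arg}"
                for arg in REQUIRED_CMDLINE_ARGS if arg not in cmdline_args]
    return missing


EXAMPLE = '''\
CONFIG_HVC_DRIVER=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_CMDLINE="earlyprintk console=hvc earlycon=sbi"
'''

print(missing_console_settings(EXAMPLE))  # []
```

An empty list means all three settings are present in the text. It does not prove they survive into the built kernel, since Kconfig dependencies can still drop an option.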

Kendidi commented 3 years ago

> Alright, I have output from Linux now. For the ESP build, add these config options to the ariane defconfig and let me know if it works for you:
>
> CONFIG_HVC_DRIVER=y
> CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
> CONFIG_CMDLINE="earlyprintk console=hvc earlycon=sbi"
>
> There might have been one other, but I can't remember offhand. But now I have output from Linux again.

Thank you @jctullos !! I will try it out.

Kendidi commented 3 years ago

@jctullos

I've added those 3 config options to "../esp/soft/ariane/linux/arch/riscv/configs/ariane_defconfig", re-run "make linux", and used the resulting bbl. But I got the same result: no printout in Linux land, and stuck at "wfi" in .

Did I miss anything?

Jbalkind commented 3 years ago

Being stuck at wfi to me suggests that your CLINT/PLIC may not be working. Usually the core will sit there waiting to be woken up by some interrupt like a timer or IPI from the CLINT.

Are you using ariane-fpga? Is it up to date? I don't necessarily have much assistance to give in debugging it since I don't use Ariane's FPGA environment, but if you're definitely stuck at wfi I'd probably be looking into interactions with the CLINT.
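To make the wfi point concrete, here is a toy model of the timer side of a CLINT. It relies on the rule from the RISC-V privileged specification that a machine timer interrupt is pending while mtime >= mtimecmp. The class and function names are hypothetical, one counter tick per loop iteration is a simplification, and none of this is the Ariane RTL.

```python
class ToyClint:
    """Minimal model of the CLINT timer registers (illustrative only)."""

    def __init__(self, rtc_running: bool = True):
        self.mtime = 0
        self.mtimecmp = 0xFFFF_FFFF_FFFF_FFFF  # assumed initial value: far in the future
        self.rtc_running = rtc_running

    def tick(self) -> None:
        # mtime only advances if the real-time clock input is toggling.
        if self.rtc_running:
            self.mtime += 1

    def timer_pending(self) -> bool:
        return self.mtime >= self.mtimecmp


def wait_for_timer(clint: ToyClint, delay: int, max_cycles: int):
    """Arm a timer `delay` ticks ahead, then idle (as in wfi) until it fires.

    Returns the number of cycles waited, or None if the timer never fired
    within max_cycles.
    """
    if delay < 1:
        raise ValueError("delay must be at least 1 tick")
    clint.mtimecmp = clint.mtime + delay
    for cycle in range(1, max_cycles + 1):
        clint.tick()
        if clint.timer_pending():
            return cycle
    return None


print(wait_for_timer(ToyClint(rtc_running=True), delay=10, max_cycles=1000))
print(wait_for_timer(ToyClint(rtc_running=False), delay=10, max_cycles=1000))
```

In the model, a running counter wakes the waiting core after 10 ticks, while a stopped counter leaves it waiting indefinitely. That is the kind of symptom a broken RTC or CLINT connection would produce.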

Kendidi commented 3 years ago

@Jbalkind

Thank you for your advice! Yes, I am using an FPGA, a Xilinx VCU128. It's pretty up to date, I think. Which version would you recommend? Should we use the latest in the repository ("master") or the last release, Ariane 4.2?

Kendidi commented 3 years ago

By the way, since it's a single-core setup, who is supposed to wake up Ariane?