zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Zephyr on Litex/Vexriscv not booting #44478

Closed lachlansmith closed 2 years ago

lachlansmith commented 2 years ago

Describe the bug

I'm looking for help with Litex Vexriscv on an Arty A7 35T. I'm unable to boot zephyr no matter what I try and I'm wondering if I'm missing something fundamental. I submitted this as a bug because I can't see what I've missed.

After loading the BIOS onto the board, any zephyr.bin built for litex_vexriscv leaves the terminal blank after liftoff. However, I am able to load the bare-metal demo.bin successfully, so I'm stumped as to what I'm doing wrong.

My only idea is that the litex_vexriscv DTS does not match the SoC created by ./digilent_arty.py. So I attempted to use https://github.com/litex-hub/zephyr-on-litex-vexriscv, but it is 18 months old, and I had to delete and change so much to get it to build and load that I'm unsure the SoC would even match the litex_vexriscv DTS anymore.

To Reproduce

Here's everything I did from the beginning.

mkdir litex
cd litex
wget https://raw.githubusercontent.com/enjoy-digital/litex/master/litex_setup.py
chmod +x litex_setup.py
./litex_setup.py --init --install --user
pip3 install meson ninja
./litex_setup.py --gcc=riscv
mv riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14 riscv64-unknown-elf
export PATH="/home/lachy/litex/riscv64-unknown-elf/bin:$PATH"
sudo apt install libevent-dev libjson-c-dev verilator
cd litex-boards/litex_boards/targets/
./digilent_arty.py --toolchain vivado --cpu-type vexriscv --sys-clk-freq 80e6 --build
./digilent_arty.py --load
cd ~/zephyr/samples/subsys/shell/shell_module
west build -p auto -b litex_vexriscv .
litex_term /dev/ttyUSB1 --kernel build/zephyr/zephyr.bin --serial-boot

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2022 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Apr  2 2022 11:32:31
 BIOS CRC passed (8bdcbcdb)

 Migen git sha1: ac70301
 LiteX git sha1: c4004791

--=============== SoC ==================--
CPU:            VexRiscv @ 80MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          262144KiB 16-bit @ 640MT/s (CL-7 CWL-5)

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |11110000000000000000000000000000| delays: 02+-02
  m0, b02: |00000011111111111111111100000000| delays: 15+-09
  m0, b03: |00000000000000000000000000011111| delays: 29+-03
  m0, b04: |00000000000000000000000000000000| delays: -
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b02 delays: 15+-09
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |11110000000000000000000000000000| delays: 02+-02
  m1, b02: |00000011111111111111111100000000| delays: 15+-09
  m1, b03: |00000000000000000000000000011111| delays: 29+-02
  m1, b04: |00000000000000000000000000000000| delays: -
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b02 delays: 15+-09
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
  Write speed: 29.2MiB/s
   Read speed: 39.0MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
[LITEX-TERM] Received firmware download request from the device.
[LITEX-TERM] Uploading build/zephyr/zephyr.bin to 0x40000000 (187436 bytes)...
[LITEX-TERM] Upload calibration... (inter-frame: 10.00us, length: 64)
[LITEX-TERM] Upload complete (9.9KB/s).
[LITEX-TERM] Booting the device.
[LITEX-TERM] Done.
Executing booted program at 0x40000000

--============= Liftoff! ===============--

Expected behavior

--============= Liftoff! ===============--

uart:~$

lachlansmith commented 2 years ago

Having come here for help, I've since figured out the above from other issues; thanks to https://github.com/zephyrproject-rtos/zephyr/issues/42685. I'm not sure where this is documented, but perhaps https://docs.zephyrproject.org/2.6.0/boards/riscv/litex_vexriscv/doc/index.html could be updated to reflect it.

mkdir ~/litex
cd ~/litex
wget https://raw.githubusercontent.com/enjoy-digital/litex/master/litex_setup.py
chmod +x litex_setup.py
./litex_setup.py --init --install --user
pip3 install meson ninja
./litex_setup.py --gcc=riscv
mv riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14 riscv64-unknown-elf
export PATH="/home/lachy/litex/riscv64-unknown-elf/bin:$PATH"
sudo apt install libevent-dev libjson-c-dev verilator
cd litex-boards/litex_boards/targets/
./digilent_arty.py --build --csr-json csr.json
./digilent_arty.py --load
cp csr.json ~/zephyr/samples/subsys/shell/shell_module/
cd ~/zephyr/samples/subsys/shell/shell_module/
~/litex/litex/litex/tools/litex_json2dts_zephyr.py --dts litex_vexriscv.dts --config litex_vexriscv.config csr.json
west build -p auto -b litex_vexriscv . -DDTC_OVERLAY_FILE=litex_vexriscv.dts -DCONFIG_UART_LITEUART=y -DCONFIG_LITEX_TIMER=y -DCONFIG_ETH_LITEETH=n -DCONFIG_SPI_LITESPI=n -DCONFIG_I2C_LITEX=n
litex_term /dev/ttyUSB1 --kernel build/zephyr/zephyr.bin --serial-boot

Click RESET on Arty to program

dayjaby commented 2 years ago

Thanks for the list of instructions! That's very helpful.

lachlansmith commented 2 years ago

Hi @dayjaby, I'm getting Zephyr to boot, but the kernel appears to deadlock: it doesn't respond to key presses, although the LEDs on the board light up, which confirms the board is receiving them. Are the following issues related?

https://github.com/zephyrproject-rtos/zephyr/pull/39304 https://github.com/zephyrproject-rtos/zephyr/issues/39298

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
[LITEX-TERM] Received firmware download request from the device.
[LITEX-TERM] Uploading build/zephyr/zephyr.bin to 0x40000000 (182328 bytes)...
[LITEX-TERM] Upload calibration... (inter-frame: 10.00us, length: 64)
[LITEX-TERM] Upload complete (9.9KB/s).
[LITEX-TERM] Booting the device.
[LITEX-TERM] Done.
Executing booted program at 0x40000000

--============= Liftoff! ===============--

[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT0: set rate: 100000000 HZ
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT1: updated rate: 100000000 to 100000000 HZ
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT0: set duty: 50%
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT0: set phase: 0 deg
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT1: set rate: 100000000 HZ
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT1: set duty: 50%
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: CLKOUT1: set phase: 0 deg
[00:00:00.000,000] <inf> CLK_CTRL_LITEX: LiteX Clock Control driver initialized
uart:~$

dayjaby commented 2 years ago

For me it worked after applying the patch proposed in https://github.com/zephyrproject-rtos/zephyr/pull/39304:

uart:~$ help
Please press the <Tab> button to see all available commands.
You can also use the <Tab> button to prompt or auto-complete all commands or its subcommands.
You can try to call commands with <-h> or <--help> parameter for more information.

Shell supports following meta-keys:
  Ctrl + (a key from: abcdefklnpuw)
  Alt  + (a key from: bf)
Please refer to shell documentation for more details.

Available commands:
  clear               :Clear screen.
  date                :Date commands
  demo                :Demo commands
  device              :Device commands
  dynamic             :Demonstrate dynamic command usage.
  help                :Prints the help message.
  history             :Command history.
  hwinfo              :HWINFO commands
  i2c                 :I2C commands
  kernel              :Kernel commands
  log                 :Commands for controlling logger
  log_test            :Log test
  pwm                 :PWM shell commands
  resize              :Console gets terminal screen size or assumes default in
                       case the readout fails. It must be executed after each
                       terminal width change to ensure correct text display.
  shell               :Useful, not Unix-like shell commands.
  shell_uart_release  :Uninitialize shell instance and release uart, start
                       loopback on uart. Shell instance is renitialized when 'x'
                       is pressed
  version             :Show kernel version

It's very "laggy" though, only reacting several seconds after I type something.

Edit: Trying to run gdb on this to see where this delay comes from:

./digilent_arty.py --toolchain symbiflow --cpu-type vexriscv --cpu-variant imac+debug --with-etherbone --sys-clk-freq 100e6 --csr-csv build/csr.csv --build
./digilent_arty.py --toolchain symbiflow --cpu-type vexriscv --cpu-variant imac+debug --with-etherbone --sys-clk-freq 100e6 --csr-csv build/csr.csv --load
wishbone-tool --ethernet-host 192.168.1.50 --csr-csv build/csr.csv -s gdb

I will let you know when I find something.

lachlansmith commented 2 years ago

Hi @dayjaby, thanks for looking into this. Can you please expand on what you mean by "applying the patch"? Am I manually changing something to effect the changes, or merging something? I'm unsure and have yet to get it working, so some instructions would be great!

From what I can tell I need to change timer definitions in zephyr > drivers > timer > litex_timer.c to:

#define TIMER_RELOAD_ADDR       ((TIMER_BASE) + 0x04)
#define TIMER_EN_ADDR           ((TIMER_BASE) + 0x08)
#define TIMER_EV_PENDING_ADDR   ((TIMER_BASE) + 0x18)
#define TIMER_EV_ENABLE_ADDR    ((TIMER_BASE) + 0x1c)
#define TIMER_TOTAL_UPDATE      ((TIMER_BASE) + 0x0c)
#define TIMER_TOTAL             ((TIMER_BASE) + 0x10)

mckaymatthew commented 2 years ago

Hey, I'm the author of the proposed timer fix. Please note that the patch fixes the timer, but the other LiteX/Zephyr drivers are most likely broken as well. @dayjaby, have you set CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC in your project configuration to your sysclk rate? Depending on your clock rate, the kernel's tick rate could be much too slow and cause a long delay in processing input.
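The arithmetic behind that question can be sketched roughly as follows. This assumes the timer is reloaded with CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC / CONFIG_SYS_CLOCK_TICKS_PER_SEC cycles per tick; the function names are hypothetical and the numbers in the note below are purely illustrative, not taken from anyone's build.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles the timer is programmed to count for one kernel tick.
 * configured_hz stands in for CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC and
 * ticks_per_sec for CONFIG_SYS_CLOCK_TICKS_PER_SEC (assumed mapping). */
static uint32_t cycles_per_tick(uint32_t configured_hz, uint32_t ticks_per_sec)
{
    assert(ticks_per_sec != 0);
    return configured_hz / ticks_per_sec;
}

/* Wall-clock duration of one tick in microseconds when the SoC really
 * runs at actual_hz, which may differ from the configured value. */
static double real_tick_us(uint32_t configured_hz, uint32_t ticks_per_sec,
                           uint32_t actual_hz)
{
    assert(actual_hz != 0);
    return 1e6 * (double)cycles_per_tick(configured_hz, ticks_per_sec)
               / (double)actual_hz;
}
```

With 100 ticks per second, a configured 100 MHz clock on a SoC that really runs at 100 MHz gives a 10 ms tick. The same configuration on an 80 MHz SoC stretches each tick to 12.5 ms, and the gap grows as the two values drift further apart.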

This is tracked in the LiteX project: https://github.com/enjoy-digital/litex/issues/1264 https://github.com/enjoy-digital/litex/issues/1062

mckaymatthew commented 2 years ago

In reply to @lachlansmith's question above about changing the timer definitions in zephyr > drivers > timer > litex_timer.c:

Yes, replacing those lines would allow your kernel to at least cycle. Please be aware that the other hardware drivers are also likely broken.

lachlansmith commented 2 years ago

Hi @mckaymatthew, I've made those changes and am now getting the same behaviour as @dayjaby. Since I'm using the default --sys-clk-freq=100e6, I've set CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC=100000000.

lachlansmith commented 2 years ago

The kernel deadlocking/latency issue above is due to --csr-data-width defaulting to 32, whereas Zephyr and its drivers need it to be 8 (credit @mckaymatthew). However, I did not encounter any asserts preventing the build.

cd ~/litex/litex-boards/litex_boards/targets/
./digilent_arty.py --build --csr-data-width 8 --csr-json csr.json
./digilent_arty.py --load
mv csr.json ~/zephyr/samples/subsys/shell/shell_module/
cd ~/zephyr/samples/subsys/shell/shell_module/
~/litex/litex/litex/tools/litex_json2dts_zephyr.py --dts dts.overlay --config overlay.config csr.json
cat overlay.config | xargs west build -b litex_vexriscv . -DDTC_OVERLAY_FILE=dts.overlay
litex_term /dev/ttyUSB1 --kernel build/zephyr/zephyr.bin --serial-boot
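
To see why the CSR data width matters to the drivers, here is a small sketch of how register offsets shift with it. It assumes the LiteX timer CSRs come in the order load, reload, en, update_value, value, ev_status, ev_pending, ev_enable, and that every subregister occupies a 4-byte-aligned slot. The table and helper names are hypothetical, so check the generated csr.json for the real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One logical CSR and its width in bits. */
struct csr_desc {
    const char *name;
    uint32_t bits;
};

/* Assumed register order for the timer; not read from a real csr.json. */
enum { CSR_LOAD, CSR_RELOAD, CSR_EN, CSR_UPDATE_VALUE,
       CSR_VALUE, CSR_EV_STATUS, CSR_EV_PENDING, CSR_EV_ENABLE, CSR_COUNT };

static const struct csr_desc timer_csrs[CSR_COUNT] = {
    { "load", 32 }, { "reload", 32 }, { "en", 1 }, { "update_value", 1 },
    { "value", 32 }, { "ev_status", 1 }, { "ev_pending", 1 }, { "ev_enable", 1 },
};

/* Number of subregisters needed to hold a CSR of the given width. */
static uint32_t subregs(uint32_t bits, uint32_t csr_data_width)
{
    assert(csr_data_width == 8 || csr_data_width == 32);
    return (bits + csr_data_width - 1) / csr_data_width;
}

/* Byte offset of a CSR from the timer base: every subregister of the
 * CSRs before it takes one 4-byte-aligned slot. */
static uint32_t csr_offset(size_t index, uint32_t csr_data_width)
{
    uint32_t offset = 0;

    assert(index < CSR_COUNT);
    for (size_t i = 0; i < index; i++) {
        offset += 4 * subregs(timer_csrs[i].bits, csr_data_width);
    }
    return offset;
}
```

Under these assumptions, csr_offset(CSR_RELOAD, 32) is 0x04 and csr_offset(CSR_EV_PENDING, 32) is 0x18, which agrees with the patched defines quoted earlier in the thread, while the same calls with a width of 8 give 0x10 and 0x3c. A driver compiled for one layout therefore pokes the wrong addresses on the other.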
michalsieron commented 2 years ago

Changing the register offsets is not enough to make it work. You would also need to update the timer init code:

diff --git a/drivers/timer/litex_timer.c b/drivers/timer/litex_timer.c
--- a/drivers/timer/litex_timer.c
+++ b/drivers/timer/litex_timer.c
@@ -81,12 +81,8 @@ static int sys_clock_driver_init(const struct device *dev)

        sys_write8(TIMER_DISABLE, TIMER_EN_ADDR);

-       for (int i = 0; i < 4; i++) {
-               sys_write8(k_ticks_to_cyc_floor32(1) >> (24 - i * 8),
-                               TIMER_RELOAD_ADDR + i * 0x4);
-               sys_write8(k_ticks_to_cyc_floor32(1) >> (24 - i * 8),
-                               TIMER_LOAD_ADDR + i * 0x4);
-       }
+       sys_write32(k_ticks_to_cyc_floor32(1), TIMER_RELOAD_ADDR);
+       sys_write32(k_ticks_to_cyc_floor32(1), TIMER_LOAD_ADDR);

        sys_write8(TIMER_ENABLE, TIMER_EN_ADDR);
        sys_write8(sys_read8(TIMER_EV_PENDING_ADDR), TIMER_EV_PENDING_ADDR);

That is because the data layout differs when CSRs are 32-bit. Those two registers are 4 bytes each, so with 8-bit CSRs each of them requires 4 subregisters. Each subregister sits at an address aligned to 4 bytes, which is why there was a loop writing one value spread over 4 subregisters.

With 32-bit CSRs, each of them requires only one subregister, so the loop performed incorrect writes and the timer was initialized incorrectly.
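
A minimal simulation of the two layouts may make this concrete. It uses a plain array in place of real CSR memory, with one element per 4-byte-aligned address; csr_mem and both helpers are hypothetical names for illustration, and the byte order mirrors the loop removed in the diff above.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 8 /* simulated 4-byte-aligned CSR locations */

static uint32_t csr_mem[SLOTS];

/* 8-bit CSR layout: one byte per slot, most significant byte first,
 * so a 32-bit register spans four consecutive slots. */
static void write_reg_csr8(uint32_t slot, uint32_t value)
{
    assert(slot + 4 <= SLOTS);
    for (int i = 0; i < 4; i++) {
        csr_mem[slot + i] = (value >> (24 - i * 8)) & 0xff;
    }
}

/* 32-bit CSR layout: the whole register lives in a single slot. */
static void write_reg_csr32(uint32_t slot, uint32_t value)
{
    assert(slot < SLOTS);
    csr_mem[slot] = value;
}
```

Calling write_reg_csr8(0, 0x000F4240) leaves 0x00, 0x0F, 0x42 and 0x40 in slots 0 to 3. On a SoC built with 32-bit CSRs those four slots are four different registers, so the first one ends up holding 0x00 rather than the intended count, whereas write_reg_csr32(0, 0x000F4240) places the full value in slot 0.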

Keep in mind that this is only a temporary solution to make it work for CSRs configured as 32-bit, and it in turn breaks 8-bit CSRs.

I opened two PRs (#45196 and #45198) that solve this problem without limiting support to a single CSR data width.

fkokosinski commented 2 years ago

Closing this one, as #45196 and #45198, which fix this issue, have been merged.