AlanVek opened 1 year ago
Hi,
There was a recent update related to the FPU; it may be related.
Can you send me your OpenSBI / Linux / Buildroot images? Also, which versions of the Linux kernel / OpenSBI are you using?
What git hash do you have for pythondata-cpu-vexriscv_smp? Can you try with https://github.com/litex-hub/pythondata-cpu-vexriscv_smp/commit/e8ce95bbff2742226e838a37a88e4153bd04178a ?
Thanks :D
Hmm, also, if you could send me your Linux ELF file, that would be great :D
Keep in mind that I'm not using the standard memory map. This is my boot.json:
{
    "Image":       "0xB0000000",
    "rv32.dtb":    "0xB0ef0000",
    "rootfs.cpio": "0xB1000000",
    "opensbi.bin": "0xB0f00000"
}
With RAM being from 0xb0000000 to 0xd0000000.
The Linux kernel version is the latest one that comes from cloning http://github.com/buildroot/buildroot, and I'm using master for pythondata-cpu-vexriscv_smp.
For the Linux ELF file, give me a little bit of time, because I need to regenerate it. In the meantime, I'll try the commit you suggested.
Thanks for the quick response!
I still get the same errors with the suggested commit.
This is my .elf: linux.elf.zip
If it helps, this is my linux.config:
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# Architecture
CONFIG_ARCH_DEFCONFIG="arch/riscv/configs/defconfig"
CONFIG_NONPORTABLE=y
CONFIG_ARCH_RV32I=y
CONFIG_RISCV_ISA_M=y
CONFIG_RISCV_ISA_A=y
CONFIG_RISCV_ISA_C=y
CONFIG_SIFIVE_PLIC=y
CONFIG_FPU=y
CONFIG_SMP=y
CONFIG_STRICT_KERNEL_RWX=n
CONFIG_EFI=n
CONFIG_HVC_RISCV_SBI=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# FPGA / SoC
CONFIG_FPGA=y
CONFIG_FPGA_MGR_LITEX=y
CONFIG_LITEX_SOC_CONTROLLER=y
CONFIG_LITEX_SUBREG_SIZE=4
# Time
CONFIG_PRINTK_TIME=y
# Clocking
CONFIG_COMMON_CLK=y
CONFIG_COMMON_CLK_LITEX=y
# Interrupts
CONFIG_IRQCHIP=y
CONFIG_OF_IRQ=y
CONFIG_HANDLE_DOMAIN_IRQ=y
CONFIG_LITEX_VEXRISCV_INTC=y
# Ethernet
CONFIG_NET=n
CONFIG_PACKET=n
CONFIG_PACKET_DIAG=n
CONFIG_INET=n
CONFIG_NETDEVICES=n
CONFIG_NET_VENDOR_LITEX=n
CONFIG_LITEX_LITEETH=n
# Serial
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_SERIAL_LITEUART=y
CONFIG_SERIAL_LITEUART_CONSOLE=y
# GPIO
CONFIG_GPIO_SYSFS=y
CONFIG_GPIOLIB=y
CONFIG_GPIO_LITEX=y
# PWM
CONFIG_PWM=y
CONFIG_PWM_LITEX=y
# SPI
CONFIG_SPI=y
CONFIG_SPI_LITESPI=y
CONFIG_SPI_SPIDEV=y
# I2C
CONFIG_I2C=y
CONFIG_I2C_LITEX=y
CONFIG_I2C_CHARDEV=y
# Hardware monitoring
CONFIG_HWMON=y
CONFIG_SENSORS_LITEX_HWMON=y
# Framebuffer
CONFIG_FB=y
CONFIG_FB_SIMPLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_LOGO=y
CONFIG_DRM=y
CONFIG_DRM_LITEVIDEO=y
# Flash
CONFIG_MTD=y
CONFIG_MTD_SPI_NOR=y
CONFIG_SPI_FLASH_LITEX=y
# MMC
CONFIG_MMC=y
CONFIG_MMC_SPI=y
CONFIG_MMC_LITEX=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
CONFIG_EXT4_FS=y
# .config in kernel
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# Filesystem
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_NCPFS_SMALLDOS=y
CONFIG_NLS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_TMPFS=y
CONFIG_HZ_100=y
CONFIG_RISCV_ISA_F=y
CONFIG_RISCV_ISA_D=y
And this is my litex_vexriscv_defconfig:
# Target options
BR2_riscv=y
BR2_RISCV_32=y
# Instruction Set Extensions
BR2_riscv_custom=y
BR2_RISCV_ISA_CUSTOM_RVM=y
BR2_RISCV_ISA_CUSTOM_RVA=y
BR2_RISCV_ISA_CUSTOM_RVC=y
BR2_RISCV_ISA_CUSTOM_RVF=y
BR2_RISCV_ISA_CUSTOM_RVD=y
BR2_RISCV_ABI_ILP32=y
# Patches
BR2_GLOBAL_PATCH_DIR="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/patches"
# GCC
BR2_GCC_VERSION_11_X=y
# System
BR2_TARGET_GENERIC_GETTY=y
BR2_TARGET_GENERIC_GETTY_PORT="console"
# Filesystem
BR2_TARGET_ROOTFS_CPIO=y
# Kernel (litex-rebase branch)
BR2_LINUX_KERNEL=y
BR2_LINUX_KERNEL_CUSTOM_GIT=y
BR2_LINUX_KERNEL_CUSTOM_REPO_URL="https://github.com/Dolu1990/litex-linux.git"
BR2_LINUX_KERNEL_CUSTOM_REPO_VERSION="ae80e67c6b48bbedcd13db753237a25b3dec8301"
BR2_LINUX_KERNEL_USE_CUSTOM_CONFIG=y
BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/board/litex_vexriscv/linux.config"
BR2_LINUX_KERNEL_IMAGE=y
# Rootfs customisation
BR2_ROOTFS_OVERLAY="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/board/litex_vexriscv/rootfs_overlay"
BR2_GLOBAL_PATCH_DIR="$(BR2_EXTERNAL_LITEX_VEXRISCV_PATH)/patches"
#BR2_PACKAGE_HOST_LINUX_HEADERS_CUSTOM_5_18=y
# Extra packages
#BR2_PACKAGE_DHRYSTONE_OPT=y
#BR2_PACKAGE_MICROPYTHON=y
#BR2_PACKAGE_SPIDEV_TEST=y
#BR2_PACKAGE_MTD=y
#BR2_PACKAGE_MTD_JFFS_UTILS=y
# Crypto
#BR2_PACKAGE_LIBATOMIC_OPS_ARCH_SUPPORTS=y
#BR2_PACKAGE_LIBATOMIC_OPS=y
#BR2_PACKAGE_OPENSSL=y
#BR2_PACKAGE_LIBRESSL=y
#BR2_PACKAGE_LIBRESSL_BIN=y
#BR2_PACKAGE_HAVEGED=y
#BR2_PACKAGE_VEXRISCV_AES=y # Uncomment to enable hardware AES
BR2_RISCV_ABI_ILP32D=y
BR2_RISCV_ISA_CUSTOM_RVC=y
I still get the same errors with the suggested commit.
Hooo, hmmm, I'm now trying to reproduce this in simulation (not yet successful at hitting the bug).
Is this the first bug:
[ 0.263653] futex hash table entries: 256 (order: , 16384 bytes, linear)
[ 0.295184] NET: Registered PF_NETLINK/PF_ROUT protocol family
[ 0.767033] ------------[ cut here ]------------
[ 0.767761] WARNING: CPU: X PID: 2 at kernel/rcu/tree.c:279 rcu_core+0x428/0x
[ 0.768925] CPU: 0 PID: p Comm: kthreaddNot tainted5.18.0-rc7 #5
[ 0.780034] epc : rcu_core+0x428/0x468
Does it appear each time you run the system, or does it sometimes change? This is to know how reproducible and localized the issue is, and how random things are.
Also, thanks for all the files / info ^^
It appears each time. The other ones sometimes change depending on the compilation flags (FPU=y/n, RVC=y/n).
So currently I have simulations running with linux-on-litex-vexriscv: ./sim.py --cpu-count 1 --dcache-width 64 --icache-width 64 --dcache-ways 1 --icache-ways 1 --without-coherent-dma --dtlb-size 4 --itlb-size 4 --dcache-size 4096 --icache-size 4096 --with-fpu --with-wishbone-memory --with-rvc, with your kernel image. Seems fine so far.
One thing about litex-hub/pythondata-cpu-vexriscv_smp@e8ce95b: you need to manually delete all the VexRiscvLitexSmpCluster_xxxx.v files in https://github.com/litex-hub/pythondata-cpu-vexriscv_smp/tree/master/pythondata_cpu_vexriscv_smp/verilog in order to force their regeneration (sorry, I forgot to mention it).
Also, were you testing with a fresh LiteX install? (To have an idea of the setup ^^)
Can you send me your dts ? (device tree)
I'm generating the .v outside of linux-on-litex-vexriscv, so no problem there! Yes, I have a fresh LiteX install (maybe one day old).
Here is my dts: dts.zip
I also tried simulating the system with Renode and everything worked fine; I was able to boot Linux, so I think the software stack is correctly generated.
Hmmm, so yeah, probably some hardware issue. The only thing I can think of to track down the issue / regression cause would be to test with litex-hub/pythondata-cpu-vexriscv_smp@e8ce95b (and deleting the VexRiscvLitexSmpCluster_xxxx.v files in pythondata_cpu_vexriscv_smp/verilog). Thing is, I don't have any SmartFusion2 hardware, and as it is a quite specific platform, the issue may be related to that. When you tried with litex-hub/pythondata-cpu-vexriscv_smp@e8ce95b, did you delete the VexRiscvLitexSmpCluster_xxxx.v files? If not, can you try? Thanks :D Simulations are still running on my side (no bug so far).
So I found something: your config maps main memory above 0x80000000. Thing is, currently VexRiscv is configured to consider everything above 0x80000000 as a non-cached memory region. So I will restart a simulation with an uncached memory region only :)
Here is the hardcoded uncached memory range specification: https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/demo/smp/VexRiscvSmpLitexCluster.scala
In your case it could be: ioRange = address => address(31 downto 28) < 0xB || address(31 downto 28) >= 0xD,
Sorry, I forgot to mention that I have the following diff in pythondata-cpu-vexriscv_smp:
diff --git a/src/main/scala/vexriscv/demo/smp/VexRiscvSmpLitexCluster.scala b/src/main/scala/vexriscv/demo/smp/VexRiscvSmpLitexCluster.scala
index 3454577..a58aca4 100644
--- a/src/main/scala/vexriscv/demo/smp/VexRiscvSmpLitexCluster.scala
+++ b/src/main/scala/vexriscv/demo/smp/VexRiscvSmpLitexCluster.scala
@@ -151,8 +151,8 @@ object VexRiscvLitexSmpClusterCmdGen extends App {
cpuConfigs = List.tabulate(cpuCount) { hartId => {
val c = vexRiscvConfig(
hartId = hartId,
- ioRange = address => address.msb,
- resetVector = 0,
+ ioRange = address => (address(31 downto 28) === 0x3),//0x3),
+ resetVector = 0xA0000000l,//0xA0000000l,
iBusWidth = iBusWidth,
dBusWidth = dBusWidth,
iCacheSize = iCacheSize,
@@ -321,4 +321,4 @@ object VexRiscvLitexSmpClusterOpenSbi extends App{
}
}
}
-}
\ No newline at end of file
+}
I may have found a potential problem. I have the following example code:
#include <stdint.h>

int main(){
    volatile uint32_t* v = (volatile uint32_t*)0x30012000;
    while (1){
        *v = *((volatile uint32_t*)0x30012004);
    }
    return 0;
}
What I'm seeing in my simulation is that when data_width is 32 bits, this code translates to what it should: a 32-bit read from address 0x30012004, followed by a 32-bit write to address 0x30012000. But when data_width is 64 bits, I see a 64-bit read from address 0x30012000, followed by a 32-bit write to address 0x30012000. This means that if address 0x30012000 holds a register with a read side effect, the extra read could change the expected behavior. That would explain why, when data_width is 32 bits, I'm seeing:
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 65024
[ 0.000000] Kernel command line: console=liteuart earlycon=liteuart,0x30012000 rootwait root=/dev/ram0
[ 9.982495] printk: console [liteuart0] enabled
[ 9.982495] printk: console [liteuart0] enabled
[ 9.990746] printk: bootconsole [liteuart0] disabled
[ 9.990746] printk: bootconsole [liteuart0] disabled
And when data_width is 64 bits, I'm seeing:
[ 0.000000] Built zonelists, mobility grouping o. Total pages: 65024
[ 0.0`000] Kernel command line: cons
[ 0.0`000] Unknown kernel command line parameters "con|", will be passed to user space.
[ 0.013150] printk: console [ty0] enabled
0 0.016269] printk: bootconsole [liteuart0] disabled
That would explain the weird UART behavior, and could also explain why the kernel fails to boot if something like that is happening anywhere else.
Hoo nice :) Hmm, maybe adding --wishbone-force-32b to the arguments list would help?
The simulation with the basic example goes back to working as expected; I'm now synthesizing with fpu=True to check whether it works in hardware. Is this 64-bit bus access "bug" a real bug, or is it a feature? I just saw a read transaction that shouldn't be there, but there may be other stuff like that which eventually caused Linux to fail at boot.
Thanks again for the help :)
So, I think it's mostly some mismatch between the Wishbone config on the SpinalHDL side and the LiteX side. So mostly, a configuration bug.
Probably we should just enforce the Wishbone bus to always be 32 bits, no matter what.
Thanks too ^^
With this configuration, I can't even boot the bios.bin. And I think it's got something to do with the fact that I'm now seeing some transactions with SEL=0 in the simulation.
Like this:
Or like this:
Currently, my architecture is just ignoring those accesses and returning ACK immediately without forwarding the transaction to the actual memory, but I don't know if that's what the CPU expects, or whether that may be what's causing the problems.
I got it to work. But when I got to Linux, I got the same errors as in the beginning: Linux starts to boot, but then it hits a lot of errors and ends with a kernel panic. Seems to be something specific to the FPU.
> And I think it's got something to do with the fact that I'm now seeing some transactions with SEL=0 in the simulation.
Yes, this can happen when a SC (store conditional) is rejected.
> Currently, my architecture is just ignoring those accesses and returning ACK immediately without forwarding that transaction to the actual memory, but I don't know if that's what the cpu expects and if that may be what's causing the problems.
That should be good.
So if I understand well, all the memory requests are going through the Wishbone bus in your case? No LiteDRAM involved, right? That's something I didn't try in simulation; I will try to reproduce it.
Yes, that's correct. Everything's going through the Wishbone bus and I'm not using LiteDRAM. Along with all the binaries, I could send you the Verilog I'm using if it helps reproduce the problem.
Sure that could help :)
Here is everything:
Hi, sorry for the delay :/
So I tried on my side on some hardware, and things seem to work fine in my tests (Linux 5.10 and 6.2 are OK with rv32imafdc). Which Linux kernel version was Buildroot using exactly? (Which git hash?)
Hello, no worries.
The git hash of the buildroot repo is 61ba55e9cce6884295e47fdf33554e6877bd0747 and the git hash for the linux repo inside buildroot is ae80e67c6b48bbedcd13db753237a25b3dec8301.
Hi, I'm not successfully reproducing the issue XD That's a pain XD
Do you have a way to run a simulation of your system until the crash happens?
I successfully booted Linux with FPU disabled. When I enable FPU, I get the following errors:
After that:
After that, some more warnings/errors, and finally:
This is all using the same bios.bin / opensbi.bin / Image / rootfs.cpio / rv32.dtb as the setup that works when the bitfile doesn't have the FPU (compiled with abi=ilp32 arch=rv32i).
I get the same errors if I specifically compile Linux/Buildroot using the FPU/ISA_F/ISA_D flags, and using abi=ilp32d and arch=rv32imafdc.
The configuration I'm using is:
--cpu-count 1 --dcache-width 64 --icache-width 64 --dcache-ways 1 --icache-ways 1 --without-coherent-dma --dtlb-size 4 --itlb-size 4 --dcache-size 4096 --icache-size 4096 --with-fpu --with-wishbone-memory --with-rvc
The board is a SmartFusion2 running at 80MHz without timing failures.
Please let me know if I can provide any more information.