openrisc / or1k_marocchino

OpenRISC processor IP core based on Tomasulo algorithm
Other
29 stars 11 forks source link

Booting Linux Failing #23

Closed stffrdhrn closed 2 years ago

stffrdhrn commented 2 years ago

I have setup or1k_marocchino to run on the digilent arty using Litex. See: https://github.com/enjoy-digital/litex/pull/1161

However, linux is not booting. So far I have traced this to go wrong during: init/main.c mm_init()

Output of: fusesoc run --target marocchino_tb mor1kx-generic --option_rf_num_shadow_gpr 1 --elf_load /home/shorne/work/linux/vmlinux --trace_enable

--
_raw_spin_lock_irqsave()
 itlb miss

S c05e2b20: e0729004 l.or    r3,r18,r18      r3         = c0800020  flag: 0
S c05e2b20: e0729004 l.or    r3,r18,r18      r3         = c0800020  flag: 0
S 00000a00: 000005e0 l.j     0x00005e0                              flag: 0
S 00000a04: 15000000 l.nop   0x0000                                 flag: 0
S 00002180: c0001422 l.mtspr r0,r2,0x0422    SPR[0422]  = c06b7ee0  flag: 0
S 00002184: c0001c23 l.mtspr r0,r3,0x0423    SPR[0423]  = c0800020  flag: 0
S 00002188: c0002424 l.mtspr r0,r4,0x0424    SPR[0424]  = 00000900  flag: 0
--
kmem_cache_alloc()

S c0119c54: 8702fffc l.lwz   r24,0xfffc(r2)  r24        = c0753984  flag: 1
S c0119c54: 8702fffc l.lwz   r24,0xfffc(r2)  r24        = c0753984  flag: 1
S 00000a00: 000005e0 l.j     0x00005e0                              flag: 1
S 00000a04: 15000000 l.nop   0x0000                                 flag: 1
S 00002180: c0001422 l.mtspr r0,r2,0x0422    SPR[0422]  = c06b7f2c  flag: 1
S 00002184: c0001c23 l.mtspr r0,r3,0x0423    SPR[0423]  = c0802000  flag: 1
S 00002188: c0002424 l.mtspr r0,r4,0x0424    SPR[0424]  = 00000100  flag: 1
--
strlen (c0683a23)

S c05c8644: e1631804 l.or    r11,r3,r3       r11        = c0683a23  flag: 1
S c05c8648: 922b0000 l.lbs   r17,0x0000(r11) r17        = 00000008  flag: 1
S 00000900: 000005c0 l.j     0x00005c0                              flag: 1
S 00000904: 15000000 l.nop   0x0000                                 flag: 1
S 00002000: c0774610 l.mtspr r23,r8,0x1e10   SPR[3f30]  = 00000008  flag: 1
S 00002004: 00002000 l.j     0x0002000                              flag: 1
S 00002008: 00000005 l.j     0x0000005                              flag: 1
--
STRLEN
S 00002254: 24000000 l.rfe                                          flag: 0
S c05c8648: 922b0000 l.lbs   r17,0x0000(r11) r17        = 00000008  flag: 1
S 00000900: 000005c0 l.j     0x00005c0                              flag: 1
S 00000904: 15000000 l.nop   0x0000                                 flag: 1
S 00002000: c0774610 l.mtspr r23,r8,0x1e10   SPR[3f30]  = 00000008  flag: 1
S 00002004: 00002000 l.j     0x0002000                              flag: 1
S 00002008: 00000005 l.j     0x0000005                              flag: 1
stffrdhrn commented 2 years ago

Hi @bandvig , I am trying to boot linux or the marocchino but running into issues. From the trace above somewhere it looks like the MMU is going wrong. I tried to run some of the MMU tests from or1k-tests but I found no issues. Since it fails during booting linux and its running a lot of code it takes a long time to reproduce using the icarus simulator. I will see if it's possible to capture a VCD capture where its going wrong (though it may end up being too big or too slow).

If you have any ideas let me know.

bandvig commented 2 years ago

@stffrdhrn Yes, MMU debugging is very hard. I spend a lot of time to fix their behavior. For my Atlys board I use old version of U-Boot (as I known, U-Bott doesn't support OR1K now days). And I run a Linux image from memory.

The only idea I have just now is that you build a SoC with same clock for MAROCCHINO core (port cpu_clk) and Wishbone clock (port wb_clk). In my SoCs cpu_clk is typically greater than wb_clk. One the one hand I don't see a way it could be the reason. On the other hand I've never tested my SoCs for cpu_clk == wb_clk case. Perhaps, an intensive CPU <-> RAM exchange could lead to incorrect operation of MAROCCHINO quasi clock domain crossing logic.

I hope, I'll check the hypothesis soon.

stffrdhrn commented 2 years ago

I am running cpu_clk == wb_clk. I was able to get a VCD capture with the first clues. When the DTLB miss happens linux jumps to 0x900 then 0x2000. The instructions I am seeing returned by fetch after jumping to 0x2000 do not match the expected instructions at all. I will have to trace this more. I have a 300mb VCD capture.

Trace showing 900 to 2000, which gets bad instructions and jumps to a random place

linux-boot-bad-fetch-2022-01-17 07-06-44

Expected instructions for dtlb_miss_handler

c0002000 <dtlb_miss_handler>:
c0002000:       c0 00 14 22     l.mtspr r0,r2,0x422
c0002004:       c0 00 1c 23     l.mtspr r0,r3,0x423
c0002008:       c0 00 24 24     l.mtspr r0,r4,0x424
c000200c:       b4 40 00 30     l.mfspr r2,r0,0x30
c0002010:       18 60 c0 77     l.movhi r3,0xc077
c0002014:       a8 63 60 b4     l.ori r3,r3,0x60b4
c0002018:       b4 80 00 80     l.mfspr r4,r0,0x80
c000201c:       b8 84 00 02     l.slli r4,r4,0x2
c0002020:       e0 63 20 00     l.add r3,r3,r4
c0002024:       18 80 40 00     l.movhi r4,0x4000
c0002028:       e0 84 18 00     l.add r4,r4,r3
c000202c:       84 64 00 00     l.lwz r3,0(r4)
c0002030:       b8 82 00 58     l.srli r4,r2,0x18
c0002034:       b8 84 00 02     l.slli r4,r4,0x2
c0002038:       e0 64 18 00     l.add r3,r4,r3
bandvig commented 2 years ago

@stffrdhrn Very impressive! How long the simulation was (and in which system)? Unfortunately, now days I haven't much time to pay to MAROCCHINO. I've downloaded freshest kernel release (5.16.1) and run menu-based configuration for OpenRISC. I see in Processor type and features the use SPR_SR_DSX software emulation is raised by default. But I always switch it OFF because SR[DSX] is implemented in MAROCCHINO. Which is value you use here?

stffrdhrn commented 2 years ago

I am just using icarus with fusesoc. My system is a 11th Gen Intel(R) Core(TM) i5-11500 @ 2.70GHz. It takes less than an hour to get to the failure point.

Run with

fusesoc run --target marocchino_tb mor1kx-generic --option_rf_num_shadow_gpr 1 --elf_load /home/shorne/work/linux/vmlinux --trace_enable  --vcd --testcase linux-boot

Kernel compiled

Download kernel from: https://www.kernel.org/ (I use git, but you could use the tarball too) Toolchain: you can use the glibc toolchain.

I recommend just starting with defconfig as per below.

make ARCH=openrisc or1ksim_defconfig
make -j12 LC_ALL=en_US.UTF-8 ARCH=openrisc \
    CFLAGS="-O3 -mhard-mul" \
    CROSS_COMPILE=or1k-linux-

I have left the DSX emulation on. The mor1kx should implement SR[DSX] too but there seems to be a problem either with how linux reads it or how mor1kx implements it.

stffrdhrn commented 2 years ago

@bandvig , I am trying to trace the fetch mmu/icache. Could you remind me what s1o, s2t etc mean?

As I see, fetch has 3 stages. so I guess the first part is for the stage s1, s2, s3, and:

bandvig commented 2 years ago

@stffrdhrn s1, s2 and so on prefixes mean stage 1, 2, etc o means output latch t means temporary or transient i.e. not yet latched.

I would ask you to try switch DSX simulation off.

stffrdhrn commented 2 years ago

Thanks, I will try with DSX off when I get back to the computer.

stffrdhrn commented 2 years ago

Unfortunately the DSX flag doesn't seem to help. Also, mor1kx can boot with use SPR_SR_DSX software emulation disabled.

stffrdhrn commented 2 years ago

I was able to trace a bit more in the vcd. I could see that during the ICACHE refill that we were storing 00000078 to 00002020. That is consistent with the bad instructions that were executed below.

It could be that the copy of linux to memory is corrupt, or somehow the ibus transactions get corrupt, which could point to the clock issue you mentioned earlier. I will continue to trace until I get to the details.

Instructions at address 0x2000

c0002000 <dtlb_miss_handler>:
c0002000:       c0 00 14 22     l.mtspr r0,r2,0x422
c0002004:       c0 00 1c 23     l.mtspr r0,r3,0x423
c0002008:       c0 00 24 24     l.mtspr r0,r4,0x424
c000200c:       b4 40 00 30     l.mfspr r2,r0,0x30
c0002010:       18 60 c0 77     l.movhi r3,0xc077
c0002014:       a8 63 60 b4     l.ori r3,r3,0x60b4
c0002018:       b4 80 00 80     l.mfspr r4,r0,0x80
c000201c:       b8 84 00 02     l.slli r4,r4,0x2
c0002020:       e0 63 20 00     l.add r3,r3,r4
c0002024:       18 80 40 00     l.movhi r4,0x4000
c0002028:       e0 84 18 00     l.add r4,r4,r3
c000202c:       84 64 00 00     l.lwz r3,0(r4)
c0002030:       b8 82 00 58     l.srli r4,r2,0x18
c0002034:       b8 84 00 02     l.slli r4,r4,0x2
c0002038:       e0 64 18 00     l.add r3,r4,r3

Traced instructions at address 0x2000

Lot's of jump instructions.

S 00002000: c0774610 l.mtspr r23,r8,0x1e10   SPR[3f30]  = 00000008  flag: 1
S 00002004: 00002000 l.j     0x0002000                              flag: 1
S 00002008: 00000005 l.j     0x0000005                              flag: 1
S 0000201c: 00000044 l.j     0x0000044                              flag: 1
S 00002020: 00000078 l.j     0x0000078                              flag: 1
S 00002200: e0832003 l.and   r4,r3,r4        r4         = 00000000  flag: 1
stffrdhrn commented 2 years ago

Bad Data Refill

The bad instructions are coming from the ICACHE , as we can see here (FETCH_ICACHE ibus_dat_i == c0774610). The incorrect instructions are getting refilled into the ICACHE.

marocchin-bad-ram-refill-2022-01-19 05-23-52

Earlier Good Data Refill

However, earlier during execution the same address was refilled with the good instructions, here (FETCH_ICACHE ibus_dat_i == c0001422):

marocchino-refill-good-earlier-2022-01-19 05-26-19

Bad Data write

I traced writes to the 0x2000 address by looking at ram writes. We can see the instructions at 0x2000 are being overwritten by bad data (ram din == c0774610). This is happening at a few different times during startup (which is not correct).

Its not easy to trace yet as the instructions doing this is just part of memcpy.

c0004920 <memcpy>:
...
c0004a60:       87 33 00 00     l.lwz r25,0(r19)
c0004a64:       d4 11 c8 00     l.sw 0(r17),r25
c0004a68:       87 33 00 04     l.lwz r25,4(r19)
c0004a6c:       d4 11 c8 04     l.sw 4(r17),r25
c0004a70:       87 33 00 08     l.lwz r25,8(r19)
c0004a74:       d4 11 c8 08     l.sw 8(r17),r25
c0004a78:       87 33 00 0c     l.lwz r25,12(r19)
c0004a7c:       d4 11 c8 0c     l.sw 12(r17),r25
c0004a80:       87 33 00 10     l.lwz r25,16(r19)
c0004a84:       d4 11 c8 10     l.sw 16(r17),r25
c0004a88:       87 33 00 14     l.lwz r25,20(r19)
c0004a8c:       d4 11 c8 14     l.sw 20(r17),r25

marocchino-bad-write-2022-01-19 05-37-19

Bootstrap (bad write)

Following the trace back a bit before memcopy we can see that memcpy is called as part of bootstrap:

c06d9604:       9c 21 ff ec     l.addi r1,r1,-20
c06d9608:       d4 01 a0 08     l.sw 8(r1),r20
c06d960c:       d4 01 10 0c     l.sw 12(r1),r2
c06d9610:       1a 80 c0 72     l.movhi r20,0xc072
c06d9614:       9c 41 00 14     l.addi r2,r1,20
c06d9618:       a8 80 09 00     l.ori r4,r0,0x900
c06d961c:       d4 01 48 10     l.sw 16(r1),r9
c06d9620:       d4 01 80 00     l.sw 0(r1),r16
c06d9624:       d4 01 90 04     l.sw 4(r1),r18
c06d9628:       e2 43 18 04     l.or r18,r3,r3
c06d962c:       07 e8 ff 79     l.jal c0119410 <kmem_cache_alloc>
c06d9630:       84 74 bf a0     l.lwz r3,-16480(r20)
c06d9634:       86 34 bf a0     l.lwz r17,-16480(r20)
c06d9638:       e0 92 90 04     l.or r4,r18,r18
c06d963c:       e0 6b 58 04     l.or r3,r11,r11
c06d9640:       e2 0b 58 04     l.or r16,r11,r11
c06d9644:       07 e4 ac b7     l.jal c0004920 <memcpy>   <--- call to membopy
c06d9648:       84 b1 00 10     l.lwz r5,16(r17)

Which is this c code from mm/slub.c:

static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
{
        int node;
        struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
        struct kmem_cache_node *n;

        memcpy(s, static_cache, kmem_cache->object_size);

But this is not the first bad write to 0x2000. Its better to look at that.

First Bad Data write (to 0x2000)

This happens in memset

c00048f4:       13 ff ff fd     l.bf c00048e8 <memset+0x78>
c00048f8:       9e 73 00 04     l.addi r19,r19,4
c00048fc:       bc 11 00 00     l.sfeqi r17,0
c0004900:       10 00 00 06     l.bf c0004918 <memset+0xa8>
c0004904:       9e 31 ff ff     l.addi r17,r17,-1
c0004908:       d8 13 68 00     l.sb 0(r19),r13

marocchino-bad-write-1-2022-01-19 06-11-49

However I notice something

Finding

The LSU address is 0x802000, but the address that gets to RAM is 0x02000. Some how the address bits above 8MB are getting truncated. It seems this is a much more simple problem then expected.

marcocchino-bad-write-2-2022-01-19 06-15-04

stffrdhrn commented 2 years ago

This may just be a bug with my mor1kx-generic simulation SOC. But it doesn't explain why litex boot is failing, after fixing mor1kx-generic I will try again.

bandvig commented 2 years ago

@stffrdhrn Unfortunately I could not build a complete operating Linux from relatively modern tool-chains and kernel.

1) GNU lib C based toolchain (GCC-11.2.0 release, kernel-5.9.0-rc2) I built successfully as I mentioned in closed issue (https://github.com/openrisc/or1k-glibc/issues/11). With the toolchain I built busybox and whole linux. Kernel starts normally. Init passes, but traditional Please press Enter to activate this console. does not appears. If I press Enter only empty new line appears, but does not happen anymore.

2) I also tried to built uClibc-based toolchain (GCC-11.2.0 release, kernel-5.9.0-rc2). I used uClibc-ng v.1.0.39. I could not build more fresh uClibc-ng because of an error reported by GCC-11.2.0 while compiling resolv.c. In fact the commit caused the error was added to uClibc-ng repo exactly after v.1.0.39 release. But even with 1.0.39 release I had to remove c++ support from toolchain and built C-only toolchain because of enormous errors appeared while building libstdc++ (from gcc sources). Well, I built C-only uClibc-ng v.1.0.39 based toolchain at last. I used it to build busybox and linux. Kernel starts normally. Init passes and traditional Please press Enter to activate this console. messages appear. As I press Enter, shell prompt appears. But if I try to send any command, ls for example, it reports -/bin/sh: can't set tty process group: No such process and returns to prompt.

3) At last I found an old Linux image I built many years ago from Jonas' sources with an old compiler (4.9.0 if I remember correctly). It starts and operates normally.

4) I also build SoC with cpu_clk equals to wb_clk. And Jonas' Linux 9mentioned above) works properly on the SoC too. And more modern linux images works with same issues (described above in (1) and (2)).

Even so I have different problems (with busybox ?) on modern toolchain and linux builds, but at my Alys board kernel boots up always at least.

stffrdhrn commented 2 years ago

After fixing the simulatorit is able to boot linux (I didn't wait until completion, but it gets much further than the fpga).

I will have to try to use litescope or something similar to see where things are going wrong on the litex soc. As you pointed out it may be due to having wb_clk == cpu_clk.

Compiled-in FDT at (ptrval)
Linux version 5.16.0-rc3-00001-g397b730a3d13 (shorne@antec) (or1k-linux-gcc (GCC) 11.0.1 20210416 (experimental), GNU ld (GNU Binutils) 2.36.50.20210506) #559 SMP Wed Jan 19 21:24:09 JST 2022
CPU: OpenRISC-10 (revision 0) @20 MHz
-- dcache: 16384 bytes total, 32 bytes/line, 2 way(s)
-- icache: 16384 bytes total, 32 bytes/line, 2 way(s)
-- dmmu:   64 entries, 1 way(s)
-- immu:   64 entries, 1 way(s)
-- additional features:
-- debug unit
-- PIC
-- timer
Initial ramdisk not found
Setting up paging and PTEs.
map_ram: Memory: 0x0-0x2000000
Zone ranges:
  Normal   [mem 0x0000000000000000-0x0000000001ffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000000000-0x0000000001ffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x0000000001ffffff]
itlb_miss_handler (ptrval)
dtlb_miss_handler (ptrval)
OpenRISC Linux -- http://openrisc.io
percpu: Embedded 4 pages/cpu s11456 r0 d21312 u32768
Built 1 zonelists, mobility grouping off.  Total pages: 4080
Kernel command line: earlycon
earlycon: ns16550a0 at MMIO 0x90000000 (options '115200')   
printk: bootconsole [ns16550a0] enabled
Dentry cache hash table entries: 4096 (order: 1, 16384 bytes, linear)
Inode-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
Sorting __ex_table...
mem auto-init: stack:off, heap alloc:off, heap free:off
mem_init_done ...........................................   
Memory: 27384K/32768K available (4467K kernel code, 157K rwdata, 264K rodata, 179K init, 87K bss, 5384K reserved, 0K cma-reserved)
(stopped it here)
stffrdhrn commented 2 years ago

Thanks @bandvig for the report. I would say linux boots on your system in all cases. Which is a good information point to have. So it seems there is something strange going on with litex SoC's.

One difference with Litex is that it uses SPR_EVBAR. As the kernel is not loaded at offset 0x0, but somewhere else, EVBAR is used to adjust the CPU to run off a different base address.

It does look like marocchino implements EVBAR just fine too.

litex/soc/software/bios/boot.c

        /* Mainline Linux expects to have exception vector base address set to the
         * base address of Linux kernel; it also expects to be run with an offset
         * of 0x100. */
        mtspr(SPR_EVBAR, addr);
        addr += 0x100;
stffrdhrn commented 2 years ago

I found a few more issues in Litex. It is mostly booting now.

[    0.000000] FDT at (ptrval)
[    0.000000] Linux version 5.16.0-rc3-smh-00001-g397b730a3d13 (shorne@antec) (or1k-linux-gcc (GCC) 11.0.1 20210416 (experimental), GNU ld (GNU Binutils) 2.36.50.20210506) #561 
SMP Thu Jan 20 20:07:26 JST 2022
[    0.000000] CPU: OpenRISC-10 (revision 0) @100 MHz
[    0.000000] -- dcache: 4096 bytes total, 16 bytes/line, 1 way(s)
[    0.000000] -- icache: 4096 bytes total, 16 bytes/line, 1 way(s)
[    0.000000] -- dmmu:   64 entries, 1 way(s)
[    0.000000] -- immu:   64 entries, 1 way(s)
[    0.000000] -- additional features:
[    0.000000] -- PIC
[    0.000000] -- timer
[    0.000000] Initial ramdisk not found
[    0.000000] Setting up paging and PTEs.
[    0.000000] map_ram: Memory: 0x0-0x10000000
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] itlb_miss_handler (ptrval)
[    0.000000] dtlb_miss_handler (ptrval)
[    0.000000] OpenRISC Linux -- http://openrisc.io
[    0.000000] percpu: Embedded 4 pages/cpu s9712 r0 d23056 u32768
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32640
[    0.000000] Kernel command line: console=liteuart earlycon=liteuart,0xe0006800 rootwait root=/dev/mmcblk0p3
[    0.000000] earlycon: liteuart0 at I/O port 0x0 (options '')
[    0.000000] Malformed early option 'console'
[    0.000000] earlycon: liteuart0 at MMIO 0xe0006800 (options '')
[    0.000000] printk: bootconsole [liteuart0] enabled
[    0.000000] Dentry cache hash table entries: 32768 (order: 4, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 3, 65536 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] mem_init_done ...........................................
[    0.000000] Memory: 253408K/262144K available (5612K kernel code, 539K rwdata, 768K rodata, 193K init, 212K bss, 8736K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=2 to nr_cpu_ids=1.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
[    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    0.000000] Console: colour dummy device 80x25
[    0.000000] 200.00 BogoMIPS (lpj=1000000)
[    0.010000] pid_max: default: 32768 minimum: 301
[    0.020000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.020000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.090000] rcu: Hierarchical SRCU implementation.
[    0.110000] smp: Bringing up secondary CPUs ...
[    0.120000] smp: Brought up 1 node, 1 CPU
[    0.130000] devtmpfs: initialized
[    0.180000] random: get_random_u32 called from bucket_table_alloc.isra.0+0x164/0x184 with crng_init=0
[    0.200000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.210000] futex hash table entries: 256 (order: -1, 4096 bytes, linear)
[    0.240000] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.700000] pps_core: LinuxPPS API ver. 1 registered
[    0.700000] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.710000] PTP clock support registered
[    0.750000] clocksource: Switched to clocksource openrisc_timer
[    1.170000] NET: Registered PF_INET protocol family
[    1.180000] IP idents hash table entries: 4096 (order: 2, 32768 bytes, linear)
[    1.240000] tcp_listen_portaddr_hash hash table entries: 1024 (order: 0, 12288 bytes, linear)
[    1.250000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    1.250000] TCP bind hash table entries: 2048 (order: 1, 16384 bytes, linear)
[    1.270000] TCP: Hash tables configured (established 2048 bind 2048)
[    1.280000] UDP hash table entries: 256 (order: 0, 8192 bytes, linear)
[    1.290000] UDP-Lite hash table entries: 256 (order: 0, 8192 bytes, linear)
[    1.300000] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    1.340000] RPC: Registered named UNIX socket transport module.
[    1.350000] RPC: Registered udp transport module.
[    1.350000] RPC: Registered tcp transport module.
[    1.360000] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.410000] workingset: timestamp_bits=30 max_order=15 bucket_order=0
[    2.120000] NFS: Registering the id_resolver key type
[    2.130000] Key type id_resolver registered
[    2.130000] Key type id_legacy registered
[    2.160000] io scheduler mq-deadline registered
[    2.160000] io scheduler kyber registered
[    2.180000] LiteX SoC Controller driver initialized
[    4.770000] e0006800.serial: ttyLXU0 at MMIO 0x0 (irq = 0, base_baud = 0) is a liteuart
[    4.780000] printk: console [liteuart0] enabled
[    4.780000] printk: console [liteuart0] enabled
[    4.790000] printk: bootconsole [liteuart0] disabled
[    4.790000] printk: bootconsole [liteuart0] disabled
[    4.870000] libphy: Fixed MDIO Bus: probed
[    4.910000] liteeth e0001000.mac eth0: irq 2 slots: tx 2 rx 2 size 2048
[    4.950000] litex-mmc e0005000.mmc: can't get voltage, defaulting to 3.3V
[    5.000000] litex-mmc e0005000.mmc: LiteX MMC controller initialized.
[    5.020000] NET: Registered PF_PACKET protocol family
[    5.030000] Key type dns_resolver registered
[    5.060000] Waiting for root device /dev/mmcblk0p3...
[    5.170000] mmc0: new SDHC card at address aaaa
[    5.200000] mmcblk0: mmc0:aaaa SL08G 7.40 GiB
[    5.250000]  mmcblk0: p1 p2 p3 p4
[    5.530000] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[    5.540000] VFS: Mounted root (ext4 filesystem) readonly on device 179:3.
[    5.560000] devtmpfs: mounted
[    5.570000] Freeing unused kernel image (initmem) memory: 192K
[    5.580000] This architecture does not have kernel memory protection.
[    5.590000] Run /sbin/init as init process
[    6.640000] random: fast init done
INIT: version  booting
INIT: No inittab.d directory found

It stops after this, when going into user space which is similar to what you were seeing now.

stffrdhrn commented 2 years ago

For the record this is what the mor1kx boot process looks like:

[    0.000000] FDT at (ptrval)
[    0.000000] Linux version 5.16.0-rc3-smh-00001-g397b730a3d13 (shorne@antec) (or1k-linux-gcc (GCC) 11.0.1 20210416 (experimental), GNU ld (GNU Binutils) 2.36.50.20210506) #561 
SMP Thu Jan 20 20:07:26 JST 2022
[    0.000000] CPU: OpenRISC-10 (revision 0) @100 MHz
[    0.000000] -- dcache: 4096 bytes total, 16 bytes/line, 1 way(s)
[    0.000000] -- icache: 4096 bytes total, 16 bytes/line, 1 way(s)
[    0.000000] -- dmmu:   64 entries, 1 way(s)
[    0.000000] -- immu:   64 entries, 1 way(s)
[    0.000000] -- additional features:
[    0.000000] -- PIC
[    0.000000] -- timer
[    0.000000] Initial ramdisk not found
[    0.000000] Setting up paging and PTEs.
[    0.000000] map_ram: Memory: 0x0-0x10000000
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] itlb_miss_handler (ptrval)
[    0.000000] dtlb_miss_handler (ptrval)
[    0.000000] OpenRISC Linux -- http://openrisc.io
[    0.000000] percpu: Embedded 4 pages/cpu s9712 r0 d23056 u32768
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32640
[    0.000000] Kernel command line: console=liteuart earlycon=liteuart,0xe0006800 rootwait root=/dev/mmcblk0p3
[    0.000000] earlycon: liteuart0 at I/O port 0x0 (options '')
[    0.000000] Malformed early option 'console'
[    0.000000] earlycon: liteuart0 at MMIO 0xe0006800 (options '')
[    0.000000] printk: bootconsole [liteuart0] enabled
[    0.000000] Dentry cache hash table entries: 32768 (order: 4, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 3, 65536 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] mem_init_done ...........................................
[    0.000000] Memory: 253408K/262144K available (5612K kernel code, 539K rwdata, 768K rodata, 193K init, 212K bss, 8736K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=2 to nr_cpu_ids=1.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 32, nr_irqs: 32, preallocated irqs: 0
[    0.000000] clocksource: openrisc_timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    0.000000] Console: colour dummy device 80x25
[    0.000000] 200.00 BogoMIPS (lpj=1000000)
[    0.010000] pid_max: default: 32768 minimum: 301
[    0.020000] Mount-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.020000] Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.080000] rcu: Hierarchical SRCU implementation.
[    0.090000] smp: Bringing up secondary CPUs ...
[    0.100000] smp: Brought up 1 node, 1 CPU
[    0.110000] devtmpfs: initialized
[    0.150000] random: get_random_u32 called from bucket_table_alloc.isra.0+0x164/0x184 with crng_init=0
[    0.160000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.170000] futex hash table entries: 256 (order: -1, 4096 bytes, linear)
[    0.190000] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.550000] pps_core: LinuxPPS API ver. 1 registered
[    0.550000] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.560000] PTP clock support registered
[    0.590000] clocksource: Switched to clocksource openrisc_timer
[    0.910000] NET: Registered PF_INET protocol family
[    0.920000] IP idents hash table entries: 4096 (order: 2, 32768 bytes, linear)
[    0.950000] tcp_listen_portaddr_hash hash table entries: 1024 (order: 0, 12288 bytes, linear)
[    0.960000] TCP established hash table entries: 2048 (order: 0, 8192 bytes, linear)
[    0.970000] TCP bind hash table entries: 2048 (order: 1, 16384 bytes, linear)
[    0.980000] TCP: Hash tables configured (established 2048 bind 2048)
[    0.990000] UDP hash table entries: 256 (order: 0, 8192 bytes, linear)
[    1.000000] UDP-Lite hash table entries: 256 (order: 0, 8192 bytes, linear)
[    1.010000] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    1.040000] RPC: Registered named UNIX socket transport module.
[    1.040000] RPC: Registered udp transport module.
[    1.050000] RPC: Registered tcp transport module.
[    1.050000] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.100000] workingset: timestamp_bits=30 max_order=15 bucket_order=0
[    1.640000] NFS: Registering the id_resolver key type
[    1.650000] Key type id_resolver registered
[    1.650000] Key type id_legacy registered
[    1.670000] io scheduler mq-deadline registered
[    1.670000] io scheduler kyber registered
[    1.690000] LiteX SoC Controller driver initialized
[    3.680000] e0006800.serial: ttyLXU0 at MMIO 0x0 (irq = 0, base_baud = 0) is a liteuart
[    3.700000] printk: console [liteuart0] enabled
[    3.700000] printk: console [liteuart0] enabled
[    3.700000] printk: bootconsole [liteuart0] disabled
[    3.700000] printk: bootconsole [liteuart0] disabled
[    3.760000] libphy: Fixed MDIO Bus: probed
[    3.790000] liteeth e0001000.mac eth0: irq 2 slots: tx 2 rx 2 size 2048
[    3.820000] litex-mmc e0005000.mmc: can't get voltage, defaulting to 3.3V
[    3.870000] litex-mmc e0005000.mmc: LiteX MMC controller initialized.
[    3.880000] NET: Registered PF_PACKET protocol family
[    3.890000] Key type dns_resolver registered
[    3.920000] Waiting for root device /dev/mmcblk0p3...
[    4.040000] mmc0: new SDHC card at address aaaa
[    4.060000] mmcblk0: mmc0:aaaa SL08G 7.40 GiB 
[    4.110000]  mmcblk0: p1 p2 p3 p4
[    4.310000] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[    4.320000] VFS: Mounted root (ext4 filesystem) readonly on device 179:3.
[    4.340000] devtmpfs: mounted
[    4.350000] Freeing unused kernel image (initmem) memory: 192K
[    4.350000] This architecture does not have kernel memory protection.
[    4.360000] Run /sbin/init as init process
[    5.180000] random: fast init done
INIT: version  booting
INIT: No inittab.d directory found
[    6.100000] EXT4-fs (mmcblk0p3): re-mounted. Opts: (null). Quota mode: disabled.
[    7.610000] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[    7.730000] EXT4-fs (mmcblk0p4): recovery complete
[    7.740000] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[    7.940000] Adding 1953784k swap on /dev/mmcblk0p2.  Priority:-2 extents:1 across:1953784k SS
INIT: Entering runlevel: 3
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Initializing random number generator: OK
Saving random seed: [   15.310000] random: dd: uninitialized urandom read (512 bytes read)
OK
Starting network: OK
Starting sntp: sntp 4.2.8p15@1.3728-o Thu Oct  7 21:02:53 UTC 2021 (1)
1970-01-01 00:00:20.948508 (+0000) +1642677605.740277 +/- 1095118403.864738 pool.ntp.org 162.159.200.123 s3 no-leap
OK
mounting home work nfs ...
enabling login for shorne ...
setting up sshkeys ...
setting coredumps ...
[   28.130000] random: crng init done
Starting sshd: OK

Welcome to Linux on LiteX-mor1kx
buildroot login: root
                __   _
               / /  (_)__  __ ____ __
              / /__/ / _ \/ // /\ \ /
             /____/_/_//_/\_,_//_\_\
                   / _ \/ _ \
   __   _ __      _\___/_//_/              ___ __
  / /  (_) /____ | |/_/____ _ _  __   _ ,_/  // /___  __
 / /__/ / __/ -_)>  </___/ ) ) )/ _ \/ V_// //   /\ \/ /
/____/_/\__/\__/_/|_|   /_/_/_/ \___/_/  /_//_/\_\/_/\_\

 32-bit OpenRISC CPU with MMU integrated in a LiteX SoC
# echo hello
hello
# 
bandvig commented 2 years ago

@stffrdhrn As my English is not perfect, could you explain in more details your note It stops after this, when going into user space which is similar to what you were seeing now.? Does in mean that booting stops after INIT: No inittab.d directory found? Does it hang up completely or some how react on something like pressing Enter for example?

stffrdhrn commented 2 years ago

Sorry, it is not clear. I mean INIT: No inittab.d directory found is the last line printed. After that there is no response. Also, the system does not respond to keyboard input.

Does in mean that booting stops after INIT: No inittab.d directory found?

Yes, that is the last line printed before the system hangs.

Does it hang up completely or some how react on something like pressing Enter for example?

The machine does not responed to any keyboard inputs including Enter.

bandvig commented 2 years ago

@stffrdhrn On my Atlys SoC with 5.9.0-rc2 kernel behavior differs. /sbin/init executes /etc/init.d/rcS but could not exit from it stayed inside rcS endless. I can send a string, say ttt and then get a response like /etc/init.d/rcS: line 58: ttt: not found.

Any case, I thought a little and I think it could point to three places, were we should pay attention.

  1. In MAROCCHINO it is implicitly assumed that a branch instruction must not be placed in delay slot. I remember we discussed that earlier and you implemented it in GCC and also modified appropriate test cases. It should not be a problem now. But just for remaining.

  2. It is implicitly assumed that while Linux supervisor programs MMUs the MMUs are off. It is especially does matter for IMMU because FETCH could continue fetching while appropriate l.mptspr achieves DECODE. As l.mtspr achieved decode, IFETCH become stalled till l.mtspr completion. As l.mtspr completes IFETCH's stages resume exactly from states they were stalled. I imagine the following IMMU programming sequence:

I suppose that in the realty Linux supervisor managers MMUs in some another way. It works some how if I use uClibc-based toolchain (MMUs are enabled in both uClibc an busybox), but could fail with GNU Libc based toolchain (due to more intensive manipulations with MMUs, perhaps?).

Could you guide me how Linux manages MMUs?

  1. You invoke EVBAR. I've never use it explicitly. Does your kernel build use it? Even so EVBAR behavior should be inherited from mor1kx, perhaps we should check it once more because I've never tested it.
stffrdhrn commented 2 years ago

@bandvig I was able to do a bit more debugging by adding some print statements into the exception handlers in linux and I can confirm some things.

From a dump of the syscalls init is doing. 291 is statx. I also observed it calling __NR_pselect6_time64 and __NR_clock_gettime64.

[   95.860000] syscall r9:0x3013239c -> syscall(291) return 0
[   95.860000] CPU #: 0
[   95.860000]    PC: 301323a8    SR: 0000807e    SP: 7ff15590
[   95.870000] GPR00: 00000000 GPR01: 7ff15590 GPR02: 0800e240 GPR03: ffffff9c
[   95.880000] GPR04: 0800a096 GPR05: 00000800 GPR06: 000007ff GPR07: 7ff155dc
[   95.880000] GPR08: 00989680 GPR09: 3013239c GPR10: 3002f8e0 GPR11: 00000000
[   95.890000] GPR12: 301322bc GPR13: 300ff5d0 GPR14: 0800e240 GPR15: 0000827e
[   95.900000] GPR16: 301cac24 GPR17: 00000000 GPR18: 7ff15740 GPR19: 00000000
[   95.910000] GPR20: 0800e240 GPR21: 08012460 GPR22: 0800a096 GPR23: 00000000
[   95.910000] GPR24: 7ff157c0 GPR25: 00000744 GPR26: 3002d474 GPR27: fffffff9
[   95.920000] GPR28: ffffffff GPR29: 00008079 GPR30: 00000000 GPR31: c053fbf9
[   95.930000]   RES: 00000000 oGPR11: 00000123
[   95.930000] Process init (pid: 1, stackpage=c0844000)
[   95.940000] 
[   95.940000] syscall r9:0x3013239c -> syscall(291) return 0
[   95.940000] CPU #: 0
[   95.940000]    PC: 301323a8    SR: 0000807e    SP: 7ff15590
[   95.950000] GPR00: 00000000 GPR01: 7ff15590 GPR02: 0800e240 GPR03: ffffff9c
[   95.960000] GPR04: 0800a096 GPR05: 00000800 GPR06: 000007ff GPR07: 7ff155dc
[   95.960000] GPR08: 00989680 GPR09: 3013239c GPR10: 3002f8e0 GPR11: 00000000
[   95.970000] GPR12: 301322bc GPR13: 30133b80 GPR14: 0800e240 GPR15: 0000807e
[   95.980000] GPR16: 301cac24 GPR17: 00c4bbff GPR18: 7ff15740 GPR19: 00c4bbff
[   95.990000] GPR20: 0800e240 GPR21: 00000000 GPR22: 0800a096 GPR23: 00000000
[   95.990000] GPR24: 7ff157c0 GPR25: 7ff158c0 GPR26: 3002d474 GPR27: fffffff9
[   96.000000] GPR28: ffffffff GPR29: 00008079 GPR30: 00000000 GPR31: c053fbf9
[   96.010000]   RES: 00000000 oGPR11: 00000123
[   96.010000] Process init (pid: 34, stackpage=c0a7d240)
[   96.020000] 

The init version I am using from buildroot is is sysvinit 2.99, the source code is here:

https://git.savannah.nongnu.org/cgit/sysvinit.git/tree/src/init.c?h=2.99#n2925

I was able to trace this, I can see the init process has started and it has openned a pipe it is waiting for something to tell is that the kernel is done booting. I am not sure how it all works but I think it's waiting for the root filesystem to be remounted.

init_main()
  check_init_fifo()

...

  /* Wait for data to appear, _if_ the pipe was opened. */
  if (pipe_fd >= 0) { 
     while(!quit) {

    /* Do select, return on EINTR. */
    FD_ZERO(&fds);
    FD_SET(pipe_fd, &fds);
    tv.tv_sec = 5;         <--- wait 5 seconds, then try again
    tv.tv_usec = 0;
    n = select(pipe_fd + 1, &fds, NULL, NULL, &tv);
stffrdhrn commented 2 years ago

@bandvig

In terms of your points I think the MMU operation of marocchino sounds correct.

Note: glibc does not touch MMU settings, this is all up to the kernel. glibc is for running user mode code.

Linux exception handlers work like this:

The point above is there is only a very short time when the MMU and CACHEs are disabled when transitioning to kernel mode.

I think what you described about marocchino is similar to mor1kx it sounds correct to me.

stffrdhrn commented 2 years ago

Note, if I switch out /sbin/init for /bin/sh I can get a shell. It crashes quickly though.

stffrdhrn commented 2 years ago

@bandvig I was thinking about what you mentioned about caching. Is it possible to easily disabled caches on Marocchino to see if we can narrow down the issue to caches? On linux we set the parameter OPTION_*CACHE_LIMIT_WIDTH to 31 to disable caching any addresses greater than 0x8000000. IO addresses are in this range and this means no IO access is cached.

bandvig commented 2 years ago

@stffrdhrn I see two ways to disable caches.

  1. On software side: prevent writing into SR[DCE] and SR[IME]. At reset they initialized with zeros. It could be quite annoying because of many points exists (in bios, u-boot, Linux, etc) were such writing is performed.
  2. On hardware side: edit lines https://github.com/openrisc/or1k_marocchino/blob/79ce290612a0d410023998f258a1755abbea56bc/rtl/verilog/or1k_marocchino_ctrl.v#L761 and https://github.com/openrisc/or1k_marocchino/blob/79ce290612a0d410023998f258a1755abbea56bc/rtl/verilog/or1k_marocchino_ctrl.v#L762 forcing ic_enable_o and immu_enable_o to 1'b0.
stffrdhrn commented 2 years ago

@bandvig good point,

I was able to disable it by removing the ICE and DCE flags in linux. This did disable the caches as I can tell booting is now running very slow. But the behavior is the same, nothing happens after /sbin/init is kicked. I could also try to disable via the hardware to really be sure:

[    9.230000] Waiting for root device /dev/mmcblk0p3...
[    9.320000] mmc0: new SDHC card at address aaaa
[    9.380000] mmcblk0: mmc0:aaaa SL08G 7.40 GiB
[    9.490000]  mmcblk0: p1 p2 p3 p4
[    9.860000] EXT4-fs (mmcblk0p3): INFO: recovery required on readonly filesystem
[    9.870000] EXT4-fs (mmcblk0p3): write access will be enabled during recovery
[   11.080000] EXT4-fs (mmcblk0p3): recovery complete
[   11.110000] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[   11.130000] VFS: Mounted root (ext4 filesystem) readonly on device 179:3.
[   11.160000] devtmpfs: mounted
[   11.190000] Freeing unused kernel image (initmem) memory: 184K
[   11.200000] This architecture does not have kernel memory protection.
[   11.210000] Run /sbin/init as init process
[   11.700000] random: fast init done
INIT: version  booting
INIT: No inittab.d directory found
stffrdhrn commented 2 years ago

@bandvig I was able to get linux 5.16 to boot by disabling the sd card and booting from an in memory rootfs which runs an old build of busybox.

It seems there is something wrong with marocchino interacting with the litex/linux SD-Card driver in particular. Or it could be when using a glibc runtime. I have something working now so I can start to narrow it down.

/ # uname -a
Linux openrisc 5.16.0-rc3-smh-00003-gd9c9b8f7fce7-dirty #608 Mon Jan 24 22:06:42 JST 2022 openrisc GNU/Linux
/ # cat /proc/cpuinfo 
cpu architecture        : OpenRISC 1000 (1.1-rev0)
cpu implementation id   : 0x2
cpu version             : 0x10004
frequency               : 100000000
dcache size             : 4096 bytes
dcache block size       : 16 bytes
dcache ways             : 1
icache size             : 4096 bytes
icache block size       : 16 bytes
icache ways             : 1
immu                    : 64 entries, 1 ways
dmmu                    : 64 entries, 1 ways
bogomips                : 200.00
features                : orbis32  orfpx32 orfpx64  

/ # uptime
 09:47:39 up  9:47,  0 users,  load average: 0.00, 0.00, 0.00

/ # /sbin/init --help
BusyBox v1.26.0.git (2016-11-07 21:05:52 JST) multi-call binary.

Usage: init 

Init is the first process started during boot. It never exits.
It (re)spawns children according to /etc/inittab.
/ # mount
rootfs on / type rootfs (rw,size=126192k,nr_inodes=15774)
proc on /proc type proc (rw,relatime)
devtmpfs on /dev type devtmpfs (rw,relatime,size=126192k,nr_inodes=15774,mode=755)
sysfs on /sys type sysfs (rw,relatime)
none on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
none on /dev/shm type tmpfs (rw,relatime)
stffrdhrn commented 2 years ago

No, the sd-card works just fine under marocchino too.

It has something to do with glibc or the toolchain.

Running a musl binary

Works ok

/ # ./strace /lib/ld-musl-or1k.so.1 --help
execve("/lib/ld-musl-or1k.so.1", ["/lib/ld-musl-or1k.so.1", "--help"], 0x7fe09ea4 /* 7 vars */) = 0
set_tid_address(0x300c93b4)             = 164
writev(2, [{iov_base="musl libc (or1k)\nVersion 1.1.19\n"..., iov_len=62}, {iov_base="/lib/ld-musl-or1k.so.1", iov_len=22}], 2musl libc (or1k)
Version 1.1.19
Dynamic Program Loader
Usage: /lib/ld-musl-or1k.so.1) = 84
writev(2, [{iov_base=" [options] [--] pathname [args]\n", iov_len=32}, {iov_base=NULL, iov_len=0}], 2 [options] [--] pathname [args]
) = 32
exit_group(1)                           = ?
+++ exited with 1 +++

Running a glibc binary

If fails at address 7d8 with a SEGV

/ # ./strace /mnt/root/lib/ld-linux-or1k.so.1  --help
execve("/mnt/root/lib/ld-linux-or1k.so.1", ["/mnt/root/lib/ld-linux-or1k.so.1", "--help"], 0x7fdfbe94 /* 7 vars */) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xd00007d8} ---
+++ killed by SIGSEGV +++
SEGV
stffrdhrn commented 2 years ago

@bandvig I think I am getting closer to figuring out the issue here. It appears that mmap pages that restrict the UWE or SWE bits are not causing page faults and allowing writes to unwritable pages. This mechanism is used by linux for Copy-on-Write.

I will try to build a smaller program that can reproduce this.

Debugging

I was able to debug this by looking at some core dumps, but it was kind of tricky because the binary file was changing on disk, which I didn't realize until the end.

Below shows what happens, during startup the ld.so program.

First read the dynamic section

Lets look at this section before we start running. It will become clear later why we look here. I can get the offset by looking at readelf but I don't have readelf on my busybox, so use hexdump.

Note we see a dynamic pointer at offset 0002bf48, is as follows:

we see: 6ffffef5 000001d0, 6ffffef5 means it's the _DT_GNU_HASH pointer

/ # hexdump -C /mnt/lib/ld-linux-or1k.so.1 | grep 0002bf
00002bf0  04 00 2d 98 d7 e2 8e 18  e2 30 70 00 86 71 00 3c  |..-......0p..q.<|
0002bf00  4f 43 5f 43 48 45 43 4b  5f 00 00 00 00 00 00 00  |OC_CHECK_.......|
0002bf10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0002bf30  00 00 00 00 00 00 00 00  00 00 00 0e 00 00 02 0b  |................|
0002bf40  00 00 00 04 00 00 00 f4 *6f ff fe f5 00 00 01 d0* |........o.......|
0002bf50  00 00 00 05 00 00 04 fc  00 00 00 06 00 00 02 bc  |................|
0002bf60  00 00 00 0a 00 00 02 37  00 00 00 0b 00 00 00 10  |.......7........|
0002bf70  00 00 00 03 00 02 e0 00  00 00 00 02 00 00 00 30  |...............0|
0002bf80  00 00 00 14 00 00 00 07  00 00 00 17 00 00 08 a4  |................|
0002bf90  00 00 00 07 00 00 07 d8  00 00 00 08 00 00 00 cc  |................|
0002bfa0  00 00 00 09 00 00 00 0c  6f ff ff fc 00 00 07 7c  |........o......||
0002bfb0  6f ff ff fd 00 00 00 03  00 00 00 18 00 00 00 00  |o...............|
0002bfc0  6f ff ff fb 00 00 00 01  6f ff ff f0 00 00 07 34  |o.......o......4|
0002bfd0  6f ff ff f9 00 00 00 0f  00 00 00 00 00 00 00 00  |o...............|
0002bfe0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

For the first run, it works

It works because the binary is as expected.

/ # /mnt/lib/ld-linux-or1k.so.1
/mnt/lib/ld-linux-or1k.so.1: missing program name
Try '/mnt/lib/ld-linux-or1k.so.1 --help' for more information.

Next read the dynamic section again...

data has changed,

look at address 0002bf48:
    before                        after
    6ffffef5 000001d0 ----> 6ffffef5 300001d0
A + `0x30000000` has been applied to the file!

that shouldn't happen
somehow the file memory and program runtime memory is getting
combined.
/ # hexdump -C /mnt/lib/ld-linux-or1k.so.1 | grep 0002bf
00002bf0  04 00 2d 98 d7 e2 8e 18  e2 30 70 00 86 71 00 3c  |..-......0p..q.<|
0002bf00  4f 43 5f 43 48 45 43 4b  5f 00 00 00 00 00 00 00  |OC_CHECK_.......|
0002bf10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0002bf30  00 00 00 00 00 00 00 00  00 00 00 0e 00 00 02 0b  |................|
0002bf40  00 00 00 04 30 00 00 f4 *6f ff fe f5 30 00 01 d0 *|....0...o...0...|
0002bf50  00 00 00 05 30 00 04 fc  00 00 00 06 30 00 02 bc  |....0.......0...|
0002bf60  00 00 00 0a 00 00 02 37  00 00 00 0b 00 00 00 10  |.......7........|
0002bf70  00 00 00 03 30 02 e0 00  00 00 00 02 00 00 00 30  |....0..........0|
0002bf80  00 00 00 14 00 00 00 07  00 00 00 17 30 00 08 a4  |............0...|
0002bf90  00 00 00 07 30 00 07 d8  00 00 00 08 00 00 00 cc  |....0...........|
0002bfa0  00 00 00 09 00 00 00 0c  6f ff ff fc 00 00 07 7c  |........o......||
0002bfb0  6f ff ff fd 00 00 00 03  00 00 00 18 00 00 00 00  |o...............|
0002bfc0  6f ff ff fb 00 00 00 01  6f ff ff f0 30 00 07 34  |o.......o...0..4|
0002bfd0  6f ff ff f9 00 00 00 0f  00 00 00 00 00 00 00 00  |o...............|
0002bfe0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Run again, we crash!

Notice the details below like found: and dump: I added as extra debug details into the kernel page fault handler. We can see the memory mapping and we see the program is loaded into address 0x30000000.

Also we can see the prot (protection) flags are 551. This translates to write not being allowed:

#define _PAGE_CC       0x001 /* software: pte contains a translation */
#define _PAGE_A        0x010 /* accessed               */
#define _PAGE_URE      0x040 /* user read enable       */
#define _PAGE_SRE      0x100 /* superuser read enable  */
#define _PAGE_EXEC     0x400 /* software: page is executable */
/ # /mnt/lib/ld-linux-or1k.so.1 
[  197.280000] fault cant access stack below usp
[  197.280000] found: 7f870000 -> 7f892000 - flags: 100173 prot: 551
[  197.290000] dump: 30000000 -> 3002c000 - flags: 75 prot: 551
[  197.290000] dump: 3002c000 -> 30030000 - flags: 100073 prot: 551
[  197.300000] dump: 7f870000 -> 7f892000 - flags: 100173 prot: 551
[  197.310000] CPU #: 0
[  197.310000]    PC: 30001bf0    SR: 0000827e    SP: 7f891bf4
[  197.310000] GPR00: 00000000 GPR01: 7f891bf4 GPR02: 600008d4 GPR03: 7f891e90
[  197.320000] GPR04: 00000000 GPR05: 00000001 GPR06: 00000000 GPR07: 6ffffeff
[  197.330000] GPR08: 0000000a GPR09: 30001904 GPR10: 00000000 GPR11: 6ffffdff
[  197.340000] GPR12: 0000000b GPR13: fffffffc GPR14: 00153c80 GPR15: 00000032
[  197.340000] GPR16: 3002e000 GPR17: 600007d8 GPR18: 7faf1fbe GPR19: 00000000
[  197.350000] GPR20: 7faf1b44 GPR21: 00000000 GPR22: 00153c40 GPR23: 600002bc
[  197.360000] GPR24: 6000088c GPR25: 00000030 GPR26: 00000001 GPR27: 000000cc
[  197.360000] GPR28: 30000000 GPR29: 3002df88 GPR30: 00000000 GPR31: 00000000
[  197.370000]   RES: 6ffffdff oGPR11: ffffffff
[  197.370000] Process ld-linux-or1k.s (pid: 68, stackpage=c1736480)
[  197.380000] 
SEGV

For good measure read the dynamic section again...

    before                        after
    6ffffef5 300001d0 ----> 6ffffef5 600001d0
Again it was incremented by 0x3000000
/ # hexdump -C /mnt/lib/ld-linux-or1k.so.1 | grep 0002bf
00002bf0  04 00 2d 98 d7 e2 8e 18  e2 30 70 00 86 71 00 3c  |..-......0p..q.<|
0002bf00  4f 43 5f 43 48 45 43 4b  5f 00 00 00 00 00 00 00  |OC_CHECK_.......|
0002bf10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0002bf30  00 00 00 00 00 00 00 00  00 00 00 0e 00 00 02 0b  |................|
0002bf40  00 00 00 04 60 00 00 f4 *6f ff fe f5 60 00 01 d0* |....`...o...`...|
0002bf50  00 00 00 05 60 00 04 fc  00 00 00 06 60 00 02 bc  |....`.......`...|
0002bf60  00 00 00 0a 00 00 02 37  00 00 00 0b 00 00 00 10  |.......7........|
0002bf70  00 00 00 03 60 02 e0 00  00 00 00 02 00 00 00 30  |....`..........0|
0002bf80  00 00 00 14 00 00 00 07  00 00 00 17 60 00 08 a4  |............`...|
0002bf90  00 00 00 07 60 00 07 d8  00 00 00 08 00 00 00 cc  |....`...........|
0002bfa0  00 00 00 09 00 00 00 0c  6f ff ff fc 00 00 07 7c  |........o......||
0002bfb0  6f ff ff fd 00 00 00 03  00 00 00 18 00 00 00 00  |o...............|
0002bfc0  6f ff ff fb 00 00 00 01  6f ff ff f0 60 00 07 34  |o.......o...`..4|
0002bfd0  6f ff ff f9 00 00 00 0f  00 00 00 00 00 00 00 00  |o...............|
0002bfe0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Why is this happening

We can look inside of /lib/ld.so

Has code in _dl_start, in function elf_get_dynamic_info. To update ELF .dynamic section pointers. This updates program memory in the .dynamic section so pointers match the relocated object location in memory. For OpenRISC the kernel chooses the load offset of 0x3000000.

The code for elf_get_dynamic_info looks like this:

  /* Don't adjust .dynamic unnecessarily.  */
  if (l->l_addr != 0 && dl_relocate_ld (l))
    35b0:       86 21 00 24     l.lwz r17,36(r1)      // l_addr - 0x30000000
    35b4:       bc 11 00 00     l.sfeqi r17,0
    35b8:       10 00 00 3f     l.bf 36b4 <_dl_start+0x298>
    35bc:       15 00 00 00     l.nop 0x0
    35c0:       8e 61 01 be     l.lbz r19,446(r1)
    35c4:       a6 73 00 04     l.andi r19,r19,0x4
    35c8:       bc 33 00 00     l.sfnei r19,0
    35cc:       10 00 00 3a     l.bf 36b4 <_dl_start+0x298>
    35d0:       15 00 00 00     l.nop 0x0
      do                                                                      \
        if (info[tag] != NULL)                                                \
         info[tag]->d_un.d_ptr += l_addr;                                     \
      while (0)

      ADJUST_DYN_INFO (DT_HASH);
    35d4:       86 61 00 54     l.lwz r19,84(r1)                    // Read a dynamic location
    35d8:       bc 13 00 00     l.sfeqi r19,0
    35dc:       10 00 00 05     l.bf 35f0 <_dl_start+0x1d4>
    35e0:       15 00 00 00     l.nop 0x0
    35e4:       86 b3 00 04     l.lwz r21,4(r19)                   // Read the dynamic pointer value
    35e8:       e2 b1 a8 00     l.add r21,r17,r21                   // Add pointer + 0x30000000
    35ec:       d4 13 a8 04     l.sw 4(r19),r21                     // Store that value back to mm memory (should cause a page fault)
      ADJUST_DYN_INFO (DT_PLTGOT);
    35f0:       86 61 00 50     l.lwz r19,80(r1)
    35f4:       bc 13 00 00     l.sfeqi r19,0
    35f8:       10 00 00 05     l.bf 360c <_dl_start+0x1f0>
    35fc:       15 00 00 00     l.nop 0x0
    3600:       86 b3 00 04     l.lwz r21,4(r19)
    3604:       e2 b1 a8 00     l.add r21,r17,r21
    3608:       d4 13 a8 04     l.sw 4(r19),r21
      ADJUST_DYN_INFO (DT_STRTAB);
    360c:       86 61 00 58     l.lwz r19,88(r1)
    3610:       bc 13 00 00     l.sfeqi r19,0
    3614:       10 00 00 05     l.bf 3628 <_dl_start+0x20c>
    3618:       15 00 00 00     l.nop 0x0
    361c:       86 b3 00 04     l.lwz r21,4(r19)
    3620:       e2 b1 a8 00     l.add r21,r17,r21
    3624:       d4 13 a8 04     l.sw 4(r19),r21

The dynamic section is copied to memory by the kernel in fs/binfmt_elf.c

No copying needs to be done, as the file is open we just mmap the pages into memory.

  load_elf_binary()
                // Calls this to map the file into memory at load_bias (0x30000000)
                error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                                elf_prot, elf_flags, total_size);

                // which then calls
        map_addr = vm_mmap(filep, addr, size, prot, type, off);

The kernels load_elf_binary sets flag MAP_PRIVATE, which translates to 0x551 which we saw earlier. With MAP_PRIVATE it should create a private copy-on-write mapping

Writes should cause a page-fault which will then allow for pages to be copied.

This is not happening, so instead the file is getting written to.

bandvig commented 2 years ago

@stffrdhrn Impressive investigation! My competencies are not enough to understand your findings. But you pushed me to look at MAROCCHINO's DMMU once more time. And I found the following:

https://github.com/openrisc/or1k_marocchino/blob/79ce290612a0d410023998f258a1755abbea56bc/rtl/verilog/or1k_marocchino_dmmu.v#L775-L779

I do not remember exactly why I did this. Perhaps it was something wrong with access to UART either on my Atlys SoC or on simulations. Could you try to comment all these lines except

https://github.com/openrisc/or1k_marocchino/blob/79ce290612a0d410023998f258a1755abbea56bc/rtl/verilog/or1k_marocchino_dmmu.v#L776

I am going to try this on second half of Saturday.

stffrdhrn commented 2 years ago

@bandvig Hello, that is a good finding. I tried to change the code to just have assign phys_addr_o = phys_addr_r; but it does not seem to change much.

As I see it that change allows stores to bytes to work. Which we will need for normal operations. But it seems unrelated to the issue I found.

The issue I explained above is that the page protection bits UWE SWE do not seem to be working.

Example

When I say page is protected I mean, lets say dmmu set x is setup for example as:

A write to address 0x30000080 in user mode should cause a page fault because UWE bit mis missing. This doesn't seem to be happening

Where:

// MR_V_MASK   0x00000001   - valid bit

// TR_CC_MASK  0x00000001  - cache coherent
// TR_A_MASK   0x00000010 - accessed
// TR_URE_MASK 0x00000040 - user read enabled
// TR_SRE_MASK 0x00000100 - supervisor read enabled
// TR_EXEC_MASK 0x00000400 - execution enabled

DMMU code

The code to handle that is here, which looks ok to me:

https://github.com/openrisc/or1k_marocchino/blob/79ce290612a0d410023998f258a1755abbea56bc/rtl/verilog/or1k_marocchino_dmmu.v#L330-L332

It is a bit different compared to mor1kx, but just operators seem to differnt:

https://github.com/openrisc/mor1kx/blob/aca24812520db8e3a0347ac45b0ac18027ccf053/rtl/verilog/mor1kx_dmmu.v#L183-L186

stffrdhrn commented 2 years ago

I got my test program working. But no luck yet reproducing the issue. I have a few more things to try.

bandvig commented 2 years ago

@stffrdhrn Yes, it was small hope that assign phys_addr_o = phys_addr_r; could fix the problem. At the same time the assign phys_addr_o = {phys_addr_r[OPTION_OPERAND_WIDTH-1:2],2'd0}; follows (not obviously) CAPPUCCINO's: https://github.com/openrisc/mor1kx/blob/aca24812520db8e3a0347ac45b0ac18027ccf053/rtl/verilog/mor1kx_lsu_cappuccino.v#L661-L663

bandvig commented 2 years ago

@stffrdhrn It looks like I have found the root of the problem. I implemented a super-cache engines in MMUs which caches most recent transformation. These cashes are based on a register and allow to achieve higher values for cpu_clk. But it looks like the cache update is incorrect. So I just force updating the caches at each access. That brings extra two clocks before I(D)CACHE hit/miss detection. And It obviously leads to performance degradation if MMUs are active. On the other hand my Linux build (compiled with GLibc-based toolchain) now boots up and operates normally (however I have not tested network functionality yet). Please find the quick fix in fix_mmu branch. It also includes free access to FPCSR from user space.

stffrdhrn commented 2 years ago

Excellent! My full system boots too now.

We can see it marocchino because of orfpx64

Welcome to Linux on LiteX-or1k
buildroot login: root
                __   _
               / /  (_)__  __ ____ __
              / /__/ / _ \/ // /\ \ /
             /____/_/_//_/\_,_//_\_\
                   / _ \/ _ \
   __   _ __      _\___/_//_/       ___ __
  / /  (_) /____ | |/_/_____   _ ,_/  // /_
 / /__/ / __/ -_)>  </___/ _ \/ V_// //   /
/____/_/\__/\__/_/|_|    \___/_/  /_//_/\_\

 32-bit OpenRISC CPU with MMU integrated in a LiteX SoC
# 
# cat /proc/cpuinfo 
processor               : 0
cpu architecture        : OpenRISC 1000 (1.1-rev0)
cpu implementation id   : 0x2
cpu version             : 0x10004
frequency               : 100000000
dcache size             : 4096 bytes
dcache block size       : 16 bytes
dcache ways             : 1
icache size             : 4096 bytes
icache block size       : 16 bytes
icache ways             : 1
immu                    : 64 entries, 1 ways
dmmu                    : 64 entries, 1 ways
bogomips                : 200.00
features                : orbis32  orfpx32 orfpx64  

# uname -a
Linux buildroot 5.16.0-rc3-smh-00003-gd9c9b8f7fce7-dirty #618 SMP Sun Jan 30 07:07:55 JST 2022 openrisc GNU/Linux
bandvig commented 2 years ago

@stffrdhrn I hope I fixed I(D)MMUs super-cache miss detection. Please, find updates in fix_mmu branch.

stffrdhrn commented 2 years ago

I will test it. Give me some time as I'm using my fpga board for something else right now.

bandvig commented 2 years ago

@stffrdhrn I don't hurry up absolutely. Please go ahead with the tempo most comfortable for you. Now days so few persons continue doing something for OpenRISC. Personally I was seriously demotivated about advancing MAROCCHINO. I did previous serious updates about two or even three years ago. But your persistent efforts on OpenRISC system software (GCC, Glibc, Linux, etc) are really encourage me. Thank you for your work.

juliusbaxter commented 2 years ago

Just read through this thread, incredible debug effort gents!

I'm curious as to why it was seen only recently, is it the first time it's been integrated and Linux booted in this particular Litex SoC (and thus subject to its memory map, bus architecture etc?)

stffrdhrn commented 2 years ago

Hi @juliusbaxter , It sure is a long thread to read through with debugging dead ends and clues. This was the first time I booted linux on it.

The main thing that is different here was using the glibc built rootfs. It seems this runtime uses copy-on-write functionality which relies on the MMU. It appears a normal busybox runtime does not require this. It could explain why this slipped through earlier linux booting efforts.

stffrdhrn commented 2 years ago

@bandvig I have tested the new fixes in the branch and it's working for me. Shall you merge that and we can mark this one as closed? Note for GSoC 2022, I have proposed a project to get the FPU support added to linux. If that doesn't get any students I will do the work myself.

From now I will start getting the SPEC updated to permit user access to FPCSR.

bandvig commented 2 years ago

Merged into master. Good idea about GSoC 2022.