jbush001 / NyuziProcessor

GPGPU microprocessor architecture
Apache License 2.0

xv6 operating system hangs when running on FPGA #155

Open · jbush001 opened this issue 6 years ago

jbush001 commented 6 years ago

https://github.com/jbush001/xv6-nyuzi

This boots fine under the emulator and verilog simulation, but hangs before showing the shell prompt when run on the FPGA board. This could be an issue specific to the FPGA configuration, or it could be exposing a synthesis/simulation mismatch. Should track this down.

Debugging this might be easier if #153 were implemented.

jbush001 commented 6 years ago
program running, entering console mode
started cpu 1
started cpu 2
started cpu 3
sb: size 1000 nblocks 941 ninodes 200 nlog 30 logstart 2 inodestart 32 bmap start 58

Nothing else after this...

jbush001 commented 6 years ago

Tried making this run on only one thread to check for a deadlock. Still hangs...

--- a/main.c
+++ b/main.c
@@ -29,7 +29,7 @@ main(void)
   binit();         // buffer cache
   fileinit();      // file table
   bdev_init();       // disk
-  startothers();   // start other processors
+//  startothers();   // start other processors
   kinit2(P2V(4*1024*1024), P2V(PHYSTOP)); // must come after startothers()
   userinit();      // first user process
   mpmain();        // finish this processor's setup
jbush001 commented 6 years ago

Made a global variable 'count' and added the following to the top of trap():

REGISTERS[REG_RED_LED] = count++;

I can see the LEDs counting, so interrupts are enabled and the kernel is receiving them.

jbush001 commented 6 years ago

Added code to dump the interrupt pending register to the LEDs. Only LED1 is lit (indicating it is taking timer interrupts):

REGISTERS[REG_RED_LED] = __builtin_nyuzi_read_control_reg(CR_INTERRUPT_PENDING);
jbush001 commented 6 years ago

Disabled the timer interrupt by commenting out the IRQ enable in mpmain:

//irq_enable(IRQ_TIMER);

One theory I had was that the interrupt is stuck, so eret immediately takes another interrupt. With the timer disabled, the interrupt LED is no longer lit, but the process still hangs in the same place, which disproves this theory.
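To make the stuck-interrupt theory concrete, here is a toy model in C. It is a sketch under assumptions, not Nyuzi's trap hardware or xv6's trap(): the names `toy_cpu`, `toy_trap` and `toy_run` are hypothetical, and the interrupt is assumed to be level-triggered, staying asserted until the handler clears the device's source. If nothing clears it, every eret is followed immediately by another trap and the interrupted code never advances.

```c
#include <stdbool.h>

/* Toy model only (hypothetical names, not Nyuzi or xv6 APIs): a
   level-triggered interrupt line stays asserted until the handler
   clears the device's interrupt source. */
struct toy_cpu {
    bool irq_line;     /* current level of the interrupt line */
    int user_steps;    /* instructions retired outside the trap handler */
    int traps_taken;   /* number of times the handler ran */
};

/* Trap handler; returning from it models eret. */
static void toy_trap(struct toy_cpu *cpu, bool clear_source)
{
    cpu->traps_taken++;
    if (clear_source)
        cpu->irq_line = false;   /* acknowledge the device */
}

/* Each cycle: if the line is asserted, trap before executing anything;
   otherwise retire one instruction of the interrupted code. */
void toy_run(struct toy_cpu *cpu, int cycles, bool clear_source)
{
    for (int i = 0; i < cycles; i++) {
        if (cpu->irq_line)
            toy_trap(cpu, clear_source);
        else
            cpu->user_steps++;
    }
}
```

Starting with the line asserted, `toy_run(&cpu, 1000, false)` spends all 1000 cycles in the handler and retires nothing, while `toy_run(&cpu, 1000, true)` takes a single trap and then retires 999 instructions.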

jbush001 commented 6 years ago

I can press Ctrl-P in the terminal to get a process listing, and it prints the following:

1 run initcode

("run" corresponds to the state RUNNING). So userinit() has successfully completed and created an init process, which is supposedly running, but doesn't seem to have forked the shell.
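For reference, in x86 xv6 the Ctrl-P listing comes from procdump(), which maps each process state to a fixed-width name through a lookup table. The sketch below is modeled on that table; the helpers `state_name()` and `format_proc_line()` are hypothetical, and the xv6-nyuzi port may differ in detail.

```c
#include <stddef.h>
#include <stdio.h>

/* Process states in the order used by x86 xv6's proc.h. */
enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* Fixed-width state names, modeled on the table in xv6's procdump(). */
static const char *const states[] = {
    [UNUSED]   = "unused",
    [EMBRYO]   = "embryo",
    [SLEEPING] = "sleep ",
    [RUNNABLE] = "runble",
    [RUNNING]  = "run   ",
    [ZOMBIE]   = "zombie",
};

/* Map a state to its name, falling back to "???" for unknown values. */
const char *state_name(int st)
{
    if (st >= 0 && (size_t)st < sizeof(states) / sizeof(states[0]) && states[st])
        return states[st];
    return "???";
}

/* Format one listing line as "<pid> <state> <name>" (hypothetical helper). */
int format_proc_line(char *buf, size_t len, int pid, int st, const char *name)
{
    return snprintf(buf, len, "%d %s %s", pid, state_name(st), name);
}
```

`format_proc_line(buf, sizeof buf, 1, RUNNING, "initcode")` produces `1 run    initcode`, with the state name padded to six characters.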

jbush001 commented 6 years ago

Added code to indicate if a syscall occurs. The LEDs do not turn on:

--- a/trap.c
+++ b/trap.c
@@ -116,6 +116,7 @@ trap(struct trapframe *tf)

   switch(trap_cause & 0xf){
   case TT_SYSCALL:
+    REGISTERS[REG_GREEN_LED] = 0xffff;

Almost the first thing initcode should do is make a syscall to exec the real 'init' program:

.globl _start
_start:
  lea s0, init
  lea s1, argv
  syscall SYS_exec

init:
  .string "/init\0"
argv:
  .long init
  .long 0
jbush001 commented 6 years ago

One issue I noticed is that the system does not do an iinvalidate after copying initcode into the first address space, so the instruction cache will probably still contain boot code for those addresses.
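The concern is that when instruction fetches are not kept coherent with stores, code copied into memory can be shadowed by stale instruction-cache lines. A toy C model illustrates this, assuming a direct-mapped instruction cache with one word per line that does not snoop stores. The sizes and the names `toy_store`, `toy_ifetch` and `toy_iinvalidate` are hypothetical and do not describe Nyuzi's actual cache organization.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: addresses are word indices, the instruction cache is
   direct-mapped with one word per line and is NOT updated by stores. */
#define MEM_WORDS    64
#define ICACHE_LINES 8

struct toy_system {
    uint32_t mem[MEM_WORDS];
    struct { bool valid; uint32_t addr; uint32_t data; } icache[ICACHE_LINES];
};

/* Stores (e.g. memmove of initcode) update memory only. */
void toy_store(struct toy_system *s, uint32_t addr, uint32_t value)
{
    s->mem[addr % MEM_WORDS] = value;
}

/* Instruction fetch: a hit returns the cached copy, which may be stale;
   a miss fills the line from memory. */
uint32_t toy_ifetch(struct toy_system *s, uint32_t addr)
{
    unsigned idx = addr % ICACHE_LINES;
    if (s->icache[idx].valid && s->icache[idx].addr == addr)
        return s->icache[idx].data;
    s->icache[idx].valid = true;
    s->icache[idx].addr = addr;
    s->icache[idx].data = s->mem[addr % MEM_WORDS];
    return s->icache[idx].data;
}

/* Analogue of an iinvalidate on one address: drop the matching line. */
void toy_iinvalidate(struct toy_system *s, uint32_t addr)
{
    unsigned idx = addr % ICACHE_LINES;
    if (s->icache[idx].addr == addr)
        s->icache[idx].valid = false;
}
```

After boot code at address 0 has been fetched once, storing new code there and fetching again still returns the old word; only after `toy_iinvalidate(&s, 0)` does the fetch see the new contents.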

jbush001 commented 6 years ago

Tried adding code to perform an iinvalidate, but it still hangs:

--- a/vm.c
+++ b/vm.c
@@ -203,13 +203,16 @@ void
 inituvm(pde_t *pgdir, char *init, uint sz)
 {
   char *mem;
+  uint i;

   if(sz >= PGSIZE)
     panic("inituvm: more than a page");
   mem = kalloc();
   memset(mem, 0, PGSIZE);
-  mappages(pgdir, 0, PGSIZE, V2P(mem), PTE_W | PTE_X);
   memmove(mem, init, sz);
+  mappages(pgdir, 0, PGSIZE, V2P(mem), PTE_W | PTE_X);
+  for (i = 0; i < PGSIZE; i += CACHE_LINE_SIZE)
+      __asm__("iinvalidate %0" : : "s" (i));
 }

 // Load a program segment into pgdir.  addr must be page-aligned
@@ -243,6 +246,7 @@ allocuvm(pde_t *pgdir, uint oldsz, uint newsz)
 {
   char *mem;
   uint a;
+  uint i;

   if(newsz >= KERNBASE)
     return 0;
@@ -264,6 +268,9 @@ allocuvm(pde_t *pgdir, uint oldsz, uint newsz)
       kfree(mem);
       return 0;
     }
+
+    for (i = 0; i < PGSIZE; i += CACHE_LINE_SIZE)
+      __asm__("iinvalidate %0" : : "s" (a + i));
   }
   return newsz;
 }
jbush001 commented 6 years ago

One difference between simulation and FPGA is that the simulator by default randomizes all flops and SRAM, whereas the FPGA clears them. I tried disabling this in the simulator with the +randomize=0 parameter, but it still booted successfully.

nyuzi_vsim +bin=kernelmemfs.hex +randomize=0
cores 1|threads per core 4|l1i$ 16k 4 ways|l1d$ 16k 4 ways|l2$ 128k 8 ways|itlb 64 entries|dtlb 64 entries
sb: size 1000 nblocks 941 ninodes 200 nlog 30 logstart 2 inodestart 32 bmap start 58
init: starting sh
$

This would be a lot easier to debug if I could reproduce it in simulation.

x653 commented 1 year ago

Hi there, I ported xv6 to RISC-V (RV32I) and succeeded in running it on an FPGA implementation of RV32IA. On my FPGA I noticed the same behaviour as you: xv6 got stuck and there was no shell prompt. I investigated and found that the reason (in my case) was that the UART continuously fired interrupts because the transmit holding register was empty. This consumed all the CPU time, because the CPU was continuously entering the interrupt trap. According to the specs of the 16550 UART controller, the interrupt routine in uart.c should clear the interrupt source by either:

1. writing something to the UART, or
2. clearing the transmit-holding-register-empty interrupt by reading the UART's ISR register.

So I added the following to the function void uartstart() in uart.c:

  ...
  if(uart_tx_w == uart_tx_r){
    ReadReg(ISR);  // clear interrupt source
    return;
  }

And it worked!
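A simplified C model of this mechanism, assuming 16550 semantics where the THRE interrupt is cleared by reading IIR (when THRE is the pending source) or by writing THR. The names `toy_uart`, `toy_read_iir`, `toy_write_thr` and `toy_uartstart` are hypothetical, the ring size is arbitrary, and the real xv6-riscv uartstart() also checks the line status register before writing, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

#define IIR_NO_INT 0x01   /* IIR bit 0 set: no interrupt pending */
#define IIR_THRE   0x02   /* IIR interrupt ID: transmit holding register empty */

struct toy_uart {
    bool thre_irq;          /* THRE interrupt currently asserted */
    char buf[16];           /* software transmit ring (hypothetical size) */
    unsigned tx_w, tx_r;    /* ring write/read indices, like uart_tx_w/uart_tx_r */
    int sent;               /* characters written to THR */
};

/* Reading IIR reports the pending source; if it is THRE, the read clears it. */
uint8_t toy_read_iir(struct toy_uart *u)
{
    if (!u->thre_irq)
        return IIR_NO_INT;
    u->thre_irq = false;
    return IIR_THRE;
}

/* Writing THR would clear THRE, but in this model the character leaves at
   once, so THR is immediately empty again and THRE ends up asserted. */
void toy_write_thr(struct toy_uart *u, char c)
{
    (void)c;
    u->sent++;
    u->thre_irq = true;
}

/* Shaped like xv6-riscv's uartstart(): drain the ring into THR; when the
   ring is empty, optionally read IIR so THRE stops being asserted. */
void toy_uartstart(struct toy_uart *u, bool read_iir_when_empty)
{
    for (;;) {
        if (u->tx_w == u->tx_r) {
            if (read_iir_when_empty)
                (void)toy_read_iir(u);
            return;
        }
        toy_write_thr(u, u->buf[u->tx_r % sizeof u->buf]);
        u->tx_r++;
    }
}
```

With three buffered characters, both variants send all three, but only the one that reads IIR when the ring is empty leaves THRE deasserted. Without that read the interrupt stays pending, so the CPU traps again as soon as it returns.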

jbush001 commented 1 year ago

Interesting. It's been a long time since I looked at this problem and I don't remember much about it. :smile: But thanks for the suggestion!

martinKindall commented 1 year ago

Interesting, I also had in mind to do something similar in the future. May I ask how you dealt with the MMU logic? I am currently studying xv6, and from what I understood, the MMU is simulated in software (vm.c, walk() function). Does that mean the RISC-V core on my FPGA does not need to know anything about pages?

x653 commented 1 year ago

Hi Martin,

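The CPU has to do the MMU logic in hardware. When virtual memory addressing (vma) is activated, the CPU does the "walk" in hardware to find the appropriate physical memory location. The walk() function in vm.c is used by the kernel when it needs to compute a physical address itself, for example when building page tables or copying data to and from user space. That is done in software, but the CPU still has to do the translation in hardware for its own memory accesses.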

In my implementation I chose to do the page-table walk in the simplest way, by walking the page tables on every single memory read. So I did not implement a TLB (translation lookaside buffer).
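To illustrate what a TLB-less walk does, here is a C sketch of the Sv32 two-level translation described in the RISC-V privileged specification, performed in full on every call. It is not taken from fpga/vma.v. It assumes a small word-addressed physical memory array and 32-bit physical addresses, and it skips superpages and the A/D/U and permission checks; `sv32_translate()`, `phys_read()` and `phys_write()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGSHIFT 12
#define PTE_V (1u << 0)   /* valid */
#define PTE_R (1u << 1)   /* readable */
#define PTE_W (1u << 2)   /* writable */
#define PTE_X (1u << 3)   /* executable */

#define PHYS_WORDS (16 * 1024)        /* 64 KiB of toy physical memory */
static uint32_t phys[PHYS_WORDS];

uint32_t phys_read(uint32_t pa) { return phys[(pa >> 2) % PHYS_WORDS]; }
void phys_write(uint32_t pa, uint32_t v) { phys[(pa >> 2) % PHYS_WORDS] = v; }

/* root_ppn: physical page number of the root table (satp.PPN).
   Returns true and sets *pa on success, false on a page fault.
   Every call does two memory reads: there is no TLB to cache results. */
bool sv32_translate(uint32_t root_ppn, uint32_t va, uint32_t *pa)
{
    uint32_t vpn[2] = { (va >> 12) & 0x3ff, (va >> 22) & 0x3ff };
    uint32_t table = root_ppn << PGSHIFT;

    for (int level = 1; level >= 0; level--) {
        uint32_t pte = phys_read(table + vpn[level] * 4);
        if (!(pte & PTE_V))
            return false;                        /* invalid entry */
        if (pte & (PTE_R | PTE_X)) {             /* leaf entry */
            if (level != 0)
                return false;                    /* megapages not handled here */
            *pa = ((pte >> 10) << PGSHIFT) | (va & 0xfff);
            return true;
        }
        table = (pte >> 10) << PGSHIFT;          /* next-level table */
    }
    return false;                                /* no leaf at level 0 */
}
```

With a root table at PPN 1 whose entry 1 points to a level-0 table at PPN 2, and entry 1 of that table mapping to PPN 5, `sv32_translate(1, 0x00401234, &pa)` yields `pa == 0x5234`. A TLB would cache the final PPN per virtual page so that most accesses skip those two reads.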

Please have a look at my repo: https://gitlab.com/x653/xv6-riscv-fpga

You can find the implementation of virtual memory addressing in fpga/vma.v.

Best,
Micha