ucb-bar / fpga-zynq

Support for Rocket Chip on Zynq FPGAs
http://bar.eecs.berkeley.edu/projects/2014-rocket_chip.html

Port to PYNQ-Z1 board locks up when running fesvr #35

Closed GuzTech closed 7 years ago

GuzTech commented 7 years ago

I'm porting fpga-zynq to the Digilent PYNQ-Z1 board (my fork) for which I used the Zybo as a reference. The problem is that when I run the fesvr-zynq pk hello command, the entire system locks up.

I initially had a problem where any access to the GP0 AXI port would lock up the system, but I've managed to solve that. I can attach and use other peripherals on the AXI interconnect that also connects to the RISCV slave block without a problem. Now I suspect that there is something wrong on the RISCV side.

I've routed the target reset signal from the resetter IP and the target clock to the LEDs as a means of debugging, and confirmed that the clock is 50 MHz and that the reset remains asserted until I run fesvr, so the RISCV core should be out of reset. I'm running out of ideas as to what the problem could be.

Any pointers on where I should look?

davidbiancolin commented 7 years ago

As you suggest, my experience so far is that it is usually a bus issue. I just cannot think of why rocketchip's behavior would be any different on the PYNQ-Z1 than on our other supported platforms, assuming, of course, that you are using the same configuration as the Zybo.

Does fesvr make any progress? Or does the system just hang on the first transactions issued to the core?

GuzTech commented 7 years ago

Yes, bus issues were the initial problem, but that was because Vivado sometimes has problems with stale files. I use the same ZynqSmallConfig as the Zybo, even though the PYNQ board has an XC7Z020. What differs, of course, is the DDR settings of the PS and the use of UART0 on pins 14/15 instead of UART1 on pins 48/49 as on the Zybo. I made no other changes.

I only know that when I run fesvr there is no output, and that the target reset is then pulled low. I cannot think of anything else in my porting work that could be causing this, which is why I'm starting to suspect an issue on the RISCV side of things.

davidbiancolin commented 7 years ago

You probably should instrument fesvr-zynq (in common/csrc/) to see if fesvr successfully communicates at all with the core.

If it does -> it should reach a state where it constantly polls the target to check for completion and to handle requested system calls. If there is a problem with the target, you'll probably see this behavior.

If it doesn't -> (Unlikely, given what you've described) Probably a bus problem. Drop an ILA on the bus and see what's going wrong when you write to and read from the target's region of the memory map.

GuzTech commented 7 years ago

I ran gdbserver HOST:1234 fesvr-zynq on the PYNQ and connected with arm-linux-xilinx-gnueabi-gdb using fesvr-zynq with debugging symbols enabled. The fesvr-zynq executable on the board does not have debugging symbols, so if I use the same version as on the host, then I cannot connect to gdbserver from the host because of errors. gdbserver complains that it received an EOF and gdb on the host complains that it cannot read memory locations 0x1000_0004 and 0x1000_0000.

So using the original executable on the board and one with debugging symbols on the host, I set a breakpoint at main and simply continue. When I do this, the board locks up immediately after the target reset signal is pulled low. No output on UART either. It probably does not break at main because the executable on the board has no debugging symbols. So I either need to be able to debug the executable with debug symbols, or I need to use the ILA.

I used U-Boot to read and write to the GP0 AXI port. For example, reading from 0x43C0_0000 locks up the system, but I can write to it. I can read 0x43C0_0008 and 0x43C0_000C, which are FIFO data and FIFO count respectively according to a small memory map I found in the top Chisel file. Is there any overview of how fesvr accesses the RISCV side and what it writes to and reads from it (without going through the Chisel and C source)?

davidbiancolin commented 7 years ago

Yeah, unfortunately, there is no documentation for the tether itself, and little documentation for fesvr. Essentially, all fesvr does is issue commands to the target, expecting a response to each one. TSI merely transports these requests and responses between fesvr and the target, over those FIFOs you've discovered. fesvr initially loads the program and then resets the target core, after which it polls the target for pending system calls (namely, console output) and checks for program completion.

0x43C0_0000 is the head of the output FIFO (i.e. responses from the target). Given that it should be empty at startup, a read to that address will block indefinitely, because no data will be enqueued into the output FIFO without first making a request to the target.

You should still be able to run gdb on the ARM core to debug fesvr-zynq without symbols for libfesvr (which you should be able to produce anyway in common/Makefrag). However, a sufficient but hacky solution is to put printfs in zynq_tsi_driver_t::poll() (in /common/csrc/zynq_tsi_driver.cc) and see if it spams the console.

If it doesn't -> ILA.

If it does -> go into simulation/ and type make. This will give you a platform independent simulator with the same RTL as is being synthesized. You can cross check the traffic over the fifos against what you see here.
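One way to do that cross-check, sketched under the assumption that you have logged the 32-bit words crossing a FIFO in both the simulator and on the board (for example from printfs in poll()): find the first index at which the two traces disagree. `first_divergence` is a hypothetical helper, not something provided by fpga-zynq.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the first word at which two FIFO traces differ.
// If one trace is a prefix of the other (or they are identical),
// the length of the shorter trace is returned.
std::size_t first_divergence(const std::vector<uint32_t>& sim,
                             const std::vector<uint32_t>& hw) {
    const std::size_t n = sim.size() < hw.size() ? sim.size() : hw.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (sim[i] != hw[i]) {
            return i;
        }
    }
    return n;
}
```

A result equal to the hardware trace length with a longer simulation trace suggests the board stopped producing traffic at that point; a smaller index points at the first word that actually differs.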

GuzTech commented 7 years ago

Thanks for the suggestions! So I added some printfs here and there, compiled it, and sent it to the board. When I ran it I got a segmentation fault, even when I just compiled it without modifications. I put a printf as the first statement in main, and I still got a segfault, so something was really wrong. The cause was the file transfer: FileZilla should be set to binary transfer mode rather than Auto if you want files to be transferred exactly!

Some general remarks:

./fesvr-zynq: symbol lookup error: ./fesvr-zynq: undefined symbol: _ZN5tsi_tC1ERKSt6vectorISsSaISsEE

GuzTech commented 7 years ago

Ok, the problem was that the libfesvr.so in the uramdisk.image.gz was incompatible with the version that was built. After copying libfesvr.so from common/build to the ramdisk image, I was able to run the new fesvr. And yes, this is described in the README, and I did read it fully, but of course I forgot this.
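For readers hitting a similar symbol lookup error, demangling the name shows which function the executable expected the library to export. The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang); the `demangle` wrapper is a hypothetical helper. In the symbol above, `C1` marks a constructor and `Ss` is the Itanium ABI abbreviation for `std::string`, so the missing symbol is a `tsi_t` constructor taking a vector of strings.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer returned by __cxa_demangle is malloc'ed
    return result;
}
```

Calling `demangle("_ZN5tsi_tC1ERKSt6vectorISsSaISsEE")` yields a signature beginning with `tsi_t::tsi_t(std::vector<`. A libfesvr.so that does not export that constructor is a sign that the library and the executable were built from different sources or with different toolchains.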

I modified zynq_tsi_driver_t::poll() resulting in:

#define SAI_BASE_PADDR 0x43C00000L
#define SAI_OUT_FIFO_DATA 0x00
#define SAI_OUT_FIFO_COUNT 0x04
#define SAI_IN_FIFO_DATA 0x08
#define SAI_IN_FIFO_COUNT 0x0C
#define SAI_SYS_RESET 0x10

void zynq_tsi_driver_t::poll(tsi_t *tsi)
{
    printf("poll\n");
    while (read(SAI_OUT_FIFO_COUNT) > 0) {
        printf("while 1: pre\n");
        uint32_t out_data = read(SAI_OUT_FIFO_DATA);
        tsi->send_word(out_data);
        printf("while 1: post\n");
    }

    printf("starting while 2\n");
    while (tsi->data_available() && read(SAI_IN_FIFO_COUNT) > 0) {
        printf("while 2: pre\n");
        uint32_t in_data = tsi->recv_word();
        write(SAI_IN_FIFO_DATA, in_data);
        printf("while 2: post\n");
    }

    printf("switching to host\n");
    tsi->switch_to_host();
    printf("poll end\n");
}

My output:

**cut a bunch of repeated output**
poll
while 1: pre
while 1: post
while 1: pre
while 1: post
while 1: pre
while 1: post
while 1: pre
while 1: post
starting while 2
switching to host
poll end
poll
starting while 2
while 2: pre
while 2: post
while 2: pre
while 2: post
while 2: pre
while 2: post
while 2: pre
while 2: post
while 2: pre
while 2: post
switching to host
poll end
poll
while 1: pre
while 1: post
while 1: pre
while 1: post
while 1: pre
while 1: post
while 1: pre
while 1: post
starting while 2
switching to host
warning: tohost and fromhost symbols not in ELF; can't communicate with target

The good news is that it doesn't hang the system anymore, but it also does not work yet. I also read that Vivado 2015.4 has to be used for building fesvr-zynq. Is there any reason for this? I found that the original fesvr-zynq needed the pthreads library, whereas my compiled version does not for some reason. Probably because I used Vivado 2016.2.

davidbiancolin commented 7 years ago

2016.2 is the intended version.

Well, this is a problem:

warning: tohost and fromhost symbols not in ELF; can't communicate with target

You're running ./fesvr-zynq pk hello, correct? I'm assuming you're just using the pk and hello objects you found in the ramdisk?

GuzTech commented 7 years ago

@davidbiancolin Yes I'm running ./fesvr-zynq pk hello using the pk and hello files found in the ramdisk.

For some reason, the executables in the ramdisk related to RISCV are giving me trouble, so I wouldn't be surprised if the pk and hello executables are not functioning correctly. But the same ramdisk works just fine on the Zybo (with the bitstream built using 2016.2).

davidbiancolin commented 7 years ago

If you haven't already tried, there's a strong possibility this will all be fixed if you build fesvr, pk, and the application all from scratch, if only to get versioning correct. Make sure you're dropping libfesvr.so in the right place too.