sy2002 / QNICE-FPGA

QNICE-FPGA is a 16-bit computer system for recreational programming built as a fully-fledged System-on-a-Chip in portable VHDL.
http://qnice-fpga.com
Other
69 stars 16 forks source link

The M-L monitor command drops part of large files over UART on Nexys4DDR #125

Closed MJoergen closed 4 years ago

MJoergen commented 4 years ago

Steps to reproduce:

You'll notice large sections of memory containing 0x0000's, which are not present in the original file.

Some more debugging using the ILA shows that the FPGA is not receiving the entire file over the UART connection. So it looks like a buffer overflow in the USB-to-UART chip on the Nexys4DDR board.

I've verified that the same file (cpu_test.out) can correctly be transferred to the Nexys4DDR using the qtransfer program, using the same UART connection.

sy2002 commented 4 years ago

This is an interessting one.

First of all, I am glad that it works stable with qtransfer now and that I added qtransfer to the Monitor. This gives as one very reliable way to transfer the data.

About the challange itself:

I just tried to reproduce this: It works like a charm. I tried multiple times.

Just for being on the save side: I attached my bitstream to this comment. You might want to try it. It is based in Github commit #85e2d29. See below.

I even moved the start address of cpu_test.asm to 0xB000 so that I was able to CRC check it with c/test_programs/memcrc.c. memcrc is able to calculate a CRC16 on any memory area while running on QNICE, it itself starts at 0x8000 and tinkers with addresses in the 0xA000 space, so you need to change the .org of the file under test to 0xB000 to use it, so that memcrc and the file under test will not collide. The counterpart is c/test_programs/filecrc.c which is meant to run on the host and which spits out the CRC value "as if it was on QNICE". So you can use these two apps in your tests: Calculate the value of the .out file on the host and then recalculate it on QNICE. This is more reliable than looking at memory areas that contains 0x0000s .

The biggest question is:

Why does it work like a charm on my side?

Do I have newer/better FTDI drivers on my host machine? As you know, the Nexys4DDR uses an FTDI chip for emulating serial communication over USB. Maybe you can try an update/upgrade.

Now three questions:

  1. Could you please retry using my bitstream (just being paranoid ;-)) env1.bit.zip

  2. Did you swithch on RTS/CTS in your terminal program? QNICE is able to halt the transfer using RTS/CTS if the FIFO buffer overruns. If yes: Can you try another / new terminal program than the one you are using? I would prefer one with a GUI, so that when you switch on RTS/CTS without any danger of getting some command line argument wrong (again: just being paranoid ;-) )

  3. Could you please try to increase the FIFO buffer forbus_uart.vhd specified in env1_gloals in UART_FIFO_SIZE (currently 32 bytes)?

If all of this is not working, we you might want to think about refactoring the UART module. I would start with my wrapper from 2015 bus_uart.vhd :-)

MJoergen commented 4 years ago

I tried triggering on the signal bus_uart/fifo_full and that signal was never asserted. This is to be expected since the baudrate is only 115200 bits/s and with 10 bits per character this gives 11520 characters per second. With a CPU clock frequency of 50 MHz this gives 4340 clock cycles for each character, and the CPU is sitting in a very small busy loop doing nothing but servicing the UART. So all this to say that the Rx UART FIFO in the FPGA is never containing more than a single character.

I searched for any errors when reading from the Rx UART FIFO (e.g. any race condition between the CPU reading from the FIFO and the UART writing to the FIFO), but found none.

It seems that the data received into the FPGA is erroneous, i.e. that already on the input to the FPGA some data is missing.

My guess is that the USB-UART chip (FTDI FT2232HQ) on the Nexys4DDR is corrupting the data, maybe because of a FIFO overrun. That chip has an onboard Rx FIFO of 4096 characters, and we are certainly sending much larger amounts of data than that.

I will try with a different terminal program. What program are you using on the MAC ?

sy2002 commented 4 years ago

I love riddles :-) And this one is really a tough one.

Here are my questions/thoughts:

1)

On my side it works super stable: Same FPGA board than you have. So what are the differences between your experiment and mine? We might want to think about that a bit deeper. As soon as we find the right difference between your and my setup, we might find the root cause. So here are the differences I can think of:

2)

It seems that the data received into the FPGA is erroneous, i.e. that already on the input to the FPGA some data is missing.

What about the FTDI driver? Did you try multiple machines on your side with multiple FTDI drivers? (Why do I think that this is relevant: The fact that the communication between your Vivado and the board is stable and works fine in general, for example when programming the FPGA or when using ILA and stable lets me assume that there is no hardware bug in the chip on the Nexys. But I can think of a software bug in the FTDI drivers: Let's not forget: There is no serial port at all on the host side: this is just an emulation over USB...)

My guess is that the USB-UART chip (FTDI FT2232HQ) on the Nexys4DDR is corrupting the data, maybe because of a FIFO overrun. That chip has an onboard Rx FIFO of 4096 characters, and we are certainly sending much larger amounts of data than that.

This would point to a driver issue, beause we talk about a simulated serial over USB thing.

3)

So all this to say that the Rx UART FIFO in the FPGA is never containing more than a single character.

Just a tought about this statement: On MEGA65, when I switchted off the UART, I had a super unstable behaviour, similar to what you describe in this issue. This was the reason at all, why I added an UART: It happend during the porting of I QNICE-FPGA to MEGA65. I guess we must not underestimate the delay/latencies/effects, that the FTDI chip is bringing to the game

4)

One more thing - pretty absurd, but I think it makes sense to share it with you: Back in the days, when I did not know yet, what a "timing closure" is and when I developed the EAE, I missed out to do something like this (back in the days I was using ISE):

INST "eae_inst/op*" TNM="EAE_OPS";
INST "eae_inst/res*" TNM="EAE_RES";
TIMESPEC TS_EAE=FROM "EAE_OPS" TO "EAE_RES" 30 ns;

The result: Completely unstable UART and keyboard... No idea why. You can read my old commit comment from 2016 - searched and found it :-) Here it is: https://github.com/sy2002/QNICE-FPGA/commit/d47523b156c554a17b0b4e00ba4d6a4156aee0c4

sy2002 commented 4 years ago

UPDATE: I added point 4) to above-mentioned message.

sy2002 commented 4 years ago

Sorry for spamming you - one more addition to my above-mentioned point 2) (please read my lengthy comment https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-690994600 first):

Just had a look: The terminal program I am using comes with its own FTDI drivers:

grafik

Here, the maker of my terminal program "Serial" states that it indeed directly supports the FTDI FT2232: https://www.decisivetactics.com/support/view?article=compatible-devices

So more and more this "smells" like a driver problem.

MJoergen commented 4 years ago

So more and more this "smells" like a driver problem.

I agree. I'll check when I have the opportunity. This document - https://www.ftdichip.com/Support/Documents/AppNotes/AN_220_FTDI_Drivers_Installation_Guide_for_Linux.pdf - has some information about the FTDI drivers.

MJoergen commented 4 years ago

I tried increasing the baud rate to 1.5 Mbaud in the file env1_globals.vhd and when starting the terminal program (I use picocom). However, even though the 'H' (HELP) command worked as expected, using the M-L command sequence and pasting the cpu_test.out file lead to severe data corruption, much worse than before.

MJoergen commented 4 years ago

While debugging, I've noticed something peculiar. When running with the high baudrate of 1.5 Mbaud and examining the waveform in ILA, I notice that the first 20 (or so) characters received are correct, and yet there are characters missing in the echo back to the user. The screen shot below shows the problem:

Screenshot from 2020-09-12 16-33-47

Due to the high speed two characters were received into the FIFO (0x30 and 0x78) before the CPU began reading. However, the CPU read lasts two clock cycles and therefore "pops" both characters from the FIFO, but only the first character (0x30) is given to the CPU. The second character (0x78) is lost.

This behaviour is much worse with the high baudrate, but can not be completely ruled out with lower baudrates.

This behaviour is more generally tied together with the Issue #55, but a work-around for the UART module can easily be implemented.

MJoergen commented 4 years ago

I changed the UART module to not do a double-read, but the issue still remains.

Then I noticed that the CPU is using about 750 clock cycles for each character received, and that most of that time was spent in the routine STR$STRCHR. I optimized that routine so the CPU now only uses about 500 clock cycles for each character, but still the issue remains.

I see that the data output from the UART FIFO is corrupted, i.e. missing data. I then notice that the FIFO full signal is asserted, and this signal is connected to the CTS output. However, even though this signal is asserted the sender (FTDI chip) still sends a few more characters, and this causes FIFO overrun. So I changed the CTS signal to be asserted when the free space in the UART FIFO is less than 5 characters. This now correctly pauses transmission, and the UART FIFO never becomes full. But still the issue remains.

Next thing to try is to decrease the size of the FIFO to see whether the data INTO the FIFO is corrupted or not.

sy2002 commented 4 years ago

@MJoergen About this:

I then notice that the FIFO full signal is asserted, and this signal is connected to the CTS output. However, even though this signal is asserted the sender (FTDI chip) still sends a few more characters, and this causes FIFO overrun.

IMPORANT: It is not the FTDI chip which works with the CTS signal. The FTDI chip is just a serial to USB converter which encapsulates the serial signals, transmits them via the USB protocol and the driver on the host side is unpacking everything again. It is just a simulation. There is no serial port...

So it is your terminal program, which needs to handle CTS.

If your terminal program is not handling CTS correctly, then this causes problems:

serial pins on FPGA => FTDI-CHIP => USB => FTDI-Driver => Terminal Program

And the part

FTDI-CHIP => USB => FTDI-Driver

should be transparent, so that for the serial pins on the FPGA it "feels" as if they are directly connected to the terminal program.

Could you rule out a FTDI driver problem?

I need to re-iterate that on my setup with the 100% identical board, it works the same, this is why it might be interessting to find out, what is different between our setup (see https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-690994600)

MJoergen commented 4 years ago

Ok, with a smaller FIFO I was able to examine the data going INTO the UART FIFO:

I can see that the data received by the FPGA is corrupted. In particular, by comparing the characters read into the UART FIFO with the contents of the cpu_test.out file I see that a total of 118 characters are MISSING from the input stream. This observation seems to acquit the FPGA completely. The deletion happens after having received 72k correct characters.

I'm using the picocom program with the switch -f h and this is recognized by the program as selecting hardware flow control because it responds with

*** flow: RTS/CTS

So, what can we do to make this work? I'll look into how to change the FTDI driver as the next step.

sy2002 commented 4 years ago

Wow - this is really crazy stuff! 😧

I'll look into how to change the FTDI driver as the next step.

As far as I know, you're having multiple machines: Was it reproducible on all your machines?

MJoergen commented 4 years ago

This is getting more and more mysterious. I just learned about the USB monitoring program on Linux called usbmon. With that I can dump all the USB traffic from the host to the Nexys4DDR card. Using this program I can see that already there the traffic is corrupted. In other words, the USB traffic sent from the host is missing some characters.

I even tried a different USB port on the host, and still characters are missing.

So next thing is to install the FTDI drivers on my laptop and try there.

sy2002 commented 4 years ago

Using this program I can see that already there the traffic is corrupted. In other words, the USB traffic sent from the host is missing some characters.

WOW! This is one of the more interessting riddles I saw in the last years... 😮

But it supports the hypothesis that our hardware works. On the other hand, some questions that arise are: Why can Vivado then communicate so nicely and without errors and why can qtransfer work so stable? (The answer might be, that both use smaller packet sizes: qtransfer for example only roughly ~400 bytes per package.)

You could indeed try to install the FTDI drivers on your laptop and try there.

You could also try to remove the old version of the FTDI driver on your other machine and replace it.

MJoergen commented 4 years ago

Just retried on my work-computer running Ubuntu 18.04.5 LTS and the observed problem is the same.

So in this setup I'm using HEAD on the develop branch, i.e. going back to 115 kbit/s, and only about 60 % of the text is transferred over the USB interface. The transmitted text stops in the middle of a line, i.e. the last few characters are 0x8AF. I'm capturing the USB traffic using the usbmon utility. I can see that the last characters echoed back on the terminal is indeed 0x8AF, and a memory dump confirms that addresses after 0x8AF5 contain no data, so I'm confident that the captured traffic matches what is sent on the USB interface.

I've checked that after the transfer of the text file, the USB interface correctly transmits the Ctrl-E key-press.

I've repeated the test a number of times, and although the amount of text transferred is not identical, it is all in the same rough size, i.e. 0x8BA5 was seen as the last address transferred in one of the test runs.

Just being paranoid, I've verified that copy-pasting to a regular file works: I can see that all addresses up to 0x9515 get pasted correctly. So the text is indeed present in the clip-board buffer.

I'm stumped. I guess the next step could be to check whether the serial program picocom is indeed writing the entire text to the serial port (using tools like socat and/or strace).

sy2002 commented 4 years ago

OMG - can you try another terminal program? Maybe cu ? Maybe screen ? Maybe tip?

MJoergen commented 4 years ago

Using strace I can confirm that the program picocom DOES read the entire program text from stdin (file descriptor 0) - up to 0x9515

read(0, "0", 1)                         = 1
write(1, "\7", 1^G)                       = 1
select(4, [0 3], [3], NULL, NULL)       = 1 (in [0])
read(0, "x", 1)                         = 1
write(1, "\7", 1^G)                       = 1
select(4, [0 3], [3], NULL, NULL)       = 1 (in [0])
read(0, "9", 1)                         = 1
write(1, "\7", 1^G)                       = 1
select(4, [0 3], [3], NULL, NULL)       = 1 (in [0])
read(0, "5", 1)                         = 1
write(1, "\7", 1^G)                       = 1
select(4, [0 3], [3], NULL, NULL)       = 1 (in [0])
read(0, "1", 1)                         = 1
write(1, "\7", 1^G)                       = 1
select(4, [0 3], [3], NULL, NULL)       = 1 (in [0])
read(0, "5", 1)                         = 1

but DOES NOT write the entire text onto the serial port (file descriptor 3) - only up to 0x94A5. The last two writes to file descriptor 3 are:

write(3, "0x9481 0x000E\r0x9482 0x00A0\r0x94"..., 509) = 509
write(3, "\5", 1) 

The second line corresponds to Ctrl-E and the first line transfers 509 characters, which is 509/14=36 addresses, i.e. up to 0x9481+0x0024 = 0x94A5.

Time to find a different terminal program ...

sy2002 commented 4 years ago

but DOES NOT write the entire text onto the serial port (file descriptor 3) - only up to 0x94A5.

What a bummer!

0x94A5 minus 0x8000 = the very odd amount of 5285 + 1 = 5286 lines.

One line is 14 bytes: 5286 x 14 = 74,004 bytes.

Odd... I would have expected something around 64kB.

sy2002 commented 4 years ago

Michael, it gets even more odd: Here is what I did:

  1. Used Ubuntu on my Mac in Parallels (just as I use it when using Vivado)

  2. Install picocom

  3. Use these command line options

    
    sudo picocom --b 115200 --f h --parit n --d 8 --sto 1 /dev/ttyUSB1
    picocom v2.2

port is : /dev/ttyUSB1 flowcontrol : RTS/CTS baudrate is : 115200 parity is : none databits are : 8 stopbits are : 1 escape is : C-a local echo is : no noinit is : no noreset is : no nolock is : no send_cmd is : sz -vv receive_cmd is : rz -vv -E imap is : omap is : emap is : crcrlf,delbs,

Type [C-a] [C-h] to see available commands

Terminal ready

QNICE-FPGA V1.7 [WIP] by sy2002 & MJoergen in September 2020 (GIT #c3aacce)

Simple QNICE-monitor - Version 1.7 (Bernd Ulmann, sy2002, September 2020)

QMON> HELP:


4. `cat cpu_test.out | xclip -selection clipboard`

5. `M` `L` and then PASTE `cpu_test.out`

6. `C` `R` `8000` 

7. **WORKS LIKE A CHARM**, I receive `OK`

8. I did a memory dump from`0x8000` to `0x9515`

I attached the memory dump here:
[dump_ubuntu_picocom.txt](https://github.com/sy2002/QNICE-FPGA/files/5217795/dump_ubuntu_picocom.txt)

Please check if the dump is OK.

**So result: Works for me**  😮 

Here is my Ubuntu version (running virtualized in Parallels under macOS 10.14.6:

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.4 LTS Release: 18.04 Codename: bionic



My picocom version is `picocom v2.2`.

What are your thoughts what this **ODD** result now means? I would bet on the FTDI drivers again ;-)
MJoergen commented 4 years ago

I've filed a bug report to Ubuntu and the picocom team: https://bugs.launchpad.net/ubuntu/+source/picocom/+bug/1895521

sy2002 commented 4 years ago

Well - did our comments overlap? Read my https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-691965051 : picocom works for me...

sy2002 commented 4 years ago

But for putting my https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-691965051 into perspective: I reasoned that it "works for me" due to the fact, that the CPU test returned OK. What I did not do: I did not use our toolset c/test_programs/memcrc.c and c/test_programs/filecrc.c yet to do a CRC-test of the data that was transferred, because I am in a hurry now and it would have meant recompiling the cpu_test.asm to start at 0xB000 because the CRC toolset only works with programs starting at 0xB000 or higher. You might want to double check the file dump_ubuntu_picocom.txt in my comment above, if you "feel that it looks good".

MJoergen commented 4 years ago

Ok, finally some good news: I downloaded the latest source code (version 3.1) for picocom from https://github.com/npat-efault/picocom and compiled it (without any changes), and this works like a charm! Even after repeated tests, there are no missing characters.

Since I've now found a work-around, I think we can close this issue.

Should we perhaps include a note in the README about which version of picocom to use ?

sy2002 commented 4 years ago

Indeed good news! Since I was using picocom v2.2 which is the same as you did - still strange, isn`t it?

The only differece was, that you seemed to use this command line parameters for V2.2 (according to your bug report):

picocom -f h -b 115200 /dev/ttyUSB1

And I was using this one:

sudo picocom --b 115200 --f h --parit n --d 8 --sto 1 /dev/ttyUSB1

So the difference was, that I specified 8-N-1 explicitly and you did it implicitly.

Should we perhaps include a note in the README about which version of picocom to use ?

Absolutely yes! :-) I consider this as super important.

Please do this:

sy2002 commented 4 years ago

I restructured doc/README.md as seen in issue #130 - so please add your documentation to this new structure and maybe also have one or more entries in the "index".

MJoergen commented 4 years ago

Done.