Closed MJoergen closed 4 years ago
This is an interessting one.
First of all, I am glad that it works stable with qtransfer now and that I added qtransfer to the Monitor. This gives as one very reliable way to transfer the data.
About the challange itself:
I just tried to reproduce this: It works like a charm. I tried multiple times.
Just for being on the save side: I attached my bitstream to this comment. You might want to try it. It is based in Github commit #85e2d29. See below.
I even moved the start address of cpu_test.asm to 0xB000
so that I was able to CRC check it with c/test_programs/memcrc.c
. memcrc is able to calculate a CRC16 on any memory area while running on QNICE, it itself starts at 0x8000 and tinkers with addresses in the 0xA000 space, so you need to change the .org
of the file under test to 0xB000
to use it, so that memcrc and the file under test will not collide. The counterpart is c/test_programs/filecrc.c
which is meant to run on the host and which spits out the CRC value "as if it was on QNICE". So you can use these two apps in your tests: Calculate the value of the .out
file on the host and then recalculate it on QNICE. This is more reliable than looking at memory areas that contains 0x0000
s .
The biggest question is:
Why does it work like a charm on my side?
Do I have newer/better FTDI drivers on my host machine? As you know, the Nexys4DDR uses an FTDI chip for emulating serial communication over USB. Maybe you can try an update/upgrade.
Now three questions:
Could you please retry using my bitstream (just being paranoid ;-)) env1.bit.zip
Did you swithch on RTS/CTS in your terminal program? QNICE is able to halt the transfer using RTS/CTS if the FIFO buffer overruns. If yes: Can you try another / new terminal program than the one you are using? I would prefer one with a GUI, so that when you switch on RTS/CTS without any danger of getting some command line argument wrong (again: just being paranoid ;-) )
Could you please try to increase the FIFO buffer forbus_uart.vhd
specified in env1_gloals
in UART_FIFO_SIZE
(currently 32 bytes)?
If all of this is not working, we you might want to think about refactoring the UART module. I would start with my wrapper from 2015 bus_uart.vhd
:-)
I tried triggering on the signal bus_uart/fifo_full
and that signal was never asserted. This is to be expected since the baudrate is only 115200 bits/s and with 10 bits per character this gives 11520 characters per second. With a CPU clock frequency of 50 MHz this gives 4340 clock cycles for each character, and the CPU is sitting in a very small busy loop doing nothing but servicing the UART. So all this to say that the Rx UART FIFO in the FPGA is never containing more than a single character.
I searched for any errors when reading from the Rx UART FIFO (e.g. any race condition between the CPU reading from the FIFO and the UART writing to the FIFO), but found none.
It seems that the data received into the FPGA is erroneous, i.e. that already on the input to the FPGA some data is missing.
My guess is that the USB-UART chip (FTDI FT2232HQ) on the Nexys4DDR is corrupting the data, maybe because of a FIFO overrun. That chip has an onboard Rx FIFO of 4096 characters, and we are certainly sending much larger amounts of data than that.
I will try with a different terminal program. What program are you using on the MAC ?
I love riddles :-) And this one is really a tough one.
Here are my questions/thoughts:
1)
On my side it works super stable: Same FPGA board than you have. So what are the differences between your experiment and mine? We might want to think about that a bit deeper. As soon as we find the right difference between your and my setup, we might find the root cause. So here are the differences I can think of:
2)
It seems that the data received into the FPGA is erroneous, i.e. that already on the input to the FPGA some data is missing.
What about the FTDI driver? Did you try multiple machines on your side with multiple FTDI drivers? (Why do I think that this is relevant: The fact that the communication between your Vivado and the board is stable and works fine in general, for example when programming the FPGA or when using ILA and stable lets me assume that there is no hardware bug in the chip on the Nexys. But I can think of a software bug in the FTDI drivers: Let's not forget: There is no serial port at all on the host side: this is just an emulation over USB...)
My guess is that the USB-UART chip (FTDI FT2232HQ) on the Nexys4DDR is corrupting the data, maybe because of a FIFO overrun. That chip has an onboard Rx FIFO of 4096 characters, and we are certainly sending much larger amounts of data than that.
This would point to a driver issue, beause we talk about a simulated serial over USB thing.
3)
So all this to say that the Rx UART FIFO in the FPGA is never containing more than a single character.
Just a tought about this statement: On MEGA65, when I switchted off the UART, I had a super unstable behaviour, similar to what you describe in this issue. This was the reason at all, why I added an UART: It happend during the porting of I QNICE-FPGA to MEGA65. I guess we must not underestimate the delay/latencies/effects, that the FTDI chip is bringing to the game
4)
One more thing - pretty absurd, but I think it makes sense to share it with you: Back in the days, when I did not know yet, what a "timing closure" is and when I developed the EAE, I missed out to do something like this (back in the days I was using ISE):
INST "eae_inst/op*" TNM="EAE_OPS";
INST "eae_inst/res*" TNM="EAE_RES";
TIMESPEC TS_EAE=FROM "EAE_OPS" TO "EAE_RES" 30 ns;
The result: Completely unstable UART and keyboard... No idea why. You can read my old commit comment from 2016 - searched and found it :-) Here it is: https://github.com/sy2002/QNICE-FPGA/commit/d47523b156c554a17b0b4e00ba4d6a4156aee0c4
UPDATE: I added point 4)
to above-mentioned message.
Sorry for spamming you - one more addition to my above-mentioned point 2)
(please read my lengthy comment https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-690994600 first):
Just had a look: The terminal program I am using comes with its own FTDI drivers:
Here, the maker of my terminal program "Serial" states that it indeed directly supports the FTDI FT2232: https://www.decisivetactics.com/support/view?article=compatible-devices
So more and more this "smells" like a driver problem.
So more and more this "smells" like a driver problem.
I agree. I'll check when I have the opportunity. This document - https://www.ftdichip.com/Support/Documents/AppNotes/AN_220_FTDI_Drivers_Installation_Guide_for_Linux.pdf - has some information about the FTDI drivers.
I tried increasing the baud rate to 1.5 Mbaud in the file env1_globals.vhd
and when starting the terminal program (I use picocom
). However, even though the 'H' (HELP) command worked as expected, using the M-L command sequence and pasting the cpu_test.out
file lead to severe data corruption, much worse than before.
While debugging, I've noticed something peculiar. When running with the high baudrate of 1.5 Mbaud and examining the waveform in ILA, I notice that the first 20 (or so) characters received are correct, and yet there are characters missing in the echo back to the user. The screen shot below shows the problem:
Due to the high speed two characters were received into the FIFO (0x30 and 0x78) before the CPU began reading. However, the CPU read lasts two clock cycles and therefore "pops" both characters from the FIFO, but only the first character (0x30) is given to the CPU. The second character (0x78) is lost.
This behaviour is much worse with the high baudrate, but can not be completely ruled out with lower baudrates.
This behaviour is more generally tied together with the Issue #55, but a work-around for the UART module can easily be implemented.
I changed the UART module to not do a double-read, but the issue still remains.
Then I noticed that the CPU is using about 750 clock cycles for each character received, and that most of that time was spent in the routine STR$STRCHR
. I optimized that routine so the CPU now only uses about 500 clock cycles for each character, but still the issue remains.
I see that the data output from the UART FIFO is corrupted, i.e. missing data. I then notice that the FIFO full signal is asserted, and this signal is connected to the CTS output. However, even though this signal is asserted the sender (FTDI chip) still sends a few more characters, and this causes FIFO overrun. So I changed the CTS signal to be asserted when the free space in the UART FIFO is less than 5 characters. This now correctly pauses transmission, and the UART FIFO never becomes full. But still the issue remains.
Next thing to try is to decrease the size of the FIFO to see whether the data INTO the FIFO is corrupted or not.
@MJoergen About this:
I then notice that the FIFO full signal is asserted, and this signal is connected to the CTS output. However, even though this signal is asserted the sender (FTDI chip) still sends a few more characters, and this causes FIFO overrun.
IMPORANT: It is not the FTDI chip which works with the CTS signal. The FTDI chip is just a serial to USB converter which encapsulates the serial signals, transmits them via the USB protocol and the driver on the host side is unpacking everything again. It is just a simulation. There is no serial port...
So it is your terminal program, which needs to handle CTS.
If your terminal program is not handling CTS correctly, then this causes problems:
serial pins on FPGA => FTDI-CHIP => USB => FTDI-Driver => Terminal Program
And the part
FTDI-CHIP => USB => FTDI-Driver
should be transparent, so that for the serial pins on the FPGA it "feels" as if they are directly connected to the terminal program.
Could you rule out a FTDI driver problem?
I need to re-iterate that on my setup with the 100% identical board, it works the same, this is why it might be interessting to find out, what is different between our setup (see https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-690994600)
Ok, with a smaller FIFO I was able to examine the data going INTO the UART FIFO:
I can see that the data received by the FPGA is corrupted. In particular, by comparing the characters read into the UART FIFO with the contents of the cpu_test.out
file I see that a total of 118 characters are MISSING from the input stream. This observation seems to acquit the FPGA completely. The deletion happens after having received 72k correct characters.
I'm using the picocom program with the switch -f h
and this is recognized by the program as selecting hardware flow control because it responds with
*** flow: RTS/CTS
So, what can we do to make this work? I'll look into how to change the FTDI driver as the next step.
Wow - this is really crazy stuff! 😧
I'll look into how to change the FTDI driver as the next step.
As far as I know, you're having multiple machines: Was it reproducible on all your machines?
This is getting more and more mysterious. I just learned about the USB monitoring program on Linux called usbmon
. With that I can dump all the USB traffic from the host to the Nexys4DDR card. Using this program I can see that already there the traffic is corrupted. In other words, the USB traffic sent from the host is missing some characters.
I even tried a different USB port on the host, and still characters are missing.
So next thing is to install the FTDI drivers on my laptop and try there.
Using this program I can see that already there the traffic is corrupted. In other words, the USB traffic sent from the host is missing some characters.
WOW! This is one of the more interessting riddles I saw in the last years... 😮
But it supports the hypothesis that our hardware works. On the other hand, some questions that arise are: Why can Vivado then communicate so nicely and without errors and why can qtransfer work so stable? (The answer might be, that both use smaller packet sizes: qtransfer for example only roughly ~400 bytes per package.)
You could indeed try to install the FTDI drivers on your laptop and try there.
You could also try to remove the old version of the FTDI driver on your other machine and replace it.
Just retried on my work-computer running Ubuntu 18.04.5 LTS
and the observed problem is the same.
So in this setup I'm using HEAD on the develop branch, i.e. going back to 115 kbit/s, and only about 60 % of the text is transferred over the USB interface. The transmitted text stops in the middle of a line, i.e. the last few characters are 0x8AF
. I'm capturing the USB traffic using the usbmon
utility. I can see that the last characters echoed back on the terminal is indeed 0x8AF
, and a memory dump confirms that addresses after 0x8AF5
contain no data, so I'm confident that the captured traffic matches what is sent on the USB interface.
I've checked that after the transfer of the text file, the USB interface correctly transmits the Ctrl-E
key-press.
I've repeated the test a number of times, and although the amount of text transferred is not identical, it is all in the same rough size, i.e. 0x8BA5
was seen as the last address transferred in one of the test runs.
Just being paranoid, I've verified that copy-pasting to a regular file works: I can see that all addresses up to 0x9515
get pasted correctly. So the text is indeed present in the clip-board buffer.
I'm stumped. I guess the next step could be to check whether the serial program picocom
is indeed writing the entire text to the serial port (using tools like socat
and/or strace
).
OMG - can you try another terminal program? Maybe cu
? Maybe screen
? Maybe tip
?
Using strace
I can confirm that the program picocom
DOES read the entire program text from stdin (file descriptor 0) - up to 0x9515
read(0, "0", 1) = 1
write(1, "\7", 1^G) = 1
select(4, [0 3], [3], NULL, NULL) = 1 (in [0])
read(0, "x", 1) = 1
write(1, "\7", 1^G) = 1
select(4, [0 3], [3], NULL, NULL) = 1 (in [0])
read(0, "9", 1) = 1
write(1, "\7", 1^G) = 1
select(4, [0 3], [3], NULL, NULL) = 1 (in [0])
read(0, "5", 1) = 1
write(1, "\7", 1^G) = 1
select(4, [0 3], [3], NULL, NULL) = 1 (in [0])
read(0, "1", 1) = 1
write(1, "\7", 1^G) = 1
select(4, [0 3], [3], NULL, NULL) = 1 (in [0])
read(0, "5", 1) = 1
but DOES NOT write the entire text onto the serial port (file descriptor 3) - only up to 0x94A5
. The last two writes to file descriptor 3 are:
write(3, "0x9481 0x000E\r0x9482 0x00A0\r0x94"..., 509) = 509
write(3, "\5", 1)
The second line corresponds to Ctrl-E
and the first line transfers 509 characters, which is 509/14=36 addresses, i.e. up to 0x9481
+0x0024
= 0x94A5
.
Time to find a different terminal program ...
but DOES NOT write the entire text onto the serial port (file descriptor 3) - only up to 0x94A5.
What a bummer!
0x94A5 minus 0x8000 = the very odd amount of 5285 + 1 = 5286 lines.
One line is 14 bytes: 5286 x 14 = 74,004 bytes.
Odd... I would have expected something around 64kB.
Michael, it gets even more odd: Here is what I did:
Used Ubuntu on my Mac in Parallels (just as I use it when using Vivado)
Install picocom
Use these command line options
sudo picocom --b 115200 --f h --parit n --d 8 --sto 1 /dev/ttyUSB1
picocom v2.2
port is : /dev/ttyUSB1 flowcontrol : RTS/CTS baudrate is : 115200 parity is : none databits are : 8 stopbits are : 1 escape is : C-a local echo is : no noinit is : no noreset is : no nolock is : no send_cmd is : sz -vv receive_cmd is : rz -vv -E imap is : omap is : emap is : crcrlf,delbs,
Type [C-a] [C-h] to see available commands
Terminal ready
QNICE-FPGA V1.7 [WIP] by sy2002 & MJoergen in September 2020 (GIT #c3aacce)
QMON> HELP:
4. `cat cpu_test.out | xclip -selection clipboard`
5. `M` `L` and then PASTE `cpu_test.out`
6. `C` `R` `8000`
7. **WORKS LIKE A CHARM**, I receive `OK`
8. I did a memory dump from`0x8000` to `0x9515`
I attached the memory dump here:
[dump_ubuntu_picocom.txt](https://github.com/sy2002/QNICE-FPGA/files/5217795/dump_ubuntu_picocom.txt)
Please check if the dump is OK.
**So result: Works for me** 😮
Here is my Ubuntu version (running virtualized in Parallels under macOS 10.14.6:
No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.4 LTS Release: 18.04 Codename: bionic
My picocom version is `picocom v2.2`.
What are your thoughts what this **ODD** result now means? I would bet on the FTDI drivers again ;-)
I've filed a bug report to Ubuntu and the picocom
team: https://bugs.launchpad.net/ubuntu/+source/picocom/+bug/1895521
Well - did our comments overlap? Read my https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-691965051 : picocom works for me...
But for putting my https://github.com/sy2002/QNICE-FPGA/issues/125#issuecomment-691965051 into perspective: I reasoned that it "works for me" due to the fact, that the CPU test returned OK
. What I did not do: I did not use our toolset c/test_programs/memcrc.c
and c/test_programs/filecrc.c
yet to do a CRC-test of the data that was transferred, because I am in a hurry now and it would have meant recompiling the cpu_test.asm
to start at 0xB000
because the CRC toolset only works with programs starting at 0xB000
or higher. You might want to double check the file dump_ubuntu_picocom.txt
in my comment above, if you "feel that it looks good".
Ok, finally some good news: I downloaded the latest source code (version 3.1) for picocom from https://github.com/npat-efault/picocom and compiled it (without any changes), and this works like a charm! Even after repeated tests, there are no missing characters.
Since I've now found a work-around, I think we can close this issue.
Should we perhaps include a note in the README about which version of picocom to use ?
Indeed good news! Since I was using picocom v2.2
which is the same as you did - still strange, isn`t it?
The only differece was, that you seemed to use this command line parameters for V2.2 (according to your bug report):
picocom -f h -b 115200 /dev/ttyUSB1
And I was using this one:
sudo picocom --b 115200 --f h --parit n --d 8 --sto 1 /dev/ttyUSB1
So the difference was, that I specified 8-N-1 explicitly and you did it implicitly.
Should we perhaps include a note in the README about which version of picocom to use ?
Absolutely yes! :-) I consider this as super important.
Please do this:
constraints.txt
into a .md
file, format it and add a section for serial terminal programs for Mac (I will fill it; assign this issue after you did the rest to me) and Linux. Then write about your experience with the lower version so that if somebody has symptoms, he would find himself in the constraints.txt.I restructured doc/README.md as seen in issue #130 - so please add your documentation to this new structure and maybe also have one or more entries in the "index".
Done.
Steps to reproduce:
cpu_test.out
).You'll notice large sections of memory containing 0x0000's, which are not present in the original file.
Some more debugging using the ILA shows that the FPGA is not receiving the entire file over the UART connection. So it looks like a buffer overflow in the USB-to-UART chip on the Nexys4DDR board.
I've verified that the same file (
cpu_test.out
) can correctly be transferred to the Nexys4DDR using theqtransfer
program, using the same UART connection.