trabucayre / openFPGALoader

Universal utility for programming FPGA
https://trabucayre.github.io/openFPGALoader/
Apache License 2.0

Possible hang in FT2232H #62

Open GbGp opened 3 years ago

GbGp commented 3 years ago

Hello, today I observed a possible hang that might occur when the FT2232H is left in a certain state by other software. The problem can be reproduced as follows: 1) program an FPGA using interface B of the FT2232H; 2) run another program that uses interface A of the FT2232H in BITMODE_SYNCFF mode; 3) try to reprogram the FPGA: openFPGALoader hangs even before the progress bar appears.

Turns out the other software wasn't resetting interface A by setting BITMODE_RESET before exiting. As described in the manual, the FT2232H, and possibly other similar multichannel devices, can lose access to one interface when the other is programmed in certain modes such as SYNCFF.

This is probably a very specific corner case, but I find it surprising that libftdi doesn't generate any error when opening, configuring and writing/reading an interface that is not usable in that particular configuration. Anyway, I think it might be a good idea to extend the reset procedure in openFPGALoader by setting BITMODE_SYNCFF on all interfaces on initialization.
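A minimal sketch of the cleanup the other software skipped, written in Python for brevity. `FakeChannel` and its `set_bitmode` method are hypothetical stand-ins for a real FTDI handle (with libftdi the corresponding call would be `ftdi_set_bitmode`); the two constants use the values from libftdi's `ftdi_mpsse_mode` enum. The point is only the pattern: whatever mode is selected, BITMODE_RESET is restored on the way out, including when an exception is raised.

```python
from contextlib import contextmanager

# Values from libftdi's ftdi_mpsse_mode enum.
BITMODE_RESET = 0x00
BITMODE_SYNCFF = 0x40


class FakeChannel:
    """Hypothetical stand-in for one FTDI interface; it only records the mode."""

    def __init__(self):
        self.mode = BITMODE_RESET

    def set_bitmode(self, mask, mode):
        self.mode = mode


@contextmanager
def bitmode(channel, mode, mask=0xFF):
    """Switch the channel to `mode` and always restore BITMODE_RESET on exit."""
    channel.set_bitmode(mask, mode)
    try:
        yield channel
    finally:
        # Runs on normal exit and on exceptions, so the chip is not left in SYNCFF.
        channel.set_bitmode(0x00, BITMODE_RESET)


chan = FakeChannel()
try:
    with bitmode(chan, BITMODE_SYNCFF):
        raise RuntimeError("simulated crash while in SYNCFF mode")
except RuntimeError:
    pass
```

This does not help if the process is killed outright (for example with SIGKILL), which is one reason a check on the openFPGALoader side can still be useful.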

trabucayre commented 3 years ago

Thanks for pointing out this issue. I need to find a way to detect this and warn the user instead of having a crash. I'm not really convinced about opening all interfaces to modify their configuration. openFPGALoader has to deal with the interface provided by the user or the configuration, and only this one. In some situations, forcing the second interface may be a problem. If a program modifies the FTDI's configuration, it must revert to the default at the end (openFPGALoader does that).

GbGp commented 3 years ago

OK, I don't know if this is useful, but I think the hang happens at the first ftdi read:

(gdb) where
#0  0x00007ffff7cafaff in __GI___poll (fds=0x555555760bc0, nfds=3, timeout=60000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff7fa240d in ?? () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#2  0x00007ffff7fa365c in libusb_handle_events_timeout_completed () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#3  0x00007ffff7fa3704 in libusb_handle_events_completed () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#4  0x00007ffff7fa4038 in ?? () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#5  0x00007ffff7fa414b in ?? () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#6  0x00007ffff7fa4463 in libusb_bulk_transfer () from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#7  0x00007ffff7f8dcd4 in ftdi_read_data () from /lib/x86_64-linux-gnu/libftdi1.so.2
#8  0x0000555555671731 in FTDIpp_MPSSE::mpsse_read(unsigned char*, int) ()
#9  0x000055555566fcd5 in FtdiJtagMPSSE::writeTDI(unsigned char*, unsigned char*, unsigned int, bool) ()
#10 0x0000555555668b54 in Jtag::read_write(unsigned char*, unsigned char*, int, char) ()
#11 0x0000555555668800 in Jtag::detectChain(std::vector<int, std::allocator<int> >&, int) ()
#12 0x000055555567cd8b in main ()

Anyway, there is an error in my original comment: I meant to suggest setting BITMODE_RESET on all interfaces, not BITMODE_SYNCFF.

trabucayre commented 3 years ago

Thanks for this dump. I will try to reproduce this to see if it's possible to add some check and/or error handling.

trabucayre commented 3 years ago

I'm able to reproduce this issue. When using interface B while interface A is configured as BITMODE_SYNCFF, ftdi_read_data always returns 0, but without errors. Since I'm not really happy to let openFPGALoader override something on the other interface, I'm looking for a workaround to inform the user about a potentially wrong FTDI configuration.

When a read is done with a size > 0, if the return value is 0 and the interface in use is B, I think it's possible to deduce this type of issue, print a message, and stop the program cleanly. What is your opinion?
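The check proposed above could look roughly like the following Python sketch. It is not openFPGALoader's code: `checked_read`, `FtdiConfigError` and the `read_fn` callable (standing in for `ftdi_read_data`) are names invented for illustration.

```python
class FtdiConfigError(RuntimeError):
    """Raised when a read returns no data although some was requested."""


def checked_read(read_fn, size, interface):
    """Call read_fn(size); treat an empty result for size > 0 as a likely bad FTDI state.

    read_fn is a hypothetical callable standing in for ftdi_read_data.
    """
    data = read_fn(size)
    if size > 0 and len(data) == 0:
        hint = ""
        if interface == "B":
            hint = " (is interface A left in BITMODE_SYNCFF by another program?)"
        raise FtdiConfigError(
            f"read of {size} bytes on interface {interface} returned 0 bytes{hint}"
        )
    return data


def stuck_read(size):
    # Simulates the observed behaviour: no error, but no data either.
    return b""


try:
    checked_read(stuck_read, 4, "B")
    message = None
except FtdiConfigError as exc:
    message = str(exc)
    print(message)
```

A real implementation may need a few retries before giving up, since a single empty read can also happen when no data has arrived yet.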

GbGp commented 3 years ago

I don't have anything against it. In this particular case it would be handy if it were possible to read back the BITMODE setting (for some reason libftdi doesn't support this, but the proprietary d2xx driver does with FT_GetBitMode), so for now this might be the only option.

RGD2 commented 2 years ago

I think make that 'possible' a 'confirmed' hang? May not be the same issue, exactly: For me, openFPGALoader was getting stuck perpetually right after showing 'erase SRAM done'. Pulling the ft2232 resulted in a spew of mpsse_write/store errors, and it just kept going without crashing out until ctrl-C'd or killed.

My problem involved having previously tried usb/ip, with it set up to host the same ft2232h that I was now trying to use locally. (BTW, I was using usbip to get the vendor tool to work remotely; it did work but wasn't reliable or fast, and openFPGALoader running on the embedded SBC was a lot faster.)

usbip would mess up, with Windows somehow hanging onto the port and refusing to disconnect it, even after multiple replugs at the host end (probably a bug in GH://barbalion/usbip-win-client, which I was using at the Windows end). Rebooting the pi with openFPGALoader on it didn't fix it, but rebooting the Windows machine that was still trying to claim the device, while having stopped usbipd on the pi, solved it.

USB/ip is a desperate dodge; running openFPGALoader on a small local embedded SBC is The Way to deal with needing to program an FPGA over a long ethernet cable. SBCs are pretty cheap.

PS: I can also report that openFPGALoader works with the Gowin official dk-start-gw1n4 v1.1 board, just using option '-c bus_blaster', which should be as expected, since it's just another FTx232H being used in JTAG mode on the first four pins.

Somewhat annoyingly, they didn't connect the second interface at all, so they may as well just have used an FT232H. I think perhaps this is part of the problem: a weird interaction, since Windows sees both interfaces over usb/ip and 'claims' both, and this makes a mess somehow.

Anyway, not sure whether this is really related, but it does seem that there is some bad behavior with ft2232's (typical FTDI shenanigans).

trabucayre commented 2 years ago

I have to redo all my tests... I suspect the hang is due to a libftdi internal function: ftdi_write_data loops until the requested number of bytes has been sent, or fails if libusb_bulk_transfer returns < 0, but when 0 bytes are sent the loop never stops.

Maybe I'm wrong, and I have to recheck to confirm or refute my assumption.

For network programming, I'm working on a client-side XVC implementation (see #210); it may help with this use case. (It's possible to use netcat too.)
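To illustrate the suspected failure mode (which, as said, is unconfirmed), here is a Python sketch of a write loop that gives up when no progress is made. It is not libftdi's code; `write_all`, `max_stalls` and the `transfer_fn` callable (standing in for a bulk transfer) are hypothetical names.

```python
def write_all(transfer_fn, data, max_stalls=3):
    """Send all of `data`, giving up after `max_stalls` consecutive zero-byte transfers.

    transfer_fn is a hypothetical callable: it takes a bytes chunk and returns the
    number of bytes accepted, or a negative value on error.
    """
    offset = 0
    stalls = 0
    while offset < len(data):
        sent = transfer_fn(data[offset:])
        if sent < 0:
            raise IOError(f"transfer failed with code {sent}")
        if sent == 0:
            # Without this counter, a device that accepts nothing loops forever.
            stalls += 1
            if stalls >= max_stalls:
                raise TimeoutError(
                    f"no progress after {stalls} attempts, {offset} bytes sent"
                )
            continue
        stalls = 0
        offset += sent
    return offset


def two_at_a_time(chunk):
    # Simulated healthy device: accepts at most two bytes per call.
    return min(2, len(chunk))


def always_zero(chunk):
    # Simulated stuck device: reports success but accepts nothing.
    return 0


total = write_all(two_at_a_time, b"hello")
try:
    write_all(always_zero, b"hello")
    stalled = False
except TimeoutError:
    stalled = True
```

With the simulated healthy device all five bytes are sent; with the stuck one the loop ends with a `TimeoutError` instead of spinning.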

RGD2 commented 2 years ago

I rebuilt with cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

openFPGALoader --detect returns:

Jtag frequency : requested 6.00MHz   -> real 6.00MHz
index 0:
        idcode 0x100381b
        manufacturer Gowin
        family GW1N
        model  GW1N-4
        irlength 8

Then ran with gdb like so:

sudo gdb --args openFPGALoader -mc bus_blaster bitfile.fs

This loads into gdb, and I run it with r, then it does the thing where it hangs right before showing the progress bar...

Jtag frequency : requested 6.00MHz   -> real 6.00MHz
Parse file Parse bitfile.fs:
Done
DONE
Jtag frequency : requested 2.50MHz   -> real 2.00MHz
erase SRAM Done

Ctrl-C then gives me this in gdb:

^C
Thread 1 "openFPGALoader" received signal SIGINT, Interrupt.
__GI___poll (timeout=60000, nfds=3, fds=0x35ee38) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.

I haven't used GDB very much, so here's what the registers and stack look like - but it seems like it's caught in an infinite loop waiting forever there - it doesn't stop given a minute.

(gdb) i r
r0             0x1                 1
r1             0x3                 3
r2             0xea60              60000
r3             0x0                 0
r4             0x35ee38            3534392
r5             0x76ff5500          1996444928
r6             0xea60              60000
r7             0xa8                168
r8             0x3                 3
r9             0x0                 0
r10            0x7effe5b4          2130699700
r11            0x7effe774          2130700148
r12            0x0                 0
sp             0x7effe530          0x7effe530
lr             0x0                 0
pc             0x76c1d738          0x76c1d738 <__GI___poll+96>
cpsr           0x80000010          -2147483632
fpscr          0x80000010          -2147483632
(gdb) i f
Stack level 0, frame at 0x7effe548:
 pc = 0x76c1d738 in __GI___poll (../sysdeps/unix/sysv/linux/poll.c:29); saved pc = 0x76f85c98
 inlined into frame 1
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp in sp

Is this any help?

This is super frustrating, as I had this invocation working for my board & FPGA just fine previously... actually better than fine: totally excellent and also much faster than the vendor utility.

I am not sure why it's broken now.

RGD2 commented 2 years ago

..... I just had an odd experience. I copied that 99-etc file into rules.d and rebooted, but no change. Then I thought, well, how about the external eeprom? That failed too, but then...

pi@usbpi:~ $ openFPGALoader bitfile.fs
Jtag frequency : requested 6.00MHz   -> real 6.00MHz
Parse file Parse bitfile.fs:
Done
DONE
Jtag frequency : requested 2.50MHz   -> real 2.00MHz
erase SRAM Done
^C
pi@usbpi:~ $ openFPGALoader -f bitfile.fs
write to flash
Jtag frequency : requested 6.00MHz   -> real 6.00MHz
Parse file Parse bitfile.fs:
Done
DONE
Jtag frequency : requested 2.50MHz   -> real 2.00MHz
erase SRAM Done
erase Flash Done
write Flash: [==================================================] 100.00%
Done
CRC check : FAIL
f956 0000
pi@usbpi:~ $ openFPGALoader bitfile.fs
Jtag frequency : requested 6.00MHz   -> real 6.00MHz
Parse file Parse bitfile.fs:
Done
DONE
Jtag frequency : requested 2.50MHz   -> real 2.00MHz
erase SRAM Done
Flash SRAM: [==================================================] 100.00%
Done
SRAM Flash: Success

I'm not sure whether this is a safe workaround, but it is A workaround.

It seems that it does find and write to some EEPROM, but then fails the CRC check. This, however, seems to put the FTDI chip / library into a good state, which then works, and for repeated reconfigurations as far as I can tell, at least until the next power cycle, and then it's broken again.

And no, the external flash write really didn't work; the FPGA isn't booting from it. The official tools don't seem to reliably write that eeprom either, so it could just be that particular chip. They write it by first putting a special FPGA image onto the chip, and then use that image to write the eeprom, it seems.

Well, I hope this workaround is useful to someone.

trabucayre commented 2 years ago

The CRC is the one from the fs file. When the bitstream is loaded, the checksum is available through a JTAG register. If it's equal to 0, this means loading has failed (i.e. corrupted memory image). For gdb, could you provide the full bt to see where in openFPGALoader the program hangs? Thanks

RGD2 commented 2 years ago

Here's what bt shows, or rather doesn't:

pi@usbpi:~ $ gdb --args openFPGALoader bitfile.fs
GNU gdb (Raspbian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from openFPGALoader...
(No debugging symbols found in openFPGALoader)
(gdb) r
Starting program: /usr/local/bin/openFPGALoader bitfile.fs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0x76b48440 (LWP 22527)]
Jtag frequency : requested 6.00MHz   -> real 6.00MHz
Parse file Parse bitfile.fs:
Done
DONE
Jtag frequency : requested 2.50MHz   -> real 2.00MHz
erase SRAM Done
^Z
Thread 1 "openFPGALoader" received signal SIGTSTP, Stopped (user).
__GI___poll (timeout=60000, nfds=3, fds=0x3b40b8) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  __GI___poll (timeout=60000, nfds=3, fds=0x3b40b8) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  __GI___poll (fds=0x3b40b8, nfds=3, timeout=60000) at ../sysdeps/unix/sysv/linux/poll.c:26
#2  0x76f85c98 in ?? () from /lib/arm-linux-gnueabihf/libusb-1.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Let me know if there's anything else I can get you.

trabucayre commented 2 years ago

Are you using a Raspberry Pi? Could you try with a computer? I have already seen issues when using some cables on the rpi: the JTAG interfaces for the Tang Nano 4K & 9K are unstable on this platform.

RGD2 commented 1 year ago

Not sure if relevant, but I've noticed I sometimes see device code 1000381b, which ... doesn't exist in the Gowin documentation... I have a mix of GW1N-4 (which is code 0100381b) and GW1N-4B (1100381b). So I am not sure how I am getting this:

$ openFPGALoader --detect
Jtag frequency : requested 6.00MHz   -> real 6.00MHz
index 0:
        idcode 0x100381b
        manufacturer Gowin
        family GW1N
        model  GW1N-4
        irlength 8
$

I'm pretty sure that should be a plain GW1N-4 (non-B), which is supposed to be h0100381B, at least according to Table 6-6 of the 'Gowin FPGA Products Programming and Configuration Guide'. And it is, according to their own programmer software. It could just be they have an error in their documentation, I suppose, which they've 'fixed' by changing their programmer util to make it 'look right'.

But it seems odd.
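One possible reading, not confirmed in the thread: the `--detect` output above prints `0x100381b` with seven hex digits, i.e. without a leading zero, which is numerically 0x0100381b rather than 0x1000381b. The Python sketch below zero-pads the value and splits it into the standard IEEE 1149.1 IDCODE fields; `decode_idcode` is a helper name made up for this example, and the two reference codes are the ones quoted in the comment above.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "hex": f"0x{idcode:08x}",               # zero-padded to 8 hex digits
        "version": (idcode >> 28) & 0xF,        # bits 31..28
        "part": (idcode >> 12) & 0xFFFF,        # bits 27..12
        "manufacturer": (idcode >> 1) & 0x7FF,  # bits 11..1
        "marker": idcode & 0x1,                 # bit 0, always 1 for an IDCODE
    }


detected = decode_idcode(0x100381B)  # value as printed by --detect
gw1n4 = decode_idcode(0x0100381B)    # GW1N-4 code quoted above
gw1n4b = decode_idcode(0x1100381B)   # GW1N-4B code quoted above

print(detected["hex"])        # 0x0100381b
print(detected == gw1n4)      # True
print(gw1n4b["version"])      # 1
```

Under this reading, the detected value matches the documented GW1N-4 code, and the B variant differs only in the 4-bit version field.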

Anyway, not sure if relevant, but at the moment I'm resorting to programming it over a long active USB extension lead.

The 'pi being the programmer' is mostly because I can't have a PC where the board is currently employed: It's an SBC, maybe a USB-network extender (Although they are quite dodgy) or just a long, awkward lead. The 'pi is doing other things there anyway, so it's just very convenient if I can just use a local PC script to copy the file, and call openFPGALoader over ssh to update the board.

Flinner commented 6 months ago

I figured out the issue causing the hangs: I had conflicting udev rules!!! So much time wasted :((( thank God I found it