j-marjanovic / jtag-quartus-ft232h

Debugging failing libjtag loads in Quartus? #5

Open hedgeberg opened 11 months ago

hedgeberg commented 11 months ago

Hi!

I've been trying to do some work on one of the Storey Peak boards, and I tried to set up Quartus for it on an Arch Linux machine. After building both the OTMA and dummy .so files and installing them in the linux64 directory alongside the other libjtag .so files, I can't get Quartus to acknowledge the existence of either the OTMA adapter or the dummy adapter. I also can't figure out any sane way to check whether Quartus is even scanning the list of .so files, so I don't know where to begin fixing this. Do you have any tips for getting info out of Quartus about which libjtag files it recognizes?

j-marjanovic commented 11 months ago

Hi!

In this case it sounds easier to start with the dummy driver. It should print some diagnostic info through a socket (mentioned briefly at the end of the README).

When running /opt/intelFPGA/19.1/quartus/bin/jtagconfig --enum I get the following output on the console:

1) Dummy JTAG device [bus-instance]           
  029070DD   5SGSMD5H(1|2|3)/5SGSMD5K1/..

2) OTMA FT232H [bus-instance]
  Unable to lock chain - Communications error

and the following output on the socket (listening to it with nc -lkuU /var/tmp/jtag-dummy.sock):

SockDebug is ready
[jtag] UPDATE_IR = 3ff (BYPASS)
[jtag] UPDATE_IR = 3ff (BYPASS)
[jtag] UPDATE_IR = 3ff (BYPASS)
[jtag] UPDATE_IR = 3ff (BYPASS)

If this does not help, strace can be useful to see what the program is actually doing:

$ strace -f /opt/intelFPGA/19.1/quartus/bin/jtagconfig --enum 2>&1 | grep libjtag
openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/glibc-hwcaps/x86-64-v3/libjtag_client.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/glibc-hwcaps/x86-64-v2/libjtag_client.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/libjtag_client.so", O_RDONLY|O_CLOEXEC) = 3
[pid  9899] openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/libjtag_hw_dummy.so", O_RDONLY|O_CLOEXEC) = 6
[pid  9899] openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/libjtag_hw_pli-blaster.so", O_RDONLY|O_CLOEXEC) = 7
[pid  9899] openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/libjtag_hw_otma.so", O_RDONLY|O_CLOEXEC) = 7
[pid  9899] openat(AT_FDCWD, "/opt/intelFPGA/19.1/quartus/linux64/libjtag_hw_pli-blaster.so", O_RDONLY|O_CLOEXEC) = 9

I have tested this with Quartus 19.1 and 20.1. Are you using a different version of Quartus?
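With many libraries involved, the raw strace output can be hard to scan. The following is a minimal sketch of a filter that reduces it to one line per libjtag file, marked OK or FAIL. The function name `summarize_libjtag_opens` is made up for illustration, and the sketch assumes the `openat(...)` line format shown above.

```shell
# Summarise libjtag-related openat() calls from strace output read on stdin.
# Prints "OK <path>" when the call returned a file descriptor and
# "FAIL <path>" when it returned -1 (for example ENOENT).
summarize_libjtag_opens() {
    grep 'openat(' | grep 'libjtag' | while IFS= read -r line; do
        path=${line#*\"}      # drop everything up to the first double quote
        path=${path%%\"*}     # drop everything from the closing quote onwards
        case $line in
            *'= -1 '*) printf 'FAIL %s\n' "$path" ;;
            *)         printf 'OK %s\n' "$path" ;;
        esac
    done
}

# Example (not run here):
#   strace -f jtagconfig --enum 2>&1 | summarize_libjtag_opens
```

Note that `-f` tells strace to follow child processes. Without it, files opened by a process that jtagconfig spawns do not appear in the output.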

j-marjanovic commented 11 months ago

Quartus will also start a daemon (jtagd) when you open Programmer, so you might want to terminate that before running jtagconfig.

hedgeberg commented 11 months ago

(I'm going to end up sending multiple messages here, sorry. It's how I debug this sort of issue and record what I've done)

I haven't gotten any output of that sort yet. In terms of version, I'm on the newest version of Quartus (Quartus Prime Lite 23.3).

I tried using strace before submitting the original issue, and iirc I saw the libjtag files touched by jtagconfig, but not loaded. I'll repeat the experiment and post the results.

In order to try and track down where the libjtag files were being loaded, I ran grep -ri "libjtag" in the 23.3/quartus/linux64 directory as a way of trying to pin down the binary that would be responsible for locating the set of libjtag files. All the libjtag_hw*.so files showed up, but jtagd did too. I'm going to throw that into ghidra, see what conditions affect loading.
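The search described above can be narrowed to a list of file names only. This is a rough sketch, assuming a grep that supports `-r`, `-l`, `-i` and `-F` (GNU and BSD grep both do). The function name `find_refs` is hypothetical.

```shell
# List files under a directory that contain a given string, including
# binaries. Matching is case-insensitive and the string is taken literally.
find_refs() {
    # $1 = directory to search, $2 = string to look for
    grep -rliF -- "$2" "$1" | sort
}

# Example (not run here):
#   find_refs /path/to/quartus/linux64 libjtag
```

Printing only the file names makes it easier to spot executables such as jtagd among the libjtag_hw*.so files themselves.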

hedgeberg commented 11 months ago

Here's what I get as the output from running strace on jtagconfig:

$> strace jtagconfig --enum 2>&1 | grep "libjtag"                        
openat(AT_FDCWD, "/mnt/remote_tools/Altera/23.3/quartus/linux64/glibc-hwcaps/x86-64-v3/libjtag_client.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/mnt/remote_tools/Altera/23.3/quartus/linux64/glibc-hwcaps/x86-64-v2/libjtag_client.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/mnt/remote_tools/Altera/23.3/quartus/linux64/libjtag_client.so", O_RDONLY|O_CLOEXEC) = 3

So, jtagconfig is finding and opening libjtag_client.so, but I'm not seeing the child pid spawning that you are -- presumably that's jtagd getting spun up? So just to test, I did the same for jtagd:

$> strace jtagd --user-start --debug --foreground 2>&1 | grep "libjtag"           
openat(AT_FDCWD, "/mnt/remote_tools/Altera/23.3/quartus/linux64/libjtag_hw_dummy.so", O_RDONLY|O_CLOEXEC) = 6
openat(AT_FDCWD, "/mnt/remote_tools/Altera/23.3/quartus/linux64/libjtag_hw_otma.so", O_RDONLY|O_CLOEXEC) = 6
openat(AT_FDCWD, "/mnt/remote_tools/Altera/23.3/quartus/linux64/libjtag_hw_pli-blaster.so", O_RDONLY|O_CLOEXEC) = 6

So, jtagd is definitely discovering the files, but I'm seeing nothing from jtagconfig at all.

I should also note for some added context that I'm running my FPGA dev tools from an NFS share mounted readonly @ /mnt/remote_tools/. I don't think that would make much difference, but it's possible that Quartus isn't designed for that usage mode the same way that Vivado is (although thus far it seems to be handling it fine?)

hedgeberg commented 11 months ago

I'm going to go forward with the plan of throwing jtagd into ghidra, see what I can figure out as far as how exactly jtagd decides which files get pulled in and which don't, or if I can figure out how to make it spit out better debugging info than what it gives me rn, which is:

$> jtagd --user-start --debug --foreground
JTAG daemon started (will stop when idle)
Using config file /home/hedge/.jtagd.conf
No remote JTAG because stops when idle

Also, since I forgot to mention it before now, here's the output from jtagconfig --enum, it doesn't change whether jtagd has been run prior or not.

$> jtagconfig --enum
No JTAG hardware available

hedgeberg commented 11 months ago

Apologies, it's 22.1, don't know how I misnamed that so bad.

hedgeberg commented 11 months ago

Ok, throwing jtagd into ghidra, I found where the libjtag_hw_*.so files are being loaded. It's in the aptly-named function "load_libraries", pictured below.

[image: Ghidra view of the load_libraries function]

Seems like I can use GDB to see whether this is being loaded correctly by setting a breakpoint on the function "create_hardware_type" (pictured below), then checking whether the library gets through dlopen successfully and whether the dlsym calls are made without anything breaking. I'll keep going down this route for now, unless you have a better idea in terms of how to pin down this issue.

[image: Ghidra view of the create_hardware_type function]

hedgeberg commented 11 months ago

So, the libjtag_hw_*.so files are being loaded successfully into memory. All that "create_hardware_types" is doing is running the .so through a series of checks and then wrapping it with a C++ struct and dropping the struct into a std::vector object pointed to by the symbol "m_factories". "m_factories" is only referenced by a few functions, so somewhere there's going to be some overlap between one of those functions and whatever is called externally when jtagconfig --enum is invoked. Still have a bunch more RE (and it's C++ RE to boot, fun) but this does appear to be going somewhere. Let me know if any of this seems silly or like I'm barking up the wrong tree.

j-marjanovic commented 11 months ago

Sorry, been quite busy in the last couple of days - that is indeed very interesting, I am curious what you will discover next.

hedgeberg commented 11 months ago

I've also been extremely busy w/ work. Progress on this is going to be sporadic, unfortunately.

Good news is, stumbled upon something weird -- if I start jtagd manually in a separate terminal in the foreground, the libraries load? I get enum to output the dummy adapter. It even tries to note the OTMA adapter but then jtagd dies (the card isn't even plugged in at that point so that's not terribly surprising but may be worth looking into, as halting jtagd on error seems potentially liable to break things). Looks like the following:

$> jtagconfig --enum                                                                               
1) Dummy JTAG device [bus-instance]
  029070DD   5SGSMD5H(1|2|3)/5SGSMD5K1/..

2) OTMA FT232H [bus-instance]
  Unable to lock chain - Communications error

As for what's causing this difference in behavior between starting jtagd standalone before running jtagconfig --enum and letting jtagconfig --enum start it on its own, I have no idea. Seems worth continued investigation. For now, the dummy adapter appears to enumerate; I'll test whether the Stratix V chip enumerates and see where that gets us. If it works, then at the very least the library itself as built is functional, and the issue is instead something to do with how jtagd is launched.

hedgeberg commented 11 months ago

...figured it out, and it's so much dumber than I expected. On a hunch, I went ahead and tried jtagconfig --enum twice, once with jtagd launched prior, and once with letting jtagconfig launch jtagd. Then, after each, I went ahead and cat'ed their /proc/{pid}/environ values and looked for differences.
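The comparison described above can be sketched as two small helpers. On Linux, /proc/PID/environ is a NUL-separated list of NAME=value entries, so it needs converting before it can be diffed. The function names `dump_environ` and `environ_var` are made up for illustration.

```shell
# Print a NUL-separated environ file as sorted NAME=value lines.
dump_environ() {
    # $1 = path to an environ file, e.g. /proc/1234/environ
    tr '\0' '\n' < "$1" | sort
}

# Print the value of a single variable from an environ file.
environ_var() {
    # $1 = path to an environ file, $2 = variable name
    tr '\0' '\n' < "$1" | sed -n "s/^$2=//p"
}

# Example (not run here), comparing two jtagd instances by PID:
#   dump_environ /proc/PID_A/environ > a.env
#   dump_environ /proc/PID_B/environ > b.env
#   diff a.env b.env
#   environ_var /proc/PID_A/environ LD_LIBRARY_PATH
```

Sorting both dumps first keeps the diff limited to variables that actually differ rather than ones that merely appear in a different order.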

I had been running into issues with Quartus breaking system library dependencies when I ran qenv.sh, so I had been manually prepending /usr/lib to my LD_LIBRARY_PATH environment var afterwards to ensure that system libraries were loaded when possible. When launching jtagd on its own, the environ data I dumped out shows LD_LIBRARY_PATH=/usr/lib:/mnt/remote_tools/Altera/22.1/quartus/linux64/. When letting jtagconfig spin up jtagd on its own, environ data shows LD_LIBRARY_PATH=/mnt/remote_tools/Altera/22.1/quartus/linux64:/usr/lib:/mnt/remote_tools/Altera/22.1/quartus/linux64/. So, something in Quartus's stupid environment setup re-prepends the linux64 directory. To validate this was in fact my issue, I went ahead and did:

$> echo $LD_LIBRARY_PATH 
/usr/lib /mnt/remote_tools/Altera/22.1/quartus/linux64/

$> set -x LD_LIBRARY_PATH /mnt/remote_tools/Altera/22.1/quartus/linux64 $LD_LIBRARY_PATH                                                                                                                                   

$> echo $LD_LIBRARY_PATH                                                                                                                                                                                                   
/mnt/remote_tools/Altera/22.1/quartus/linux64 /usr/lib /mnt/remote_tools/Altera/22.1/quartus/linux64/

$> jtagd                                                                                                                                                                                                                   

$> jtagconfig --enum                                                                                                                                                                                                       
No JTAG hardware available

$> killall -9 jtagd

$> set -x LD_LIBRARY_PATH /usr/lib /mnt/remote_tools/Altera/22.1/quartus/linux64

$> jtagd

$> jtagconfig --enum                                                                                                                                                                                                       
1) Dummy JTAG device [bus-instance]
  029070DD   5SGSMD5H(1|2|3)/5SGSMD5K1/..

2) OTMA FT232H [bus-instance]
  Unable to lock chain - Communications error

So, there we go. The source of the error is (somehow, presumably thanks to the wonderful magic of the dynamic linker) Quartus' overzealous attempts to place itself at the start of the user's LD_LIBRARY_PATH variable. Now I guess I "just" need to see if there's a way to make it stop doing this. I'll leave this issue open for a bit while I troubleshoot, but if this continues to be an issue I'll solve this by just writing a script to separately launch jtagd, post the script, and close the issue.
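A launcher along those lines might look like the sketch below. It is not the script from this thread. It is a POSIX sh illustration that assumes /usr/lib is the directory that must be searched first, as in the session above, and the function names `path_with_first` and `start_jtagd_clean` are hypothetical.

```shell
# Move one directory to the front of a colon-separated path list, dropping
# any other occurrence of it (with or without a trailing slash).
path_with_first() {
    # $1 = directory to put first, $2 = colon-separated path list
    first=${1%/}
    rest=
    old_ifs=$IFS
    IFS=:
    for entry in $2; do
        [ -z "$entry" ] && continue               # skip empty entries
        [ "${entry%/}" = "$first" ] && continue   # skip duplicates of $first
        rest=${rest:+$rest:}$entry
    done
    IFS=$old_ifs
    printf '%s\n' "$first${rest:+:$rest}"
}

# Hypothetical launcher: start jtagd with /usr/lib searched first, so that
# a later jtagconfig call talks to this instance instead of spawning one.
start_jtagd_clean() {
    LD_LIBRARY_PATH=$(path_with_first /usr/lib "$LD_LIBRARY_PATH") jtagd "$@"
}
```

Running `start_jtagd_clean` and then `jtagconfig --enum` would mirror the working sequence shown above. Whether the already-running jtagd is always reused by other Quartus tools is an assumption based on the behavior observed in this thread.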