myriadrf / LimeSuite

Driver and GUI for LMS7002M-based SDR platforms
https://myriadrf.org/projects/lime-suite/
Apache License 2.0
470 stars 186 forks source link

Repeating Gateware version mismatch problem #287

Closed SignalWhisperer closed 4 years ago

SignalWhisperer commented 4 years ago

I'm reporting this because I finally found the fix for this issue: https://discourse.myriadrf.org/t/repeating-gateware-version-mismatch-problem/1339

After a few hours of debugging and reading code, I found the culprit and the solution for this long-standing issue, which still persists today. I'm still working on the patch, which I will submit as a pull request shortly.

Here is the problem some of us face: every other time you open the LimeSDR device, it fails to obtain the FPGA information (Gateware version) and reports version 0. It happens systematically on my Linux machine, on every second invocation, yet on Windows I don't seem to have this issue.

The culprit is a bug in libusb, for which I have not yet found a patch: if a call to libusb_bulk_transfer() times out, the number of transferred bytes is set to the requested length, and the buffer is filled with whatever bytes were already in memory (see https://github.com/libusb/libusb/issues/659).

However, there are also bugs in the ConnectionFX3 and the LMS64CProtocol classes.

In ConnectionFX3::Read() (and likewise in ConnectionFX3::Write()), the return value of libusb_bulk_transfer() is never checked. On a timeout, len is set to actual, but actual has wrongly been set to the requested length. The fix is to check that libusb_bulk_transfer() returned 0 before setting the length. Partial transfers are possible, but at this point it's probably better to discard the transfer and try again.
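A minimal sketch of that check, not LimeSuite's actual code. To keep it self-contained and testable without a device, the transfer call is injected as a function shaped like libusb_bulk_transfer() (minus the device handle and endpoint), and the two status codes mirror libusb's LIBUSB_SUCCESS (0) and LIBUSB_ERROR_TIMEOUT (-7). The names BulkTransferFn and ReadChecked() are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Mirror libusb's LIBUSB_SUCCESS and LIBUSB_ERROR_TIMEOUT values.
constexpr int kUsbSuccess = 0;
constexpr int kUsbErrorTimeout = -7;

// Hypothetical stand-in for libusb_bulk_transfer(): fills `data`, reports the
// byte count through `transferred`, and returns a libusb-style status code.
using BulkTransferFn =
    std::function<int(unsigned char* data, int length, int* transferred, unsigned timeoutMs)>;

// Hypothetical fixed read path: the byte count is trusted only when the
// transfer returned success. On any error (including a timeout that reports a
// bogus full-length `transferred`), the data is discarded and -1 is returned
// so the caller can retry.
int ReadChecked(const BulkTransferFn& transfer, unsigned char* buffer, int length,
                unsigned timeoutMs) {
    int actual = 0;
    const int status = transfer(buffer, length, &actual, timeoutMs);
    if (status != kUsbSuccess)
        return -1;  // discard, even if `actual` claims a full transfer
    return actual;
}
```

Passing a mock that sets *transferred to the full length but returns kUsbErrorTimeout (the behaviour described above) makes ReadChecked() return -1 instead of a full-length read of junk. Partial data is deliberately thrown away rather than salvaged, matching the "discard and try again" approach.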

In LMS64CProtocol::TransferPacket(), because of the libusb issue, the call to Read() (here ConnectionFX3::Read()) fills the packet with junk, hence the Gateware version errors. Also, ParsePacket() is always called regardless of the status. Even with a longer timeout, reading from the device still returns junk (a different issue, on the FX3 side). The fix is to call Read() a second time when the first call fails, and to handle errors properly by discarding the packet (not calling ParsePacket()) if an error occurred.
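The proposed TransferPacket() change can be sketched as follows. ReadFn, ParseFn and ReadAndParse() are hypothetical names, loosely modelled on the description above rather than on LimeSuite's real signatures; treating anything short of a full packet as a failure is also an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical callbacks: ReadFn returns the number of bytes read, or a
// negative value on error; ParseFn stands in for ParsePacket().
using ReadFn = std::function<int(unsigned char* buffer, int length)>;
using ParseFn = std::function<void(const unsigned char* buffer, int length)>;

// If the first read fails, try once more. If that also fails, report an error
// without calling the parser, so a junk buffer never reaches ParsePacket().
bool ReadAndParse(const ReadFn& read, const ParseFn& parse, std::vector<unsigned char>& packet) {
    const int len = static_cast<int>(packet.size());
    int got = read(packet.data(), len);
    if (got != len)
        got = read(packet.data(), len);  // single retry after a failed first read
    if (got != len)
        return false;  // discard the packet: the parser is not called
    parse(packet.data(), got);
    return true;
}
```

With a reader that fails once and then succeeds, the packet is parsed exactly once; with a reader that always fails, ReadAndParse() returns false and the parser is never invoked.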

9600 commented 4 years ago

@TehWan many thanks for this and the PR!

@IgnasJarusevicius, could you please review and merge etc.

SimonG4ELI commented 4 years ago

Agreed,

A fine catch.

Simon Brown, G4ELI

https://www.sdr-radio.com

From: Andrew Back notifications@github.com Sent: 04 December 2019 09:46 To: myriadrf/LimeSuite LimeSuite@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [myriadrf/LimeSuite] Repeating Gateware version mismatch problem (#287)


IgnasJarusevicius commented 4 years ago

Hi, I do not have a machine that can reproduce this problem, so I will wait for the results of your further investigation. Anyway, looking at the PR, I would prefer not to change LMS64CProtocol. If it is an issue with the FX3, then the second read (retry) could be done in ConnectionFX3::Read(). I also don't see how calling ParsePacket() when the status is bad could cause problems, since the final result (e.g. the result of ReadRegisters) will still contain invalid data; as I see it, calling ParsePacket() can only make it partially valid in some cases. Also, it is strange that convertStatus() passes when garbage data is returned, as it should give an error if the status byte in the returned buffer is not 1 (maybe that data is not complete garbage/random).
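To illustrate the point about convertStatus(), here is a minimal status-byte check. It assumes a reply layout with the command byte at offset 0 and the status byte at offset 1, and takes 1 as the "completed" value stated in the comment above; the offset, the constant names and ReplyStatusOk() are assumptions for illustration, not LimeSuite's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout for this sketch: byte 0 = command, byte 1 = status.
constexpr std::size_t kStatusOffset = 1;
// Status value meaning "command completed" (must be 1, per the comment above).
constexpr std::uint8_t kStatusCompleted = 1;

// Returns true only if the reply is long enough to hold a status byte and that
// byte equals kStatusCompleted. A buffer of junk passes only if the byte at the
// status offset happens to be 1.
bool ReplyStatusOk(const std::uint8_t* reply, std::size_t length) {
    if (reply == nullptr || length <= kStatusOffset)
        return false;
    return reply[kStatusOffset] == kStatusCompleted;
}
```

Under this check, an all-zero or random buffer is rejected unless the byte at the status offset is exactly 1, which is why a passing status alongside garbage data hints that the data is not entirely random.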

SignalWhisperer commented 4 years ago

Hi, good points. I'll make the changes you mentioned.

ddcc commented 4 years ago

Interesting, I've also been experiencing this problem but haven't had time to look into it. It's good to know that this issue may be specific to the USB host controller; I'm curious which ones appear to be problematic. I encounter this on a 32-bit ARM board, the ODROID-XU4.

SignalWhisperer commented 4 years ago

I have an AMD USB 3.0 host controller, as reported by lspci | grep USB.

SignalWhisperer commented 4 years ago

Out of curiosity, I tested it with a USB 2 port and don't get the issue at all. It seems to be isolated to USB 3, at least on my machine.

Edit: My laptop, which does not have the issue, only has USB 2.

SignalWhisperer commented 4 years ago

I just tried with the latest Linux kernel (5.4.2). The issue is still there, but the garbage data is gone; the buffer is zeroed out instead. Definitely a bug in the kernel.

@IgnasJarusevicius, do you still want to patch this for the people affected by the kernel bug? If so, I'll make the changes you mentioned and limit the fix to ConnectionFX3.

OhSoGood commented 4 years ago

@TehWan, @IgnasJarusevicius: I'm facing the same problem (on 2 AMD PCs) and I'm not sure I understand the status of the issue. Is the patch of Dec 4 enough to fix it? Could you explain? Thank you!

SignalWhisperer commented 4 years ago

@OhSoGood The patch was merged into the master branch. If you build the latest version, do you still get the same problem?

OhSoGood commented 4 years ago

Indeed it does. Thank you very much! @IgnasJarusevicius: could LimeMicro provide a daily/master build somewhere, e.g. on GitHub, myriadrf or your own website? Self-compiling is not always practical, or even doable, depending on the context.

pushandr1 commented 4 years ago

@TehWan, hi,

For me this issue is still present on my old i5 2500K-based PC with the onboard ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (for USB 3.0).

Is there any chance it will be fixed soon, or am I better off switching to another USB controller/PC hardware?

P.S. Version information:
Library version: v20.01.0-gc931854e
Build timestamp: 2020-04-17
Interface version: v2020.1.0
Binary interface: 20.01-1