Open JLMorange opened 5 years ago
FBK shouldn't time out. If a timeout happens, something is wrong.
I don't see anything wrong on the serial line or in the host dmesg... how can I troubleshoot the issue?
I found something related to this. It looks like the Raspberry Pi USB hardware is not reliable in USB 2.0 mode and loses USB packets. See http://www.yoctopuce.com/EN/article/the-quest-for-the-ideal-mini-pc and https://sourceforge.net/p/libusb/mailman/libusb-devel/thread/52C308C0.3030705%40probo.com/
In my tests, sometimes it worked, sometimes not... I read somewhere that it is related to async sending... this issue is really annoying...
I will try to force USB to 1.1 as stated in https://www.raspberrypi.org/forums/viewtopic.php?t=5249&start=200 but maybe better error retry should be implemented in uuu, because that slows down the Ethernet...
USB bulk transfer is guaranteed by the USB protocol. The hardware will retry if the driver works correctly.
Hmm, this really seems related to some timeout... from one run to the next, seemingly at random, sometimes the command passes and sometimes not... and I get different behaviour when running with -V or -v... what is reproducible is the "random" timeout...
Looking into the code I see:
int BulkTrans::write(void *buff, size_t size)
{
    int ret;
    int actual_lenght;
    for (size_t i = 0; i < size; i += m_MaxTransPreRequest)
    {
        uint8_t *p = (uint8_t *)buff;
        p += i;
        size_t sz;
        sz = size - i;
        if (sz > m_MaxTransPreRequest)
            sz = m_MaxTransPreRequest;
        ret = libusb_bulk_transfer(
            (libusb_device_handle *)m_devhandle,
            m_ep_out.addr,
            p,
            sz,
            &actual_lenght,
            m_timeout
        );
        if (ret < 0)
        {
            string error;
            string err;
            err = "Bulk(W):";
            err += libusb_error_name(ret);
            set_last_err_string(err);
            return ret;
        }
    }
What is the value of m_timeout when FBK is used?
I remember it is 2s.
best regards Frank Li
FB[-t 10000]: ucmd
It is correct. If you run it from the command line:
uuu "FB[-t 1000]:" ucmd
In a script, the quotes are not needed.
The -t option is only supported by FB.
FBK doesn't support it because it is not necessary.
UFB always returns a busy status well within the default timeout value.
I made some changes to the timeout of FBK:ucp commands. This improved things a little; at least I managed to complete a flash once... Does anyone know how to use actual_lenght? According to the libusb_bulk_transfer documentation, the value must be checked to know whether it is a real timeout or a partial transfer... According to this post: https://libusb-devel.narkive.com/MIW5xNqX/libusb-bulk-transfer-return-timeout-error-and-transferred-set-to-0 it seems that a timeout can occur if too much data is piped into libusb...
actual_lenght is actually not used. I remember the caller always splits the data into 64k per transfer to avoid a single transfer hitting the timeout.
"FBK doesn't support it because it is not necessary." Well... not quite, since it affects the libusb timeout parameter... changing the timeout on FBK commands did improve things a little.
I also tried retrying on timeout errors and got some strange results... most transfers complete without any timeout, then I got a timeout after transferring 16384 bytes of 65536, followed by a timeout after transferring 32768 of 49152, then a timeout after transferring 0 of 16384... so strangely it occurs at the end of the file...
Looks like I have the same issue as in the previous post: the USB chokes at some point and uuu continues to flood it...
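For reference, a minimal sketch of how the transferred byte count could be used when retrying after a timeout, as in the experiment above. This is not the uuu implementation. `write_with_resume`, `TransferFn`, `kOk` and `kTimeout` are hypothetical names; the callback stands in for `libusb_bulk_transfer()` so the sketch builds without libusb, and `kTimeout` mirrors the value of `LIBUSB_ERROR_TIMEOUT`. The idea is that bytes reported as transferred before a timeout are counted, and the next request resumes after them instead of resending the whole chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Stand-ins for LIBUSB_SUCCESS and LIBUSB_ERROR_TIMEOUT (assumed values 0 and -7).
constexpr int kOk = 0;
constexpr int kTimeout = -7;

// Hypothetical callback playing the role of libusb_bulk_transfer():
// returns a status code and stores the bytes actually moved in *transferred.
using TransferFn = std::function<int(const uint8_t *data, int len, int *transferred)>;

// Writes `size` bytes in requests of at most `max_chunk` bytes.
// After a timeout, any bytes already transferred are kept and the write
// resumes right after them. Gives up once more than `max_stalls`
// consecutive requests make no progress at all.
int write_with_resume(const TransferFn &xfer, const uint8_t *buf, size_t size,
                      size_t max_chunk, int max_stalls)
{
    assert(buf != nullptr || size == 0);
    size_t done = 0;
    int stalls = 0;
    while (done < size) {
        size_t want = size - done;
        if (want > max_chunk)
            want = max_chunk;

        int moved = 0;
        int ret = xfer(buf + done, static_cast<int>(want), &moved);
        if (ret != kOk && ret != kTimeout)
            return ret; // any other error is reported unchanged

        if (moved > 0) {
            if (static_cast<size_t>(moved) > want)
                moved = static_cast<int>(want); // defensive clamp
            done += static_cast<size_t>(moved);
            stalls = 0; // progress was made, reset the stall counter
        } else if (++stalls > max_stalls) {
            return kTimeout; // repeated requests moved nothing
        }
    }
    return kOk;
}
```

With a 65536-byte buffer and `max_chunk` set to 65536, a callback that times out after 16384 bytes, then after 32768 bytes, then after 0 bytes would be asked for 65536, 49152, 16384 and 16384 bytes in turn, which matches the sequence of sizes reported above. Whether resuming like this is safe for the FBK protocol is a separate question that this sketch does not answer.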
If the low-level USB transfer is okay, a timeout only happens when the device side has not queued a transfer in time.
I changed some low-level USB parameters, setting m_MaxTransPreRequest to 0x10000 and m_timeout to 9000 (after some tries I noticed that the m_timeout of the cmd is not propagated to USB, as you stated...). Changing only one parameter improved things only slightly... changing both allowed 3 complete flashes in a row... and I don't see any retries on timeout... I'll do further testing and, if it is conclusive, will provide a merge request.
So far 25 flashes, no errors.
I face the same issue on HID commands, though less often... I was thinking about a fix that allows the user to tune some variables, something like conf::USB::HID::max_packet_size, conf::USB::HID::timeout, conf::USB::BULK::max_packet_size, conf::USB::BULK::timeout
and creating a singleton "conf". With the current design, it is likely a lot of work to pass arguments for tuning the low-level transport layer... sometimes the packets are split in the upper layer (for HID, for example), sometimes in the lower layer... what do you think about this?
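To make the proposal concrete, here is a rough sketch of what such a singleton could look like. It is not the patch itself. `Conf`, `TransportTunables` and `next_bulk_chunk` are hypothetical names, and the defaults (64k bulk requests, 1025-byte HID reports, 2 s timeout) are only the figures mentioned in this thread, not values checked against the uuu source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-transport tunables, loosely following the names proposed above.
struct TransportTunables {
    size_t max_packet_size; // largest single request handed to the USB layer
    unsigned timeout_ms;    // timeout for one request, in milliseconds
};

// Process-wide settings object, so the transport layers can read their
// tunables without having them passed down through every protocol layer.
class Conf {
public:
    static Conf &instance()
    {
        static Conf conf; // created on first use
        return conf;
    }

    // Defaults are assumptions taken from figures quoted in the discussion.
    TransportTunables hid{1025, 2000};
    TransportTunables bulk{0x10000, 2000};

    Conf(const Conf &) = delete;
    Conf &operator=(const Conf &) = delete;

private:
    Conf() = default;
};

// Example consumer: size of the next bulk request for `remaining` bytes.
size_t next_bulk_chunk(size_t remaining)
{
    const TransportTunables &bulk = Conf::instance().bulk;
    assert(bulk.max_packet_size > 0);
    return remaining < bulk.max_packet_size ? remaining : bulk.max_packet_size;
}
```

A script-level setting would then only need to write to `Conf::instance().bulk.max_packet_size` (for example 4096 on a host that cannot cope with 64k requests), and the 64k default stays in place for everyone else.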
It is not necessary to tune max_packet_size here; we just set a safe value such as 64k until we really meet a performance problem from using a small max_packet_size.
For HID, the HID report layer and the ROM limit the maximum transfer to 1025 per report. So the max size of an HID write is 1025. No need to split in HIDTrans.
Keep everything as simple as possible.
"It is not necessary to tune max_packet_size here"
With 64k packets I flashed successfully 2 times out of more than 100 attempts... that's roughly 98% FAIL...
"we really met performance problem when use small max_packet_size"
When it doesn't work at all, I don't care about performance! Anyway, in my tests the performance loss between the x86 64k packets and my 4k packets is barely noticeable... that's why I propose to make it tunable from the script, so that you can keep a 64k default and I can tune it so that it works.
"So HID write's max write size is 1025"
True, but my failure is on read... where the packet size is too big...
"Keep everything simple as possible."
That's why I propose to use a singleton pattern, so that the low-level "tunables" don't have to be passed across all the protocol layers...
Yes, but the root cause should be the Pi USB host driver. I think any workaround here will not resolve the problem 100%.
So far... I tested with an auto-loop flash script... I set the target to 200 successes in a row... the result is: 200 successes... So I agree the root cause is likely the Pi USB host driver; however, by the time they find and integrate a solution, my project will have ended... The USB packet size is a known issue (since you had to choose a fail-safe value) and is driver dependent, so letting the user tune it in the script is a fast solution...
I finished my fix, can you grant me access to git so that I can push my branch in order to ask for a merge request? thanks
You should follow the GitHub workflow to send a pull request to me.
Push to your own fork repo, then send a pull request.
Strange way to do things... it's a waste of space, having a clone of each repo... Anyway, here is a patch, made on the 1.2.39 tag: 1.2.39.patch.txt
It is easier to review when you send a pull request through GitHub.
No need to clone.
git remote add my_uuu
git push my_uuu master
Then you can send a pull request on GitHub.
I created my repo and pushed the sources... they are here: https://github.com/JLMorange/mfgtootls_bugfixes and, well... I can send a pull request to my own repository... that's nice... and so predictable... since they are 2 separate repositories, GitHub can't know they are related... so I can't send a pull request to you.
Please click the "fork" button at https://github.com/NXPmicro/mfgtools
then push your change into the forked mfgtools git repo. You will then be able to send a pull request to me.
I can confirm that this bug is still happening. I used @JLMorange's patch (after manual correction) to try to get things working, but couldn't figure out the right set of values that would actually work. I have the impression that @JLMorange's patch leads to corruption of the file being copied. It might also be because the code has changed since his patch and I just did a blind update (I didn't try hard to understand what was going on). Any recommendation on how to get this fixed once and for all?
I reproduced the same issue with the latest release. After going back to 1.2.39 and applying the patch, flashing worked. I'm trying to forward-port it to the latest release.
I also have the same issue on multiple RPi (400, 3B) with ARM64. 1.2.39 with the patch seems to work but is really slow... Do we have another lead on how to fix this?
edit: It even breaks the whole USB driver on the RPI 400, the mouse and keyboard don't work until a restart...
I built uuu on a Raspberry Pi 3. I use the same script on a Windows host with no issue. On the Raspberry Pi, the script performs about 28 steps without any issue, but fails systematically on
1:15>Start Cmd:FBK: acmd dd of=/dev/mmcblk2p2 bs=1M conv=fsync
1:15>Okay
1:15>Start Cmd:FBK: ucp ../sercomm-image.rootfs.ext4 t:-
1%1:15>Fail Bulk(W):LIBUSB_ERROR_TIMEOUT
I see the progress indicator go up to 10%, then restart at 1% and go up to about 7%, then restart... and so on until the error above. I suspect the timeout is too short... but I didn't find in the documentation how to change the timeout of an FBK command.
thanks for your help!