nxp-imx / mfgtools

Freescale/NXP I.MX Chip image deploy tools.
BSD 3-Clause "New" or "Revised" License

issue flashing with raspberry host #76

Open JLMorange opened 5 years ago

JLMorange commented 5 years ago

I built uuu on a Raspberry Pi 3. I use the same script on a Windows host with no issue. On the Raspberry Pi, the script performs about 28 steps without any problem, but then fails systematically on:

1:15>Start Cmd:FBK: acmd dd of=/dev/mmcblk2p2 bs=1M conv=fsync
1:15>Okay
1:15>Start Cmd:FBK: ucp ../sercomm-image.rootfs.ext4 t:-
1%1:15>Fail Bulk(W):LIBUSB_ERROR_TIMEOUT

I see the progress indicator go up to 10%, then restart at 1% and climb to about 7%, then restart again, and so on until the error above. I suspect the timeout is too short, but I didn't find in the documentation how to change the timeout of an FBK command.

thanks for your help!

nxpfrankli commented 5 years ago

FBK doesn't need a timeout. If a timeout happens, something is wrong.

JLMorange commented 5 years ago

I don't see anything wrong, either on the serial line or in the host dmesg. How can I troubleshoot the issue?

JLMorange commented 5 years ago

I found something related to this. It looks like the Raspberry Pi USB hardware is not reliable in USB 2.0 mode and loses USB packets. See http://www.yoctopuce.com/EN/article/the-quest-for-the-ideal-mini-pc and https://sourceforge.net/p/libusb/mailman/libusb-devel/thread/52C308C0.3030705%40probo.com/

In my tests, it sometimes worked and sometimes did not. I read somewhere that this is related to asynchronous sending. This issue is really annoying.

JLMorange commented 5 years ago

I will try to force USB to 1.1 as described in https://www.raspberrypi.org/forums/viewtopic.php?t=5249&start=200 but maybe better error retry should be implemented in uuu, since forcing USB 1.1 slows down the Ethernet.

nxpfrankli commented 5 years ago

USB bulk transfers are guaranteed by the USB protocol. The hardware will retry if the driver works correctly.

JLMorange commented 5 years ago

Hmm, this really seems related to some timeout. From one run to the next, seemingly at random, the command sometimes passes and sometimes does not, and I get different behaviour when running with -V or -v. What is reproducible is the "random" timeout.

Looking into the code, I see:

int BulkTrans::write(void *buff, size_t size)
{
    int ret;
    int actual_lenght;
    for (size_t i = 0; i < size; i += m_MaxTransPreRequest)
    {
        uint8_t *p = (uint8_t *)buff;
        p += i;
        size_t sz;
        sz = size - i;
        if (sz > m_MaxTransPreRequest)
            sz = m_MaxTransPreRequest;

        ret = libusb_bulk_transfer(
            (libusb_device_handle *)m_devhandle,
            m_ep_out.addr,
            p,
            sz,
            &actual_lenght,
            m_timeout
        );

        if (ret < 0)
        {
            string error;
            string err;
            err = "Bulk(W):";
            err += libusb_error_name(ret);
            set_last_err_string(err);
            return ret;
        }
    }

What is the value of m_timeout when FBK is used?

nxpfrankli commented 5 years ago

As far as I remember, it is 2 s.


nxpfrankli commented 5 years ago

FB[-t 10000]: ucmd

That is correct. If you run it from the command line:

uuu "FB[-t 1000]:" ucmd

In a script, the quotes are not needed.

The -t option is only supported by FB.

FBK doesn't support it because it is not necessary.

UFB always returns a busy status well within the default timeout value.

JLMorange commented 5 years ago

I made some changes to alter the timeout of FBK: ucp commands. This improved things a little; at least I managed to complete a flash once. Does anyone know how to use actual_lenght? According to the libusb_bulk_transfer documentation, the value must be checked to know whether it is a real timeout or a partial transfer. According to this post: https://libusb-devel.narkive.com/MIW5xNqX/libusb-bulk-transfer-return-timeout-error-and-transferred-set-to-0 it seems that a timeout can occur if too much data is piped into libusb.
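For readers with the same question, here is a minimal sketch of one way the transferred byte count could be used. It is not the uuu implementation: `write_all`, `BulkFn`, `kErrTimeout` and `max_stalls` are hypothetical names, and the callback stands in for `libusb_bulk_transfer` so that the sketch compiles without libusb. The idea is that a timeout with a non-zero byte count is treated as partial progress, and the next request resumes after the bytes already accepted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Stand-in for LIBUSB_ERROR_TIMEOUT (-7 in libusb.h), redefined here only so
// that the sketch builds without libusb.
constexpr int kErrTimeout = -7;

// Hypothetical callback shaped like the data part of libusb_bulk_transfer:
// tries to send `len` bytes, stores the number actually sent in *transferred,
// and returns 0 on success or a negative error code.
using BulkFn =
    std::function<int(const std::uint8_t *data, int len, int *transferred)>;

// Sends `size` bytes in requests of at most `max_chunk` bytes. Bytes reported
// as transferred are always counted, even when the request timed out, so a
// retry never resends data the device already accepted. Gives up after
// `max_stalls` consecutive requests that made no progress at all.
int write_all(const BulkFn &xfer, const std::uint8_t *buf, std::size_t size,
              std::size_t max_chunk, int max_stalls)
{
    assert(max_chunk > 0);
    std::size_t done = 0;
    int stalls = 0;
    while (done < size)
    {
        const int chunk = static_cast<int>(std::min(max_chunk, size - done));
        int transferred = 0;
        const int ret = xfer(buf + done, chunk, &transferred);

        done += static_cast<std::size_t>(transferred); // partial progress counts

        if (ret != 0 && ret != kErrTimeout)
            return ret;                 // hard error: do not retry
        if (transferred > 0)
            stalls = 0;                 // some data went through
        else if (++stalls > max_stalls)
            return kErrTimeout;         // no progress for too long
    }
    return 0;
}
```

In real code the callback would wrap `libusb_bulk_transfer` with the device handle, the OUT endpoint address and the timeout. Whether resuming after a partial transfer is safe depends on the device-side protocol, which this sketch does not model.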

nxpfrankli commented 5 years ago

actual_lenght is actually not used. As far as I remember, the caller always splits the data into 64 KB chunks per transfer, so that a single transfer does not hit the timeout.

JLMorange commented 5 years ago

FBK doesn't support it because it is not necessary.

Well, not quite, since it affects the libusb timeout parameter. Changing the timeout on FBK commands did improve things a little.

I also tried retrying on timeout errors and got some strange results. Most transfers complete without any timeout; then I got a timeout after transferring 16384 of 65536 bytes, followed by a timeout after transferring 32768 of 49152 bytes, then a timeout after transferring 0 of 16384 bytes. Strangely, it occurs at the end of the file.

It looks like I have the same issue as in the post I linked earlier: the USB link chokes at some point and uuu keeps flooding it.

nxpfrankli commented 5 years ago

If the low-level USB transfer is okay, a timeout only happens when the device side has not queued a transfer in time.

JLMorange commented 5 years ago

I changed some low-level USB parameters, setting m_MaxTransPreRequest to 0x10000 and m_timeout to 9000 (after some tries I noticed that the m_timeout of the command is not propagated to USB as you stated). Changing only this parameter improved the situation only slightly; changing both allowed 3 complete flashes in a row, and I don't see any retries on timeout. I'll do further testing and, if the results are conclusive, will provide a merge request.

JLMorange commented 5 years ago

So far: 25 flashes, no errors.

JLMorange commented 5 years ago

I face the same issue on HID commands, though less often. I was thinking about making a fix that allows the user to tune some variables, something like:

conf::USB::HID::max_packet_size
conf::USB::HID::timeout
conf::USB::BULK::max_packet_size
conf::USB::BULK::timeout

and creating a singleton "conf". With the current design, it would likely be a lot of work to pass arguments down to tune the low-level transport layer: sometimes the packets are split in the upper layer (for HID, for example), sometimes in the lower layer. What do you think about this?
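A minimal sketch of what such a singleton could look like, for illustration only. The class and field names (`Conf`, `TransportTunables`, `bulk_chunk_count`) are hypothetical and are not taken from the actual patch. The default values are assumptions based on figures mentioned in this thread (1025 for HID, 64 KB for bulk, 2 s timeout), not values verified against the uuu sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-transport tunables, mirroring the conf::USB::... idea.
struct TransportTunables
{
    std::size_t max_packet_size; // largest single request, in bytes
    unsigned timeout_ms;         // per-request timeout, in milliseconds
};

// Process-wide configuration object. The function-local static is created on
// first use, and its initialization is thread-safe since C++11.
class Conf
{
public:
    static Conf &instance()
    {
        static Conf conf;
        return conf;
    }

    Conf(const Conf &) = delete;
    Conf &operator=(const Conf &) = delete;

    // Assumed defaults, taken from numbers quoted in this thread.
    TransportTunables hid{1025, 2000};
    TransportTunables bulk{0x10000, 2000};

private:
    Conf() = default;
};

// Example of a low-level helper reading its limit from the singleton, so that
// no extra argument has to travel through the protocol layers.
std::size_t bulk_chunk_count(std::size_t total_bytes)
{
    const std::size_t chunk = Conf::instance().bulk.max_packet_size;
    assert(chunk > 0);
    return (total_bytes + chunk - 1) / chunk; // round up
}
```

A script-level setting would then only need to write to `Conf::instance().bulk.max_packet_size` once, before the first transfer. The trade-off is the usual one for global state: the layers stay decoupled from the tunables' plumbing, but the dependency becomes implicit.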

nxpfrankli commented 5 years ago

It is not necessary to tune max_packet_size here. We just set a safe value such as 64 KB, unless we really meet a performance problem when using a small max_packet_size.

For HID, the HID report layer and the ROM limit the maximum transfer (1025) per report. So the maximum HID write size is 1025, and there is no need to split in HIDTrans.

Keep everything as simple as possible.

JLMorange commented 5 years ago

It is not necessary to tune max_packet_size here

With 64 KB packets I flashed successfully 2 times out of more than 100 attempts. That's a failure rate above 98%.

we really meet a performance problem when using a small max_packet_size

When it doesn't work at all, I don't care about performance! Anyway, in my tests the loss of performance between the x86 host with 64 KB packets and my 4 KB packets is barely noticeable. That's why I propose to make it tunable from the script, so that you can keep a default of 64 KB and I can tune it so that it works.

So HID write's max write size is 1025

True, but my failure is on read, where the packet size is too big.

Keep everything as simple as possible.

That's why I propose to use a singleton pattern, so that the low-level "tunables" don't have to be passed across all the protocol layers.

nxpfrankli commented 5 years ago

Yes, but the root cause should be the Pi USB host driver. I think any workaround here will not resolve the problem 100%.

JLMorange commented 5 years ago

So far, I have tested with an auto-loop flash script. I set the "deadline" to 200 successes in a row, and the result is: 200 successes. So I agree the root cause is likely the Pi USB host driver; however, by the time they find and integrate a solution, my project will be over. The USB packet size is a known issue (since you had to choose a fail-safe value) and is driver dependent, so letting the user tune it in the script is a fast solution.
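The auto-loop script itself was not posted. As a rough sketch of the idea under that caveat, the loop below repeats a flash attempt until a target number of consecutive successes is reached or an attempt budget runs out. `soak_test`, `SoakResult` and the `flash_once` callback are hypothetical names.

```cpp
#include <cassert>
#include <functional>

// Outcome of a soak run.
struct SoakResult
{
    int attempts;        // total flash attempts made
    int best_streak;     // longest run of consecutive successes seen
    bool reached_target; // true if target_streak successes happened in a row
};

// Calls flash_once() repeatedly. Any failure resets the current streak to 0.
// Stops as soon as target_streak consecutive successes are seen, or after
// max_attempts attempts, whichever comes first.
SoakResult soak_test(const std::function<bool()> &flash_once,
                     int target_streak, int max_attempts)
{
    assert(target_streak > 0);
    SoakResult result{0, 0, false};
    int streak = 0;
    while (result.attempts < max_attempts)
    {
        ++result.attempts;
        streak = flash_once() ? streak + 1 : 0;
        if (streak > result.best_streak)
            result.best_streak = streak;
        if (streak >= target_streak)
        {
            result.reached_target = true;
            break;
        }
    }
    return result;
}
```

In practice `flash_once` could run the uuu script through `std::system` and report whether the exit status was zero. Requiring consecutive successes rather than a success ratio is a stricter criterion, which suits an intermittent fault like the one described here.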

JLMorange commented 5 years ago

I finished my fix. Can you grant me access to the git repository so that I can push my branch and open a merge request? Thanks.

nxpfrankli commented 5 years ago

You should follow the GitHub workflow to send me a pull request.

Push to your own fork of the repo, then send a pull request.

JLMorange commented 5 years ago

I finished my fix. Can you grant me access to the git repository so that I can push my branch and open a merge request? Thanks.

You should follow the GitHub workflow to send me a pull request.

Push to your own fork of the repo, then send a pull request.

Strange way to do things. Having a clone of each repo is a waste of space. Anyway, here is a patch made on the 1.2.39 tag: 1.2.39.patch.txt

nxpfrankli commented 5 years ago

It is easier to review when you send a pull request through GitHub.

There is no need to clone.

git remote add my_uuu <URL of your repository>
git push my_uuu master

Then you can send a pull request on GitHub.

JLMorange commented 5 years ago

I created my repo and pushed the sources. They are here: https://github.com/JLMorange/mfgtootls_bugfixes. And, well, I can send a pull request to my own repository. That's nice, and so predictable: since they are 2 separate repositories, GitHub can't know they are related, so I can't send a pull request to you.

nxpfrankli commented 5 years ago

Please click the "fork" button at https://github.com/NXPmicro/mfgtools

Then push your change to the forked mfgtools git repo. You will then be able to send me a pull request.

Bluebugs commented 4 years ago

I can confirm that this bug is still happening. I used @JLMorange's patch (after manual correction) to try to get things working, but couldn't figure out the right set of values that would actually work. I have the impression that @JLMorange's patch leads to corruption of the file being copied. It might also be because the code has changed since his patch and I just did a blind update (I didn't try hard to understand what was going on). Any recommendation on how to get this fixed once and for all?

mwasilew commented 3 years ago

I reproduced the same issue with the latest release. After going back to 1.2.39 and applying the patch, flashing worked. I'm trying to forward-port it to the latest release.

Yannholo commented 2 years ago

I also have the same issue on multiple RPi models (400, 3B) with ARM64. Version 1.2.39 with the patch seems to work but is really slow. Do we have another lead on how to fix this?

Edit: It even breaks the whole USB driver on the RPi 400; the mouse and keyboard don't work until a restart.