mcuee / libusbk

libusbk official github repo
75 stars 36 forks source link

xfer-async.exe hangs #2

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hello.
First of all I'd like to thank you for this excellent library. It is very well 
documented and set up easily. All the samples compiled without any problems and 
I was happy to see show-device.exe hooking up my test device (Transcend USB 
flash drive). It is a pity though that the library is not cross-platform.

The only problem I have experienced so far is the hanging of one of your sample 
programs.

What steps will reproduce the problem?
1. Insert USB flash drive ("test device")
2. Using Driver Install Creator Wizard specify the libusbK v 3.0.4.2 driver for 
the test device (by the way, is it OK that liusbK driver has version 3.0.4.2?)
3. Run show-device.exe to get pid and vid of the test device (vid=0457, 
pid=0151)
4. Run xfer-async.exe with these pid and vid

What is the expected output? What do you see instead?
I expect that the program writes something to console, maybe waits for some 
time and finishes normally. 
I see messages:
Looking for device vid/pid 0457/0151..
Using 0457:0151 (1B5454055201B6): USB Mass Storage Device - Silicon Integrated 
Systems Corp.
Device opened successfully!
Bench_Configure failed.
Transfer #0 submitted for 4096 bytes.
Transfer #1 submitted for 4096 bytes.
Transfer #2 submitted for 4096 bytes.

At this point the program hangs. It does not respond neither to Ctrl+C nor to 
Stop process requests via the task manager. I had to reboot my system to close 
the console window. I should notice, however, that while I was writing this 
text, another console window with the xfer-async.exe running finished. I have 
started it again and can confirm that it finishes but after too long period of 
time (several minutes).

What version of the product are you using? On what operating system?
libusbK installed from libusbK-3.0.5.1-setup.exe
OS: Windows 7 64 bit on HP 4510s laptop

Please provide any additional information below.
I have run the program in a debugger and found out that it hangs inside the 
OvlK_WaitAndRelease call. As I understand it waits for the feedback from the 
device, as specified in asynchronous mode of isochronous transfer. However, 
hanging does not seem correct, because there is timeout set to 1000 ms, so the 
function should have returned after this time elapsed.

Original issue reported on code.google.com by khani...@gmail.com on 7 Jan 2012 at 7:27

GoogleCodeExporter commented 9 years ago
> by the way, is it OK that liusbK driver has version 3.0.4.2?)

No, this is a packaging problem that I will fix in the next release.

> However, hanging does not seem correct, because there is timeout set to 1000 
ms, so the function should have returned after this time elapsed.

It is possible this is caused by a bug in the OvlK_Wait function.  If so, the 
code that will fix it is checked in at r172.  However, this example is not 
designed for a Mass Storage Device.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 8 Jan 2012 at 1:37

GoogleCodeExporter commented 9 years ago
Thanks. The r172 commit indeed made OvlK_Wait() return after the timeout (1 
second) elapsed. However, the program still hangs. This time the debugger shows 
that it reaches the end and to the debugger it seems that the program finished. 
However, the corresponding console does not close. Without debugging it is even 
more obvious that the program hangs. By the way, the console windows still 
close after a large period of time (several minutes).

I understand that the example is not suited for mass storage devices, but this 
was the only device I have at the moment. Anyway, I think that a program (or 
driver) must never hang.

Probably the root of the problem lies within the driver, otherwise it is 
difficult to explain the fact that a user mode application could not be forced 
to stop.

Output:
Looking for device vid/pid 0457/0151..
Using 0457:0151 (1B5454055201B6): USB Mass Storage Device - Silicon Integrated 
Systems Corp.
Device opened successfully!
Bench_Configure failed.
Transfer #0 submitted for 4096 bytes.
Transfer #1 submitted for 4096 bytes.
Transfer #2 submitted for 4096 bytes.
Transfer #0 did not complete.
 errorCode=000004C7hTransfer #1 did not complete.
 errorCode=000004C7hTransfer #2 did not complete.
 errorCode=000004C7hTransferred 0 bytes in 3 transfers. errorCode=000004C7h

Original comment by khani...@gmail.com on 8 Jan 2012 at 5:07

GoogleCodeExporter commented 9 years ago
> Probably the root of the problem lies within the driver, otherwise it is 
difficult to explain the fact that a user mode application could not be forced 
to stop.

The waiting is done in the user dll.  It uses a standard windows wait function. 
 The driver cannot cause this.

I have also tested this with a USB stick and could not reproduce your problem 
(win7 x64).  It takes only a few seconds for the application to complete:
$ ./xfer-async/bin/x86/xfer-async vid=0930 pid=6545
Looking for device vid/pid 0930/6545..
Using 0930:6545 (00187D1174B6EB71B000CEA4): DataTraveler 108 - Toshiba Corp.
Device opened successfully!
Bench_Configure failed.
Transfer #0 submitted for 4096 bytes.
Transfer #1 submitted for 4096 bytes.
Transfer #2 submitted for 4096 bytes.
Transfer #0 did not complete.
 errorCode=000004C7hTransfer #1 did not complete.
 errorCode=000004C7hTransfer #2 did not complete.
 errorCode=000004C7hTransferred 0 bytes in 3 transfers. errorCode=000004C7h

It is possible you have a host controller that is malfunctioning. Cold boot 
your PC.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 8 Jan 2012 at 6:16

GoogleCodeExporter commented 9 years ago
The speed you respond to comments is pretty impressive! Thanks again.

I'd like to notice a few things.

First of all I will try to reproduce the issue on a differenc PC as soon as I 
get access to one.

Secondly, I didn't mean that the OvlK_Wait() function was the reason for the 
problem after r172. Your check in fixed the timeout bug in OvlK_Wait() and 
before I posted the comment I checked with the debugger that the program did 
not hang inside it anymore. I said about the driver specifically because it 
seemed that the program itself didn't hang at all according to the debugger. So 
my guess was that the driver entered bad state and did not release some system 
resource. The notes below will add up to this.

I reproduced the following three strange situations. They were all observed 
after system reboot and double-checked (i.e. each of them had been reproduced 
twice):
1. After running xfer-async.exe just once I cannot put my laptop to suspend 
mode. The screen turns off but the system continues running. Only hard shutdown 
helps.
2. After running xfer-async.exe just once I cannot turn off the laptop. Again, 
my session ends, I see the shutdown message, but it does not disappear and 
again I have to perform a hard shutdown.
3. After running xfer-async.exe just once show-device.exe stops distinguishing 
between the device being plugged or unplugged. Normally, when I run 
show-device.exe and no device is connected, it writes "No devices connected". 
However after running xfer-async.exe (with the device connected) 
show-device.exe output says same things as if the device was connected:
> Example device not found.
> Example device not found.
> 0457:0151 (1B5454055201B6): USB Mass Storage Device - Silicon Integrated 
Systems Corp.

In my opinion all these situations point to the suggestion I made that the 
libusbK.sys driver's state becomes incorrect.

Of course I keep in mind your guess about my USB host malfunctioning and that 
is why I will try to check another PC as soon as possible. But I must say that 
I have never experienced any problems with my USB host before (and now the 
default driver works perfectly fine).

I will try to figure out which piece of code in xfer-async.exe makes trouble by 
excluding more and more function calls (since show-device.exe does not cause 
any problems I conclude that this process will finish before I get the bare 
"int main() {return 0;}" :) ). I will inform you about the results.

By the way, I noticed from the console log you pasted that you had started the 
program in a UNIX-like environment (bash or something under cygwin I presume). 
Forgive me my question, it is pure curiosity, but why did you use that? :)

Original comment by khani...@gmail.com on 8 Jan 2012 at 8:30

GoogleCodeExporter commented 9 years ago
I forgot to mention one more fact (which was easy to get).
Now (I mean with the r172 fix) after the first run of xfer-async.exe (which 
produced all the messages about transfers I posted before) all subsequent calls 
start to hang after only printing message

> Looking for device vid/pid 0457/0151..
> Using 0457:0151 (1B5454055201B6): USB Mass Storage Device - Silicon 
Integrated Systems Corp.

This time however Ctrl+C successfully interrupts the program. I checked: it 
hangs inside GetOverlappedResult() call in lusbk_ioctl.c:77 which is called 
from xfer-async.c:68 (Usb.Init()). 

Hope this helps.

Original comment by khani...@gmail.com on 8 Jan 2012 at 8:41

GoogleCodeExporter commented 9 years ago
>By the way, I noticed from the console log you pasted that you had started the 
program in a UNIX-like environment (bash or something under cygwin I presume). 
Forgive me my question, it is pure curiosity, but why did you use that? :)

It is an msys shell for mingw64.

> In my opinion all these situations point to the suggestion I made that the 
libusbK.sys driver's state becomes incorrect.

I do not think so, but you could install your device with WinUSB.sys or 
libusb0.sys to test your theory.  You might also look at the libusb-1.0 windows 
backend.

I think it is time you install the libusbK debug binaries and use DebugView to 
monitor the driver log messages.

Are you sure you are using libusbK.sys?  It is behaving a bit like the 
libusb-win32 filter driver over a MSD device which is also being controlled by 
the windows class driver.

What is listed in the Control Panel->Device Manager->Your USB Device->Driver 
Details.

> This time however Ctrl+C successfully interrupts the program. I checked: it 
hangs inside GetOverlappedResult() call in lusbk_ioctl.c:77 which is called 
from xfer-async.c:68 (Usb.Init()). 

This is different than what you indicated previously.  You will need to attach 
a driver log (DebugView) for me to diagnose further and provide steps I can use 
to reproduce.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 9 Jan 2012 at 1:06

GoogleCodeExporter commented 9 years ago
> I do not think so, but you could install your device with WinUSB.sys or 
libusb0.sys to test your theory.  You might also look at the libusb-1.0 windows 
backend.
> I think it is time you install the libusbK debug binaries and use DebugView 
to monitor the driver log messages.
> Are you sure you are using libusbK.sys?  It is behaving a bit like the 
libusb-win32 filter driver over a MSD device which is also being controlled by 
the windows class driver.
> What is listed in the Control Panel->Device Manager->Your USB Device->Driver 
Details.

I tried out both WinUSB.sys and libusb0.sys drivers. They all work fine, 
nothing hangs. I also installed DebugView and ran the following programs with 
all three drivers (starting from WinUSB.sys and ending by libusbK.sys):

1. show-device.exe (with device plugged)
2. show-device.exe (with device unplugged)
3. xfer-async.exe (with device plugged)
4. (for libusbK only since only it was wrong) show-device.exe (with device 
unplugged)

In all cases I saved both program output log and DebugView log, see the archive 
attached (file names are self-describing).
I also saved all three Device Manager configurations for the USB device (see 
device-manager.txt), so yes, I am sure I was using libusbK.sys.

You may see the following differences of libusbK.sys from libusb0.sys and 
WinUSB.sys (see the archive):

1. Strange messanges appear in DebugView after running xfer-async.exe with a 
period of one minute:
    ...
    00000014    3:01:18 Thread 0xFFFFFA8003CE7040 is waiting for all inflight requests to be acknowledged on WDFQUEUE 0x0000057FF995C3A8    
    00000015    3:02:18 Thread 0xFFFFFA8003CE7040 is waiting for all inflight requests to be acknowledged on WDFQUEUE 0x0000057FF995C3A8    
    ...
    While I am writing the post they still appear. Seems like indeed some thread did not release a resource. Strange though that the thread itself (according
    to the Task Manager finished after several minutes and the corresponding console window closed).

2. libusbK tries to perform OvlK_WaitAndRelease() call in xfer-async.c:126 
three times instead of one (in the latter case the 'if  !ovlArray[ovlIndex])' 
condition becomes true and the loop body gets skipped before the next iteration 
which may indicate about some incorrectness in ovlArray[] content)

> This is different than what you indicated previously.  You will need to 
attach a driver log (DebugView) for me to diagnose further and provide steps I 
can use to reproduce.
Well, of course it is different! Previously there was a bug in the 
OvlK_WaitAndRelease() function which you fixed in r172. After the fix I updated 
the source code and recompiled the examples. So the behaviour changed.

I should also notice that a friend of mine has reproduced the bug on his PC 
(also Windows 7 64 bit). Hope he will comment on the issue soon.

Original comment by khani...@gmail.com on 9 Jan 2012 at 11:32

Attachments:

GoogleCodeExporter commented 9 years ago
Unfortunately, there are no messages from the driver. In DebugView, under the 
captions menu, ensure all items are checked.

Start debugbview before you connect the device and leave it running until after 
you disconnect.  You should see many more messages.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 1:15

GoogleCodeExporter commented 9 years ago
I tried out both WinUSB.sys and libusb0.sys drivers. They all work fine, 
nothing hangs.

Then perhaps there is a problem in the driver; they should all behave the same 
for this test.

> I should also notice that a friend of mine has reproduced the bug on his PC 
(also Windows 7 64 bit). Hope he will comment on the issue soon.

What was he using for a USB device?  We have done lots of testing with libusbK 
and never had any problems like this.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 1:45

GoogleCodeExporter commented 9 years ago
> Well, of course it is different! Previously there was a bug in the 
OvlK_WaitAndRelease() function which you fixed in r172. After the fix I updated 
the source code and recompiled the examples. So the behaviour changed.

I see.  It is odd then that you did not have this problem first since the Init 
call comes first.  Perhaps the driver was already in a botched state from a 
previous example run?

Regards,
Travis

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 1:57

GoogleCodeExporter commented 9 years ago
Attached is a debugview log file example.  This is what it should look like 
when everything goes as expected.

1) Connect
2) Run xfer-async
3) Disconnect

$ ./xfer-async/bin/x86/xfer-async vid=0781 pid=5406
Looking for device vid/pid 0781/5406..
Using 0781:5406 (0876900EFE02035E): U3 Cruzer Micro - SanDisk Corp.
Device opened successfully!
Transfer #0 submitted for 4096 bytes.
Transfer #1 submitted for 4096 bytes.
Transfer #2 submitted for 4096 bytes.
Transfer #0 did not complete.
 errorCode=000004C7hTransfer #1 did not complete.
 errorCode=000004C7hTransfer #2 did not complete.
 errorCode=000004C7hTransferred 0 bytes in 3 transfers. errorCode=000004C7h

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 3:23

Attachments:

GoogleCodeExporter commented 9 years ago
> What was he using for a USB device?  We have done lots of testing with 
libusbK and never had any problems like this.
When I first asked him he perormed the test with a memory stick and nothing 
evil happened. But the next day he tried the PCM2706 chip and a different 
memory stick. In both cases he observed exactly the same symptoms as I did and 
had to reboot the system in a crude manner.

> I see.  It is odd then that you did not have this problem first since the 
Init call comes first.  Perhaps the driver was already in a botched state from 
a previous example run?
Exactly, I meant that this was the second run ("...after the first run of 
xfer-async.exe all subsequent calls..." - see comment #5). Maybe the phrase was 
not enough clear though.

> Unfortunately, there are no messages from the driver. In DebugView, under the 
captions menu, ensure all items are checked.
Indeed, I ran the test incorrectly. It was not DebugView configuration though 
but the fact that I didn't replace (reinstall) the Kit with the debug version 
(libusbK-3.0.5.1-setup-chk.exe).

So I reinstalled the Kit and performed the test. See results in the attachment 
(both the DebugView log and a short chronology reference).

It seems that I also noticed another problem, this time with the Driver Install 
Creator Wizard. DebugView shows that it has memory leaks. Again, see the 
attachment.

By the way, is it ok that all drivers except libusbK.sys have a different 
xfer-async.exe output? (I mean the number of times the "Transfer #XXXX did not 
complete" message appears - 3 vs 1) Do you observe the same fact? Also, it 
seems that there is a newline symbol put in a wrong place, 
"errorCode=000004C7h" message sticks to the next "Transfer #XXXX did not 
complete" message).

Original comment by khani...@gmail.com on 10 Jan 2012 at 7:36

Attachments:

GoogleCodeExporter commented 9 years ago
I think I have reproduced it! If I set EP_ADDRESS to an invalid endpoint my 
results appear to be the same as yours.  In your case, the default value for 
EP_ADDRESS is already invalid (0x81).  Try setting it to 0x82 and see if your 
problem goes away.

FYI:  Again, Make sure and reboot between tests. ;) because you are right, the 
driver is definitely hanging.  I am also receiving the WDFQUEUE messages and I 
have now disconnected the device and the driver still will not unload.

I've again attached my log file and as you can see it is nearly identical to 
yours.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 10:44

Attachments:

GoogleCodeExporter commented 9 years ago
I have now found a pretty serious bug in my driver.  I must thank you for your 
patience and persistence on this. I will have a patch (and probably a whole new 
RC release) available some time today.

I will also re-factor the xfer-async example; as you have pointed out there are 
some problems with the example itself.

FYI: Changing EP_ADDRESS so that it is valid *should* allow xfer-async to 
function properly.

Regards,
Travis 

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 11:01

GoogleCodeExporter commented 9 years ago
Oh, gush, glad to hear that my annoyance ultimately helped :) To tell you the 
truth most of the bugs I have filed so far haven't been reviewed at all and in 
one or two cases where I got the reply they were discarded. So let me thank you 
for paying attention :)

Setting EP_ADDRESS to 0x82 indeed helped. Nothing hangs anymore, lucky me, 
don't need to reboot  :)     Log is attached

Original comment by khani...@gmail.com on 10 Jan 2012 at 12:19

Attachments:

GoogleCodeExporter commented 9 years ago
There is a bug in the driver which causes it to skip the error handling 
routines for invalid requests.  The results are devastating just as you have 
been pleading. ;) 

The fix was simple and I'm just finishing up some enhancements to the 
xfer-async example.

> By the way, is it ok that all drivers except libusbK.sys have a different 
xfer-async.exe output? (I mean the number of times the "Transfer #XXXX did not 
complete" message appears - 3 vs 1) Do you observe the same fact?

I'm looking into this, and yes, I did observe the same results.

Regards,
Travis

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 1:05

GoogleCodeExporter commented 9 years ago
> It seems that I also noticed another problem, this time with the Driver 
Install Creator Wizard. DebugView shows that it has memory leaks. Again, see 
the attachment.

The leak is in libwdi.  I reported this to Pete several weeks ago; I believe it 
has been fixed in v1.2.1.
http://sourceforge.net/projects/libwdi/files/releases/

Regards,
Travis

Original comment by libusbdo...@gmail.com on 10 Jan 2012 at 2:05

GoogleCodeExporter commented 9 years ago
Ok, thanks, you answered all my questions. Good luck!

Original comment by khani...@gmail.com on 11 Jan 2012 at 6:56

GoogleCodeExporter commented 9 years ago
Fixed in 3.0.5.2

Original comment by libusbdo...@gmail.com on 29 Jan 2012 at 1:14