Open fricpa opened 3 months ago
to expand a bit on the discussion for writing to devices/files in general: The API documentation/specification of hid_write should be changed to say that all the function return value indicates is that the operating system or some level of buffering accepted the amount of bytes that were written there. The implementations on any OS (e.g. hidraw), i.e. the different backends, do not guarantee that when this function returns successfully that the data has actually made it to the device in any meaningful way.
Ensugin that is always the job of an overlaid protocol where the device needs to answer back something in response...
Other hid-api's have tried to make this caveat of writing/sending data more clear in their documentation, for instance: https://github.com/nyholku/purejavahidapi/blob/f769fcddf62503cff554e646587c92350ca664e5/src/purejavahidapi/HidDevice.java#L98
* The method returning is no guarantee that the data has been physically
* transmitted from the host to the device.
* <p>
* The method returns the actual number of bytes successfully scheduled to
* be sent to the device.
We should revise https://github.com/libusb/hidapi/blob/d101e5c7e4646ecde66b650b3b8ce4904acfb13a/windows/hid.c#L1103 and https://github.com/signal11/hidapi/issues/88 and https://github.com/signal11/hidapi/blob/windows_write_timeout/windows/hid.c#L602
The decision to add a 1 sec timeout seems completely arbitrary.
If necessary, maybe hid_write could provide a hid_write_timeout and if need to make sure not to have to wait forever you need to give an explicit timeout as with hid_read_timeout - that would also increase the symmetry of the api and restore the synchronous/blocking nature/intention.
As mentioned in the OP, libusb has the same issue with an arbitrary 1 sec timeout: https://github.com/libusb/hidapi/blob/d101e5c7e4646ecde66b650b3b8ce4904acfb13a/libusb/hid.c#L1431-L1435
hidraw has no timeout https://github.com/libusb/hidapi/blob/d101e5c7e4646ecde66b650b3b8ce4904acfb13a/linux/hid.c#L1100
How long does it take to perform the write on platforms where it doesn't fail? (Or with 0.8.0 implementation?)
If it takes longer than 1sec, it means to properly fix this we need to increase the timeout. (See also: #493).
We should revise ... and https://github.com/signal11/hidapi/issues/88 and https://github.com/signal11/hidapi/blob/windows_write_timeout/windows/hid.c#L602
If you look at https://github.com/signal11/hidapi/compare/master...windows_write_timeout and current master of libusb/hidapi - you should find that current master pretty much implements what windows_write_timeout
was suggesting back in the day.
I believe the issue is only with the timeout:
hidraw has no timeout
It has no configurable timeout. I've checked the kernel implementation - it uses 5sec as a hard-coded timeout.
Thanks for your insights @Youw . Do you have the link to the line in the kernel? Interesting that there would be any timeout at all, my impression was always that pipes (and sometimes even network io devices and files in general) always have infinite read/write timeout by default...
What do we prefer, increasing the timeout to 5s? (maybe making someone's project slower that relies on the timeout?), or adding hid_write_timeout?
I would vote for adding hid_write_timeout to the api and implementing it for all platforms...
Do you have the link to the line in the kernel?
https://github.com/libusb/hidapi/issues/493#issuecomment-1369927578
I would vote for adding hid_write_timeout to the api and implementing it for all platforms...
Again, not for all platforms possible (not for hidraw AFAIK).
What do we prefer
Make it configurable.
I suggest you to try to increase a timeout in your own fork, at least to test/confirm my suggestions.
I can confirm that a timeout of 5000 ms = 5 sec works well in my context. First I tried to revert to the signal11 version with infinite timeout on Windows, but that literally made the program hang forever in the case of a bad device.
So my final vote is to change the timeout to 5 sec everywhere as a pragmatic solution.
The libusb backend has the same issue, can we add a solution for that too? Couldn't we add a set write timeout which is documented as a "best effort" call and no operation for other platforms?
The libusb backend has the same issue, can we add a solution for that too?
That is my plan, yes.
documented as a "best effort" call and no operation for other platforms?
For libusb it will change timeouts not only for the write
operation, again to match the behaviod or linux kernel (hidraw).
So it not only not possible to implement for all platforms, but also has different behavior for those platforms which (kind of) support it.
That is not the "best efforst", but one of the worst, from design perspective.
To be resolved when libusb part is done too
Hi. I have some USB HID devices here (with some old version of a proprietary firmware...) that work fine with signal11's hidapi (ca. 0.7.0/0.8.0), but create troubles with 0.14.0.
Interestingly, I only see the issues on Windows and on Linux with the -libusb backend - hidraw seems to work fine. For the libusb backend I don't see the precise error (because https://github.com/libusb/hidapi/issues/684).
The Windows backend says: hid_write/WaitForSingleObject: (0x000003E5) Overlapped I/O operation is in progress.
I think the culprit is the new rather strange (to me...) implementation of hid_write in Windows. Here's the diff of hid_write in my context in my context, we build this into a JNI library for Java or call directly into into the hidapi .so with JNA and there is also usb4java (a libusb wrapper) involved, however I am quite convinced that these complications are not the culprit.
To give a bit more background, I see it failing like this:
our device is receiving a 512 byte block of data split into multiple 64 byte hid reports and apparently this firmware of the device performs some slow logic on that third message of all 0 there (since in this context, this message marks the end of the useful data of the block).
On a fairly low level of the USB stack then, probably, the device takes a long time to respond, maybe longer than 1000 ms (the device has a ti msp430 usb with the manufacturer's usb & hid stack if that helps as an info).
I don't understand why it was decided that 1000 was a good timeout to pick here or why overlapped io is used in the first place (even by the original implementation), given the api is intended to be synchronous.
I vote to revert to the signal11 implementation or remove overlapped io entirely, any reason not to? I will make my own branch replacing that code with signal11's version (or revert to 0.8.0) and keep testing, but at least for my context the old version reproducibly worked better while this version is BROKEN.