Open vegidio opened 1 year ago
Which serial device type are you using?
I suspect it strongly depends on serial device type and/or android vendor. In my home automation I replaced pl2303 with ftdi as these stopped working after some time and only power toggle made them working again
Thanks for the quick answer. The serial device that I'm using is this one: https://ingenico.com/sites/default/files/resource-document/2022-10/OPEN-1500_datasheet_EN_OCT22.pdf
Which driver is used in this library?
Hi Kai,
I am having a similar issue as Veigidio, the device connects but after a random amount of time(time varies with how the device is connected, ie a hub or adapter) it drops a packet and then we get USB get_status request failed or ControlTransfer in test connection fails.
We are using an STM32L4 as a USB CDC ACM device into an Android device.
Thanks, Tj
Hi everyone,
@kai-morich . After months of digging, I think I might have explanation for this issue and the cascading failures.
I'm running a similar issue to @vegidio and @Thaejaesh-S with similar device setups. Baudrate: 115200, Stop Bit 1, Parity None.
The root issue might be requesting a read bulk transfer (from device to host) when there's no data to be sent to the host.
Using strace
on an Android device, I've captured the above ioctl
syscall that mConnection.bulkTransfer
utilizes.
File Descriptor 37 is the serial device communicating with USB host device, which connects to the USB port on the Android device.
I have no issues/errors sending a bulk write request, regardless of payload size.
For some reason the bulk read requests are broken up consistently, starting at the very first byte, then sending the rest of the payload in a subsequent transfer. Then, an additional request, which always times out, and almost always a control transfer request follows. Since the control transfer request generally succeeds, no IOException is thrown, but if the timeout isn't sufficiently long enough, say 2s, I wouldn't be surprised if the IOException would be thrown.
I believe I can reproduce the issue with the get_status request
failed, however, the fact that these ETIMEDOUT
errors are occurring for every bulk read request just seems like a code smell.
Lastly, I don't know the reason behind the segmentation, but I can account for every byte in the bulk read request.
An abatement would be to design around this requesting bulk read transfers of a determinate length using fixed, length-prefixed, or newline terminated messages. Submitting bulk transfer read requests if there's still data to be received. I don't know the logic behind when the bulk read requests are triggered, but at least I can avoid making needless read requests.
I'm going to test using a control transfer to get the status of the endpoint I'm using, then determine whether the transfer to should be made or not. Failing that, I'll re-implement the read method within CommonUsbSerialPort
to read with a determinate length.
If anyone has any feedback, I'd greatly appreciate it.
When using the SerialInputOutputManager
class, a background thread basically always waits here even if no data is currently available. With infinite timeout the read
blocks until data is available or connection is lost, but also works with real timeout and SerialInputOutputManager
repeats the call.
So i doubt that it is an issue to perform read
when no data is available. I recommend to use infinite timeout.
Using an infinite timeout for reads would require an architectural change that I cannot commit to, and asynchronous writes through SerialInputOutputManager
, I believe, are no longer possible.
Unless you're proposing to instantiate another SerialInputOutputManager
instance to the same USBSerialPort
in a separate thread or bypassing the manager completely and making a blocking write request to the raw serial port, I can't think of an acceptably performant solution where polled reads will not block writes, or in the latter case cause undefined behavior. I'd rather avoid creating more threads, if I can help it.
I don't have evidence to support the claim that there isn't an issue with bulk transfer reads when no data is available. I expect bulk transfer requests (BTR) to terminate the end of a complete transfer with a zero length packet or an ACK. If a BTR is made when there is no data to be read, I expect a NAK. Receiving an error code like "ETIMEDOUT" makes me think that the bulk transfer request resulted with a STALL or ERR response from the device.
@kai-morich , do you know of any command line tools where I can view the protocol level USB traffic that this library is using? I'd like to verify what ioctl is sending/receiving from the USB device. Unfortunately, I'm unable to see packet traffic using strace
.
How does the library access the underlying serial port? The FAQ states that the /dev/ttyxxx
ports are not used by the library.
Try with a direct (blocking) write request. If data is smaller than USB packet size you will not notice any blocking.
bulkTransfer
returns -1 on various conditions, no additional details is available from Android USB API. To distinguish some cases I added this testConnection hack.
Without rooting an Android device I am not aware of any possibility to see the low level USB traffic.
There is no underlying serial port. It's all implemented on top of USB requests.
Try with a direct (blocking) write request. If data is smaller than USB packet size you will not notice any blocking.
This is how my application software is currently writing to the USB device. Unfortunately, this doesn't provide a root cause to the failure mode.
To distinguish some cases I added this [testConnection](https://github.com/mik3y/usb-serial-for-android/blob/a9c835bcb07e6e49e346cc1bcd890fabac2161ba/usbSerialForAndroid/src/main/java/com/hoho/android/usbserial/driver/CommonUsbSerialPort.java#L194) hack.
Okay. So to summarize, even though USB protocol states bulk transfer requests should be implemented in some way, the Android USB API simply doesn't translate/provide these error codes properly to the application layer. The same seems to be true for control transfers, even though control transfers are more robust (though slower) compared to bulk transfers.
Now I understand the preference for infinite timeout reads or using requestWait/queue. Because Android's lack of error propagation, we have no option but to poll on a separate thread and hope there aren't any other failures Android is hiding, and if any exceptions are thrown, somehow restart the UsbDeviceConnection to return to nominal operation.
This would explain why some individuals implemented their own ioctl function to support bulk transfers. This is tough. :(
Without rooting an Android device I am not aware of any possibility to see the low level USB traffic.
Yeah. I've hit that wall today. Unless I can install some sort of special kernel module, which is its own rabbit hole rife with dead-ends, there's no way I can view USB traffic. I'll see about getting a USB packet analyzer. Unfortunately, a logic analyzer will not work for setup due to its parasitic load.
I was wrong about ETIMEDOUT. According to the ancient tomes of Android hosted docs, it's a legit error code due to an arbitrary assigned timeout. The Host controller would experience/see no other error.
Just a pulse update. I've got the USB protocol analyzer.
I've had to shift gears for a separate project, but I'll be back on this in the new few days or so.
Hi @jhlink @kai-morich Any update on "USB get_status request failed" and "control transfer failed" as we still get these error in the latest library though after few hours and when keeping the device idle and initiate a command.
@Mohammedbinnazar , I should be back on this later this week. Hopefully, today once I dial in the SW changes for unrelated dev.
Which driver is used in this library?
@kai-morich I saw now that you posted a follow-up question, but I didn't respond. I apologize for that, but shortly after I opened this issue I had a serious family emergency that kept me away from work, and computers in general, for a couple of months.
In any case, I'm back now and it seems that there are more people facing the same type of problem that I am. So, to answer your question and give more hints to anyone following this issue and hopefully we can find a solution, the driver that I'm using is CdcAcmSerialDriver
.
Thanks again for your assistance.
Ok, now that I'm back investigating this I found something yesterday that might give some hints of what's happening, at least in my case.
First I would like to share a piece of my code:
val driver: UsbSerialDriver by lazy {
val allDrivers = UsbSerialProber.getDefaultProber().findAllDrivers(usbManager)
if (allDrivers.isEmpty()) error("No serial USB devices!")
// My USB device is from Ingenico
allDrivers.first { it.device.manufacturerName.equals("ingenico", ignoreCase = true) }
}
and then I get the UsbDevice
simply doing this:
val device get() = driver.device
The code above works great when I start using my app, but when it stops responding (I've put some code to detect that), I try to reconnect. However, this time the same code above to find the driver associated with my USB device doesn't find any result for manufacturerName = Ingenico
.
The list allDrivers
in the code above is not empty (I have 2 USB devices connected to my Android device), but the first time I run this code, the list has 2 devices (Ingenico and another one). However, when Ingenico stops responding, the same code returns only 1 device (not Ingenico).
So, is it safe to assume that whatever is causing the loss of connectivity with the device is not a problem in the source code? I mean, when I lose connectivity, I cannot even try to reconnect to the device because it no longer appears in the list of available drivers, but the other USB device is still there. It gives me the impression that my Android phone and my code are working as expected; it's the USB device that has entered an unrecoverable state somehow.
Hi @jhlink Hopefully you must have started working on this project last week. Any updates on it ?
Folks, I want to find a solution for this problem like everyone else in this thread, however let's not act as if this is a paid service and the people working on this own us anything.
I'm sure that whenever a solution is found an update will be posted here.
@kai-morich Status update. I spent the day reproducing the failure with a USB protocol analyzer. I seem to have found a fix for the issue, but @vegidio it would be great if you could test this to prove that it works. (Hope you're family is doing okay. )
Reproduced the failure using the example project in the repository. It was modified to force the failure through heavy read/write throughput operations. An IO exception occurred within seconds, which indicated packet corruption due to concurrency issues. Using synchronized
on the USBSerialPort instance within the serial write thread seems to have resolved the issue. Over 50GB of USB traffic data was collected using the USB protocol analyzer given the following:
After applying the modifier, this setup was run for 45+ minutes with no other connection loss or failures observed on the application, USB traffic, or on the device.
@vegidio , could you apply the synchronized flag on your UsbSerialPort
instance in the thread that's writing to the serial port and observe if the failure appears for you?
Initially, I used the USB Protocol Analyzer (USBPA for short) on the legacy software I'm maintaining. I was able to reproduce the USB get_status_request
issue, but the traffic indicated that during a bulk transfer of a 300 byte payload, a baud rate configuration control transfer was made; it contained the SET_LINE_CODING
code found in setParameters
. 2.5 seconds later, which is the read timeout I've specified, a series of SETUP request failures occurred before succeeding about 2ms later. This is the GET_STATUS request that's being sent in the test_connection
function. The USBPA identified these packet failures as an invalid PID sequence because of the following:
Despite the invalid PID error above, the root cause for the unexpected SET_LINE_CODING control transfer is likely due to a bug or logical failure in the application, not in the library or the hardware layer.
Summary: Do not call setParameters
during ongoing traffic, which should be a self-evident principle in any communication protocol. With respect to the library, it is odd why setParameters
can only be set after the UsbSerialPort is opened. I'm guessing the reason is to ensure maximum device compatibility using a predefined, default set of values to open the device before changing it later. It would be nice if the "setParameters" can be called before "open," such that the desired baud rate and port configuration is used when the port is opened or if the configuration can be part of the open
function header.
Forcing the developer to change port configs (if the default 9600 baud doesn't apply) after port opening seems to allow for a category of software bugs.
In order to prove there is a software bug in the library, I've modified the example project in the library. I've removed everything UI related to test the core issue. A dedicated HandlerThread was created for writing data to the UsbSerialPort
and handling the data received from OnNewData
.
Without incorporating the synchronized
modifier on the UsbSerialPort instance, I experienced a java.io.IOException: Error writing 19 bytes at offset 32 of total 51
, which indicated that packet corruption was occurring due to concurrency issues.
After adding the modifier and locking inter-thread write access to the UsbSerialPort, I didn't experience this issue even after aggressively high, bidirectional throughput.
Summary: UsbSerialPort is not thread-safe. If read/write operations are placed in a separate thread, a mutex lock will be required on the port itself (not their buffers) to avoid undefined behavior caused by concurrency issues. Reviewing the example code for SimpleUsbTerminal, I've observed a similar usage of synchronized
within Serial Reads/Writes as data is being received or sent across threads.
An argument can be made that a mutex lock may not be necessary if there is only one thread that handles serial writes. Given a bona fide RTOS, I'd agree, circumstantially. However, Android is not an RTOS, and the OS can kill/restart any thread at any time unless bound to a service or tied to the UI. Given that I, as the developer, cannot predict or guarantee thread termination/restarts, a mutex lock is required.
Preliminary tests indicate that the failure is due to concurrency issues in accessing UsbSerialPort, which is apparently not thread-safe. The issue isn't necessarily a software bug in Android's API, although greater detail in error propagation in the USB API would alleviate the need to buy a $3k+ tool to root cause USB failures.
If any developer is splitting read/write operations into dedicated threads, they should take care to incorporate mutex locks. Otherwise, if the developer is handling all read/write operations on the main thread, the application won't see failures due to concurrency issues, though there will likely be failures due to high load on the main thread.
synchronized
to avoid undefined failures. open
such that the port configuration can be applied when a port is opened. Eliminates the possibility of calling setParameters
during ongoing USB traffic. I'll incorporate the synchronized change in legacy code, and after a ground-up redesign, I will test the overall solution to see if the get_status
request will occur.
I'll wait to see if proposed change works for @vegidio .
Thanks for this comprehensive analysis.
'write' is internally serialized with mWriteBufferLock so you found issues with concurrent Connection.controlTransfer? Do these only happen for setParameters that might reconfigure the serial device or also for other methods like set DTR?
Concurrent read and write should still be possible. Did you face issues with a single write thread and another (blocking) read thread like SerialInputOutputManager?
@jhlink Thank you very much for looking into this. I will test on Monday and come back with a feedback soon!
Thanks for this comprehensive analysis.
Of course. @kai-morich
'write' is internally serialized with mWriteBufferLock so you found issues with concurrent Connection.controlTransfer? Do these only happen for setParameters that might reconfigure the serial device or also for other methods like set DTR?
Yes, it seems that way, though I don't know what the desired behavior of mWriteBufferLock
is, assuming that you're talking about SerialInputOuputManager.
Is it to assure USB requests are added sequentially? Then, yes. mWriteBufferLock
would achieve this.
Would it limit access to the USB Serial port? No, not if there were more than one thread. I don't see anywhere in the library source code where two dedicated threads for reads and writes are being created. In this scenario, the write
to the UsbSerialPort (not writeAsync provided by SerialInputOutputManager) is handled separately in a dedicated thread external to SerialInputOutputManager.
Presently, I can see that if reads and writes were executed using SerialInputOutputManager, then access control is built into the flow of control; one can't run without the other completing as they alternate.
I haven't used methods like set DTR, as it seems these are dedicated pins outside of D+/D- pins. I would imagine this would mitigate any issues since data flow direction is known.
As for setParameters
, I would have to repeat the experiment to prove consistency since I've only tried it once.
Yes. It seems that sending the control transfer during a bulk transfer causes cascading failures.
I don't have the traffic logs with me, but IIRC the behavior is as follows.
SET_LINE_CODING --> SPLIT/SETUP/DATA0 Transaction - Sent to Device BULK_TRANSFER - 300 BYTES --> After Xth bytes sent --> The packet before this next packet fails with an invalid PID sequence. --> SPLIT/SETUP/DATA1 Transaction is sent to Host --> The rest of the bulk transfer continues until completion. BULK_TRANSFER -- Continue... (2.5seconds later) --> GET_STATUS failures.
Concurrent read and write should still be possible. Did you face issues with a single write thread and another (blocking) read thread like SerialInputOutputManager?
Do you have an example project that demonstrates that both are possible without mutexes on the UsbSerialPort? I can run the example code on Monday and collect/post the USB traffic logs for you here. I can also share the example code that I've written on Monday, and it behaves like you've said, granted with timeouts for both read and write. 2.5sec is a very long time, though admittedly increasing that timeout to 3 or 4 seconds works. Though I'm hesitant on making that jump without assurance that the timeout will not need to increase again in the future. Depending on the implemented communication protocol, the packet timeout for a communication spec may not allow for 3 or 4 seconds.
I'm unaware if the write thread was blocking the read thread. Thread access isn't clearly shown in the USB traffic logs. However, regarding the application code, there are problems with it that are unrelated to the library. It wasn't implemented without an understanding of multithread environments. It could be that there are two write threads or poor logic that directed reinitializing the dependencies for SerialInputOutputManager.
@jhlink Ok, I did the change that you suggested and used synchronize in all places where UsbSerialPort
reads or writes something.
Before this change, my app could run for around 16 hours before losing connection with the USB device, but there was a case where it ran for almost a full day (23 hours to be precise) before it lost connection. However, this happened only once; most of the time the connection is lost around the 16 hour mark.
Now, with the change that you suggested, my app is already running for 24 hours without losing connection, which is 1 hour more than the max amount of time it ever remained connected 🎉
I'm feeling carefully optimistic, but I think you found the source of the problem. I will let my app run for 24 hours more and I will report again tomorrow. 🙂
That's wonderful @vegidio !
I'm feeling carefully optimistic, but I think you found the source of the problem. I will let my app run for 24 hours more and I will report again tomorrow. 🙂
Thank you, but we're not out of the woods yet. Would you be opposed to running it for 48 hours continuously instead of 24?
If we assume mean time to failure is approximately 23 hours, I'd like to try for 48 hours for at least two opportunities that the failure could occur again.
Ideally, if the test time were much longer, I'd be able to provide a high degree of confidence in reliability, say running it for a week straight, but I can settle for 48 hours if this isn't convenient.
I can definitely leave my device on for much longer. However, I hate to write this, but less than 2 hours after my last update I lost connection with the USB device 😢
There are only two places in my code where I call UsbSerialPort
read and write functions and I added synchronize to both places, but I will go through the code again to see if it benefits from adding synchronize in other areas. In any case, thank you very much for the support you have provided so far.
@vegidio , I'll share my example code that forced the issue tomorrow.
Could you review it and see how your code differs from mine?
My guess is that your code is proprietary and can't be shared, which leaves us to replicate it through contrived example code.
@jhlink Yes, you're right, the project that I'm working on is proprietary. However, the main class that makes the connection with the USB device and also read/writes data, comes from an open source Android library that I created to help me during the development of my projects.
You can find this class called UsbHelper
here: https://github.com/vegidio/android-sak/blob/main/util/src/main/kotlin/io/vinicius/sak/util/UsbHelper.kt
You will notice that I have a variable port: UsbSerialPort
in this class, and this variable reads/writes data in the functions read()
and write()
; I added the annotation @Synchronize
to both functions.
But just to make sure that I was covering all possible edge cases, I also added @Synchronize
to the functions connect()
and disconnect()
because even though they don't read or write any data in the USB device, they also access the port: UsbSerialPort
variable.
Sorry, but this is the only part of my code that I can share. However, if you could share your example I will compare with the rest of my project and see if there's something else I'm missing.
Terminal style applications like the example at the library or Serial USB Terminal do write
in the UI thread and read
in a background thread created by SerialInputOutputManager
.
With this pattern, read
calls are done sequentially in the background thread and write
contains synchronized (mWriteBufferLock)
. So far I did not face issues of both threads doing concurrent USB calls.
Your aditional lock on write
should not have an additional effect. Do you read
from different threads concurrently?
What is the actual read timeout your are using? With too small timeouts I got strange errors.
What is the actual read timeout your are using? With too small timeouts I got strange errors. @kai-morich , 2500 ms is the read timeout.
Sorry. I should state the context for the reads and writes. I've been basing my conclusions on the library source code. After I post the example project in an hour or so, I think we can align on an implementation to replicate the failure.
@vegidio , connect
does interact with the USB port. If you open
the port, there are at least two control transfers being sent (as the library source code indicates), and I think one control transfer in SetParameters
, which updates the port configuration (baud rate to 115200). I haven't looked into disconnect
.
I don't have experience in Kotlin, but based on the general approach, do you know if you've type-aliased Synchronize
to kotlin.jvm.Synchronized
? Looks like the annotation has been deprecated. Seems like concurrency is handled natively in Kotlin, but I don't know to what degree.
[Edit] Yeah. No worries. I understand getting outside help on proprietary code is difficult. We can work around it. :)
@jhlink You're right, @Synchronized
is indeed deprecated on Kotlin, but only in the Common, JS and Native versions:
However, if you unselect the other Kotlin platforms and leave only JVM, you will see that it's still supported there:
There are actually two ways to use synchronize in Kotlin: we can use the @Synchronized
annotation and synchronized
block. I won't try to explain the difference between both, so I will just post a link of how ChatGPT explained the difference: https://chat.openai.com/share/94e04bb4-c19d-494e-9406-116cd172020b 😄
In my first attempt I tried to use @Synchronized
(the annotation), but I will try to use the synchronized
block now and see if I have a better control of how UsbSerialPort
is used.
Besides that, after your suggestion I started to read more about how synchronized
works with Kotlin Coroutines (which I use in my project) and I found an article mentioning that they don't always work well together: https://blog.danlew.net/2020/01/28/coroutines-and-java-synchronization-dont-mix/ - I will see if I get better results following the approach suggested in this article.
@vegidio I see. Just wanted to make sure the compiler wasn't ignoring the Synchronized flag. I operate in the world of C, with occasional excursions into Python script land and begrudging dabbles in Java la-la land.
@kai-morich , Here's the example project. A stripped down version of the UsbSerialExample but with two handler threads for each read and write. To be clear, the reception of the read is handled in a separate thread. The SerialIOManager isn't initialized in a separate read thread.
If you have any suggestions on forcing the issue, I can run the tests and gather USB traffic.
I have a similar problem now, after calling the UsbDeviceConnection controlTransfer function, it keeps blocking despite setting timeout. This is the issueTracker I submitted https://issuetracker.google.com/issues/298700517
The linux kernel of the android device I am using now is 5.4.19, there is a problem, the usb driver code of linux is submitted, it seems to be caused by this reason, everyone take a look https://github.com/torvalds/linux/commit/ae8709b296d80c7f45aa1f35c0e7659ad69edce1 https://lore.kernel.org/all/20210903175312.GA468440@rowland.harvard.edu/
Well. That's a very illuminating and useful share, @notcaremath .
Thank you.
The android os and kernel I'm using is v11 on kernel 5.4. The USB control transfer method uses this function, the function header of which says... that it shouldn't be used in an interrupt context or seems to imply that it shouldn't be used in a multithreaded context.
/**
--
 | * usb_control_msg - Builds a control urb, sends it off and waits for completion
 | * @dev: pointer to the usb device to send the message to
 | * @pipe: endpoint "pipe" to send the message to
 | * @request: USB message request value
 | * @requesttype: USB message request type value
 | * @value: USB message value
 | * @index: USB message index value
 | * @data: pointer to the data to send
 | * @size: length in bytes of the data to send
 | * @timeout: time in msecs to wait for the message to complete before timing
 | * out (if 0 the wait is forever)
 | *
 | * Context: !in_interrupt ()
 | *
 | * This function sends a simple control message to a specified endpoint and
 | * waits for the message to complete, or timeout.
 | *
 | * Don't use this function from within an interrupt context. If you need
 | * an asynchronous message, or need to send a message from within interrupt
 | * context, use usb_submit_urb(). If a thread in your driver uses this call,
 | * make sure your disconnect() method can wait for it to complete. Since you
 | * don't have a handle on the URB used, you can't cancel the request.
 | *
 | * Return: If successful, the number of bytes transferred. Otherwise, a negative
 | * error number.
 | */
Speculating here, but this may explain why the synchronized qualifier seemed to resolve the issues I've seen. The bulk transfer requests are synchronous, and I believe performing an additional synchronous transfer, be it a control or bulk during an existing transfer, would result with undefined behavior.
I think this might be why developers that have opted to implement asynchronous requests using queue and requestWait don't see this issue. Speculating here, but guessing that queueing USBRequests would result with a call to proc_do_submiturb
, which eventually leads to a call to usb_submit_urb
, which is similar to what's being called here usbfs_start_wait_urb
in Android 13, Kernel v5.15, admittedly with some 200 ms delays.
I think there are 4 paths ahead.
That patch was applied in kernel v5.15.x, which is used in Android 13 and higher. The highest kernel version available in Android 12 is 5.10.x. The highest kernel version available in Android 11 is 5.4.x.
@vegidio @kai-morich Could you share what android OS and kernel versions you're using?
@vegidio , if you're on Android 13 with kernel v5.10 or lower, could you update your kernel to v5.15.x and test with the same synchronized
qualifier software you ran originally? If we assume that this issue occurs in any kernel version 5.10 or earlier that operates in a multi-threaded environment, stands to reason that this issue will be resolved.
in the library
queue + requestWait
is only used for read
with infinite timeout,bulkTransfer
is used for read
with timeout and write
,controlTransfer
is used for baud rate, control lines, ...As I use read
with infinite timeout in my terminal App, there is typically only a single concurrent bulkTransfer
or controlTransfer
done by the foreground thread. I have to create a test program that does multiple blocking calls concurrently.
I use a pool of Android devices from 4.3 to 13, but never checked the kernel version, neither observed different behavior. As various of the devices run CyanogemMod or LineageOS they come with more plain Linux + Android and might behave differently than Android versions overoptimized by big vendors.
It is speculated that such a problem is more the unreasonable implementation of the USB protocol in the usb terminal, such as processing usb control commands in the hardware interrupt, and there is no timely response. The host is blocked.
in the library
queue + requestWait
is only used forread
with infinite timeout,
Ahh... yes. Was there a reason the timeout for requestWait wasn't supplied? Since API lvl 26, requestWait has that option .
As I use
read
with infinite timeout in my terminal App, there is typically only a single concurrentbulkTransfer
orcontrolTransfer
done by the foreground thread. I have to create a test program that does multiple blocking calls concurrently.
Okay. Feel free to use the example project I posted previously.
I use a pool of Android devices from 4.3 to 13, but never checked the kernel version, neither observed different behavior. As various of the devices run CyanogemMod or LineageOS they come with more plain Linux + Android and might behave differently than Android versions overoptimized by big vendors.
Were these on emulators or on actual hardware? I don't think this issue would appear on the former. @kai-morich
@notcaremath I do not understand your comment. Are you referring to Kai's USB Terminal app or the USB/core implementation in the android kernel?
Since API lvl 26, requestWait has that option .
requestWait
with timeout has other issues and as I anyway need the old implementation for older devices I did not switch or investigate if it got better with more recent Android versions: https://github.com/mik3y/usb-serial-for-android/blob/34e6d989fdf8d542981c0d1ad9c93027c9be7063/usbSerialForAndroid/src/main/java/com/hoho/android/usbserial/driver/CommonUsbSerialPort.java#L186
Were these on emulators or on actual hardware?
All real devices.
Hey folks, sorry for the late response, but here is the info about my setup: I'm running Android 8.1.0, with kernel 4.4.95, so it seems to fall into the version range that can be problematic according to your theory?
Unfortunately, upgrading the Android version/kernel is not an option in my case. But I will see if I can test this application in a device with a higher Android version/kernel and check if I get different results.
Thanks!
Hey folks, sorry for the late response, but here is the info about my setup: I'm running Android 8.1.0, with kernel 4.4.95, so it seems to fall into the version range that can be problematic according to your theory?
@vegidio No worries. Glad to hear you're still with us. Yes. I think that the pre-5.14.x kernel method of handling bulk/control transfers involve submitting requests that cannot be terminated by the kernel, that the request must be resolved with a response before the next request can be processed regardless of any provided timeout.
One possible avenue is testing and expanding the queue/requestWait for reads and writes.
Though, I'm puzzled how @kai-morich wasn't able to run into any of these issues when testing on real devices. Kai, could you provide more details about how your testing on these devices? Could you share the test plan/process for testing on these devices?
@Everyone I'm aware of the substantial effort involved, and I genuinely appreciate your hard work. If you could kindly share an estimate for when we might anticipate the release of the updated and fixed Android USB serial port connection library, it would be tremendously valued. @jhlink @kai-morich @vegidio
@Mohammedbinnazar The current proposal is to use synchronized
on UsbSerialPort for read/writes and that extreme care should be taken when using this library in a multi-threaded context. My theory is that the bug stems from the application not from the library, as indicated in the USB traffic. Development and testing is still underway.
It seems the bug would not occur in Android 13 Kernel 5.14.x+ per the source code. I have no test data to support this, nor do I intend to gather it as it is out of scope for me.
The only likely release/change from this PR would be an update to the README/FAQ about best practices of the library in multi-threaded (Java) contexts using Synchronized (or at the very least a "Do Not Do" list) and maybe cautionary warnings for corresponding use in Kotlin, depending on what @vegidio discovers in his test.
Based on my investigations thus far, the library is sound and stable. Improvements could be made like anything else but not essential. As in C, I believe the library should be kept simple, small, backwards compatible, and fundamental. The onus is on the developer to understand then utilize the tool in any capacity, even if it means allowing the developer to self-inflict irreparable failure.
@vegidio , can you tell me what USB speed your target device is operating? Is it Low/Full speed or High speed?
One way to check is connecting your device by USB to your Mac/PC and determining the speed from the Device Manager.
@kai-morich .
An update. Sorry about the long haitus. Got pulled in a couple different directions.
The Android application is communicating with a USB device through a Usb 2.1 USB Hub. After analyzing the traffic, the hub would respond to the USB host with SPLIT-NACKs.
The USB specification is very clear about the cause of this. The Transaction Translator buffer in the USB Hub/Chip is busy/full and results with a SPLIT-NACK. Why the transaction translator is busy or fills so quickly remains to be seen as data traffic and speed is low at best.
I believe this is a hardware issue with the USB hub chip itself. For myself, I'm using a Genesis logic controller.
I suggest sharing a warning in the Readme that certain Genesis USB chips are defective or have faulty logic for handling the transaction translator, and it's best to use a different USB Chip manufacturer.
Unfortunately, the Android app does not provide a means to detect android USB hub devices. If the USB chip cannot be changed, then the following would probably work.
Unfortunately, I'm not sure if this requires rooting the device.
Beyond that, there's nothing else that comes to mind that could reset the hub/device to recover from the filled buffer beyond power cycling the USB device, if that's even possible in software.
Okay. I'm finally rounding out this thread. @kai-morich , I motion closing this issue. :) I'd like to put this issue to rest, please.
I wrote a C module using libusb and compiled a JNI library using NDK-build. I'm nearly there, but I don't recommend this approach. The effort is herculean, requires a deep understanding of low level USB protocol, USB Spec, (or at the very least a USB protocol analyzer to see USB traffic) and assumes root access to Android + ability to modify the boot.img partition.
The following is a solution to "recover" from a busy buffer aka 'get_status madness.' It doesn't resolve the root cause which is likely due to a non-standard USB-spec implementation on the USB IC but not limited to electrical issues.
If you experience these "usb get-status failed" issues, the easiest solution is to junk that USB device and find a different device with a battle-tested USB Logic implementation. Ideally, something like a certain TE USB interface/controller, where USB logic is handled through discrete state machine logic instead of error prone firmware.
If you can't do this and possess root access + ability to modify boot.img of the Android OS, then feel free to proceed.
After building the JNI module, I ran into access permission issues like many other libusb developers encountered years ago. The JNI module/Android app failed to connect to the USB device directly despite being able to connect to the device via standalone libusb C program installed and executed through ADB shell (as root).
Elevating the android app to "System" and changing the entire /dev/bus/usb
path to System
user friendly permissions aren't sufficient. Starting at some Android release, SELinux policies were enforced, denying unauthorized access from different contexts, like system Apps, regular apps, shell, root, etc. The SELinux policies had to be modified to grant the desired USB access to a desired app. This blog was instrumental in identifying existing policies and testing policy changes.
From there, the path is relatively straightforward. Now that the Android app can communicate directly with the USB device, a TT RESET Control transfer request, described in the USB Spec, can be sent to the USB device to recover the 'usb get_status` issue.
Thank you @kai-morich for your contributions and continued maintenance. I hope this investigation provided closure. It certainly has for me, accompanied with a brand new appreciation and heightened skepticism for USB boards.
Hi, I'm developing an application that connects to a device through USB and then sends a command (the same command) over and over again in intervals of 30 seconds.
Everything seems to work as expected for some hours, sometimes almost 24 hours working without any problem, but then we suddenly start to get this error
USB get_status request failed
. I searched for this error message in your repo and I found some people that encountered the same error message, but two cases in particular seem somewhat similar to mine:In this first issue the OP reported that everything was working initially but later it stopped after 2 hours; on this second issue the OP mentions that everything stopped working after 5 hours.
I find intriguing that in all theses cases, including mine, everything is working for a while, but then out of nowhere it stops working after some hours. I know this is a long shot, but do you have any idea what could be causing these problems?
Thanks!