jcurl / RJCP.DLL.SerialPortStream

SerialPortStream is an independent implementation of System.IO.Ports.SerialPort and SerialStream for better reliability and maintainability. Default branch is 2.x and now has support for Mono with help of a C library.
Microsoft Public License

RX data lost on Linux platforms #127

Closed kskog closed 2 years ago

kskog commented 2 years ago

We sporadically lose a few bytes (RX) when running on Linux platforms (custom Yocto build and Alpine arm64). This happens a few times a day and the nature/pattern of the lost data is the same each time.

I see that an issue that may be related to this has been fixed in commit a2464e1f04ef0d965b85ddfe78eba90b91cb95e3. We're currently running a test based on the v2.x branch to see if this fixes it.

@jcurl The latest release is from April, so it would be great if you could build a new NuGet package release with the latest fixes. At the very least, a patch release with the most important bug fixes is needed.

jcurl commented 2 years ago

Could you please try against a build of this commit and confirm this solves the problem? If you need me to generate a prerelease for this test, please let me know.

I plan to release 2.4.0 when 3.0 is nearly ready while I'm reviewing code. I hope this will be before end of October.

kskog commented 2 years ago

Thanks @jcurl! I will know tomorrow whether the commit fixes the issue. The test run uses NuGet package version 2.4.1, but compiles the C library from the latest commit of the v2.x branch.

jcurl commented 2 years ago

Thanks for letting me know. I don't expect any race conditions in libnserial, as access to it is serialized through the .NET implementation.

kskog commented 2 years ago

We get no data loss using the latest commit of v2.x :-) Could you please publish a prerelease/rc?

kskog commented 2 years ago

Oops, I was a bit too quick. I got an error event, so it's not fully fixed.

jcurl commented 2 years ago

I'm not aware of any further issues that could cause data loss on receiving. Can you enable logging? You might want to use some software to sniff the port and confirm that the driver is really seeing the data. We will need to debug what the POSIX API is returning.
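One way to check what the driver itself has seen on Linux is to read the kernel's per-port counters with the `TIOCGICOUNT` ioctl. The following is only a minimal sketch: `print_serial_counters` is a hypothetical helper name and is not part of libnserial, and not every serial driver implements this ioctl (in which case the call simply fails).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/serial.h>

/* Hypothetical helper (not part of libnserial): prints the kernel
 * driver's counters for an already opened serial port descriptor.
 * Returns 0 on success, or -1 if the ioctl fails, for example because
 * the descriptor is invalid or the driver does not support it. */
static int print_serial_counters(FILE *out, int fd)
{
    struct serial_icounter_struct ic;

    assert(out != NULL);

    if (ioctl(fd, TIOCGICOUNT, &ic) < 0)
        return -1;

    /* rx/tx are byte counts seen by the driver; the remaining fields
     * count line errors and overruns reported by the driver. */
    fprintf(out,
            "rx=%d tx=%d frame=%d overrun=%d parity=%d brk=%d buf_overrun=%d\n",
            ic.rx, ic.tx, ic.frame, ic.overrun, ic.parity, ic.brk,
            ic.buf_overrun);
    return 0;
}
```

Calling this before and after a period in which bytes went missing, and comparing `rx`, `overrun` and `buf_overrun`, may help tell data that never reached the driver apart from data that was lost further up the stack.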

kskog commented 2 years ago

Could you make a prerelease build based on one of the latest commits?

The NuGet package is now inconsistent with the C code.

The whole serialport just stopped sending data now!

jcurl commented 2 years ago

Can you please provide logs? I haven't made any changes to the C code, so there are no inconsistencies of the kind you mention. Please provide the TraceSource logging with .NET Mono, and an strace that shows the calls to and from the serial port at the time the problem occurs.

Here's a release based on commit e1b445995a729fccdeb2bf03d95d5312f56503ec.

SerialPortStream.2.4.0-Preview.20211013.zip

kskog commented 2 years ago

Thanks @jcurl. I will try to capture some strace log.

jcurl commented 2 years ago

Hi, any feedback on where the data is being lost?

kskog commented 2 years ago

I'm a bit busy right now, but I will let you know as soon as I know more.

jcurl commented 2 years ago

I'm wondering if you've had time to provide more detailed information regarding your problem. There are a couple of things you can do:

  1. I've written a bpftrace program that can be run to monitor the serial port in question. The output can be traced and logged. It's written to work with bpftrace 0.9.4 (if you use a newer version, the buf function can be used to trace the contents of the read/write buffers also). I've attached it as a .zip file and I'm considering adding this to the repository.
  2. Instrument the library libnserial to dump each call for serial_read, serial_write, serial_waitforevent, serial_abortwaitforevent, where extra information can be debugged. There's a #define in libnserial to enable logging (see the build.sh script that's present there).
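As a rough illustration of the second option, a logging wrapper around the POSIX `read()` call could look like the sketch below. `traced_read` is a hypothetical name used only for illustration; it is not the logging that libnserial's `#define` enables, and it makes no assumption about the actual `serial_read` signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of libnserial): calls POSIX read() and
 * logs the requested size, the result, and either the bytes received
 * (as hex) or errno to the given stream. */
static ssize_t traced_read(FILE *log, int fd, void *buf, size_t count)
{
    assert(log != NULL && buf != NULL);

    ssize_t n = read(fd, buf, count);
    int saved_errno = errno;    /* the logging calls may overwrite errno */

    if (n < 0) {
        fprintf(log, "read(fd=%d, count=%zu) = -1 errno=%d (%s)\n",
                fd, count, saved_errno, strerror(saved_errno));
    } else {
        fprintf(log, "read(fd=%d, count=%zu) = %zd:", fd, count, n);
        for (ssize_t i = 0; i < n; i++)
            fprintf(log, " %02x", ((const unsigned char *)buf)[i]);
        fputc('\n', log);
    }
    fflush(log);

    errno = saved_errno;        /* restore errno for the caller */
    return n;
}
```

Comparing such a log with what the remote side actually sent would show whether the missing bytes were ever returned by `read()`.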

serial.bt.zip

As this is an open source project done in my spare time, I don't have the resources to debug various systems, and would very much appreciate your help in resolving this issue.

Regards - Jason.

kskog commented 2 years ago

We have been struggling with other issues on the device, so I have not had the time to dig into this, but I will update here immediately when I have the opportunity.

jcurl commented 2 years ago

Closing this issue, as there has been no activity or updates to identify the root cause. Please see the new release 2.4.0, which may contain fixes.