nyholku / purejavacomm

Pure Java implementation of JavaComm SerialPort
http://www.sparetimelabs.com/purejavacomm/index.html
BSD 3-Clause "New" or "Revised" License
362 stars 146 forks source link

IOException reading very long response from serial port. #100

Closed GlennED closed 7 years ago

GlennED commented 7 years ago

I have been attempting to use PureJavaComm 1.0.0 as a replacement for RxTx 2.1 and it works beautifully except when attempting to read very long responses from the serial port (ie. 26,885 bytes). In these cases an IOException is thrown that does not provide any detail message as to the cause. I have tried multiple baud rates from 9600 to 57600.

I have since tried to use PureJavaComm 1.0.1 with even less success. In this case the IOException from the serial port is thrown all the time no mater the length of the response from the device. For this version I tried using both JNA 4.0.0 and 4.2.2 since the repository indicated 4.0.0 JNA but the ChangeLog indicates 4.2.2 JNA. Could someone please clarify which version is required? Thanks.

nyholku commented 7 years ago

Basically JNA 4.2.2 is what you want, but AFAIR 4.0.0 should do just as well. I don't think what you are experiencing is in anyway related to JNA version.

Can you post the complete the stack trace of the exception please.

nyholku commented 7 years ago

And which OS are you running your code on?

GlennED commented 7 years ago

Sorry for not thinking to provide this vital information in the first place. :-) I am using the NetBeans IDE 8.1 and Java 1.8 for development. I am developing and currently testing on the Windows 7 platform. This application will also be for the MacOS platform.

Here is the stack trace:

jtermios.windows.JTermiosImpl$Fail at jtermios.windows.JTermiosImpl$Port.fail(JTermiosImpl.java:100) at jtermios.windows.JTermiosImpl.select(JTermiosImpl.java:840) at jtermios.JTermios.select(JTermios.java:474) at purejavacomm.PureJavaSerialPort$2.read(PureJavaSerialPort.java:882) at java.io.InputStream.read(InputStream.java:101) at com.satlantic.dacnet.source.PureJavaSerialSource.readText(PureJavaSerialSource.java:546) at com.satlantic.dacnet.source.ControlSourceReceiveResponse.recvResponse(ControlSourceReceiveResponse.java:245) at com.satlantic.dacnet.source.PureJavaSerialSource.getResponse(PureJavaSerialSource.java:637) at com.satlantic.dacnet.source.PureJavaSerialSource.getResponse(PureJavaSerialSource.java:621) at com.satlantic.dacnet.source.PureJavaSerialSource.recvCommandResponse(PureJavaSerialSource.java:666) at com.satlantic.isus.sunacom.command.SerialCommandInvoker.recvResponse(SerialCommandInvoker.java:199) at com.satlantic.isus.sunacom.command.SerialCommandInvoker.execute(SerialCommandInvoker.java:254) at com.satlantic.isus.sunacom.command.SerialCommandInvoker.execute(SerialCommandInvoker.java:84) at com.satlantic.isus.sunacom.Ancillary.retrieveFirmwareRev(Ancillary.java:309) at com.satlantic.isus.sunacom.ui.controller.SUNAComUIController.retrieveInstrumentInfo(SUNAComUIController.java:679) at com.satlantic.shared.ui.actions.simple.SerialConnectAction.lambda$connectedToPort$1(SerialConnectAction.java:266) at java.lang.Thread.run(Thread.java:745)

nyholku commented 7 years ago

Ok, try setting log level to 5 or 6:

https://github.com/nyholku/purejavacomm/wiki/Trouble-shooting

This should show you/us what is the error code that Windows reports.

Basically the failure happens here:

jtermios.windows.JTermiosImpl.select(JTermiosImpl.java:840)

i.e. in file jtermios/windows/JTermiosImpl.java line 840, Windows returns something unexpected and things go south.

GlennED commented 7 years ago

Here are the log messages immediately before the IOException.

log: 105847,> GetLastError() log: 105847,< GetLastError() => 995 log: 105847,> GetLastError() log: 105847,< GetLastError() => 995 log: 105847,> FormatMessageW(00001200, null, 995, 1024, allocated@0x1e1ac40 (2048 bytes), 2048, null) log: 105848,< FormatMessageW(00001200, null, 995, 1024, allocated@0x1e1ac40 (2048 bytes), 2048, null) => 95 log: 105848,fail() class jtermios.windows.JTermiosImpl line 840, Windows GetLastError()= 995, The I/O operation has been aborted because of either a thread exit or an application request.

jtermios.windows.JTermiosImpl$Fail at jtermios.windows.JTermiosImpl$Port.fail(JTermiosImpl.java:100) at jtermios.windows.JTermiosImpl.select(JTermiosImpl.java:840) at jtermios.JTermios.select(JTermios.java:474) at purejavacomm.PureJavaSerialPort$2.read(PureJavaSerialPort.java:882) at java.io.InputStream.read(InputStream.java:101)

nyholku commented 7 years ago

This is what is happening:

The I/O operation has been aborted because of either a thread exit or an application request.

If the i/o operation has been canceled then inside PJC this can only happen if the port is closed and likewise if this is the internal PJC event handling thread that too is only exited if the port is closed. I would suggest setting a break point in the PJC close method to confirm this theory.

And of course check your own code that it is not closing the port or requesting a thread termination depending on from which thread that call to read happened.

GlennED commented 7 years ago

Thanks for the explanation. It is quite likely that our own code is terminating the thread to cause this error. This does not appear to be happening when PureJavaComm 1.0.0 is being used though. Was there an improvement in 1.0.1 which has led to this behaviour?

I have reverted to using PureJavaComm 1.0.0 with logging level set to 6 and it appears that the issue has disappeared. This could also indicate timing/threading issues with our code......

GlennED commented 7 years ago

For 1.0.0 when the logging level was set to "1" the IOException returned, but with a different cause:

log: fail() class jtermios.windows.JTermiosImpl line 863, Windows GetLastError()= 87, The parameter is incorrect.

jtermios.windows.JTermiosImpl$Fail at jtermios.windows.JTermiosImpl$Port.fail(JTermiosImpl.java:100) at jtermios.windows.JTermiosImpl.select(JTermiosImpl.java:863) at jtermios.JTermios.select(JTermios.java:470) at purejavacomm.PureJavaSerialPort$2.read(PureJavaSerialPort.java:882) at java.io.InputStream.read(InputStream.java:101) at com.satlantic.dacnet.source.PureJavaSerialSource.readText(PureJavaSerialSource.java:546)

nyholku commented 7 years ago

Hi, this (87) is likely same problem, the port is in the process of being closed and so some parameters to Win API are (already) invalid.

The exception handling for read was changed in 1.0.1 to throw always IOException instead of IllegalStateException but otherwise the logic should be the same.

I still think the problem is that something causes your code to close the port and these things we are seeing here are just red herrings.

From the little information you've provided it looks your code is somewhat complex, I would try to create a simple standalone test case to see what is happening. I'm pretty sure PJC is behaving correctly as there are so many users out there that a serious problem would/should have surfaced by now.

Your original post said you problems with long reads (26,885 bytes you mention): there is nothing special with long reads so I suspect that again this is a red herring, it is just that with long reads you are more likely to hit the timing/concurrency issue you are having.

GlennED commented 7 years ago

I was able to use 1.0.0 PureJavaComm source + fix for error code 87 (for Windows) to build a new .jar file that I am now able to use without any problems using our original source code.

I have looked at the ChangeLog for 1.0.1 but I would need more information on all the changes between 1.0.0 and 1.0.1 to understand why version 1.0.1 does not work for our application.

nyholku commented 7 years ago

Glad to hear that.

So can you share what you did?

As to changes between versions gituhub has excellent tools to do that on line or you can download both versions and do a 'diff'.

GlennED commented 7 years ago

I modified the JTermiosImpl class' "select" method to add check for ERROR_INVALID_PARAMETER.

Fix.zip

nyholku commented 7 years ago

Thanks. Hope that does not hide some more fundamental problem...

GlennED commented 7 years ago

Me too. This software has been using RxTx 2.1 for approximately 10 years but we wanted to add support for Bluetooth serial ports on Windows and Mac platforms. Testing on Windows platform with one product works but testing on the Mac platform has not succeeded yet but I need to test again using the modified 1.0.0 library. Do you know if Bluetooth serial ports on the macOS platform has worked for other users of PureJavaComm?

nyholku commented 7 years ago

Hi, never had the opportunity to test Bluetooth serial devices on any platform. I think one issue is that they maybe very slooooow to open which maybe a problem if the open timeout is too tight.

GlennED commented 7 years ago

Hi,

I have located the change implemented in the 1.0.1 version that prevents this version from being used by our software. It is in the windows JTermiosImpl class on line #840 is where the port.fail() method is called.

if (!GetOverlappedResult(port.m_Comm, port.m_SelOVL, port.m_SelN, false)) { if (GetLastError() != ERROR_IO_INCOMPLETE) port.fail(); }

Reverting this change makes the 1.0.1 library work! Can you explain what this change was supposed to fix?

nyholku commented 7 years ago

Good question!

It looks like it was sneaked in here:

https://github.com/nyholku/purejavacomm/commit/7bb52d3674b2df886a4b72617d169f627d493ce1

In retrospect it looks unnecessary and a bit overzealous. Basically if GetOverlappedResult return false then you are supposed to call GetLastError() to find out why so I guess I (?) added this but apparently there are one or more legit return values that in that circumstance which should be ignored and not invoke port.fail().

Can you debug a bit further and see what is/are the error codes you see at that point?

It is true that in that point in code it probably is ok to just ignore the error code but looks a bit sloppy to me.

GlennED commented 7 years ago

Hi, I restored the original source code and debugged it further.

The error code that is happening at that point is ERROR_OPERATION_ABORTED (ie. 995)

nyholku commented 7 years ago

Thanks for debuggin, that makes sense, so we need to add that to the ignore error (no fail) case.

Siebje commented 4 years ago

Did this change ever make it into a release? I'm running into the same problem, and it seems it has not been resolved in 1.0.3.