As your second test confirmed, it is not JNA (it almost never is, especially with the direct-mapped calls).
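By "direct mapped" I mean JNA's direct mapping, where native functions are bound to static native methods via Native.register() instead of going through an interface proxy, so per-call overhead stays low. A minimal sketch just to illustrate the mechanism, nothing to do with PJC's actual binding code:

```java
import com.sun.jna.Native;

// Illustration of JNA direct mapping (not PJC's real bindings): the declared
// static native methods of this class are bound to libc symbols at class-load
// time, avoiding the reflective proxy used by JNA interface mapping.
public class DirectMappingSketch {
    static {
        Native.register("c"); // bind this class's native methods to libc
    }

    // POSIX getpid(2), chosen only because it is a trivially safe call
    public static native int getpid();

    public static void main(String[] args) {
        System.out.println("pid = " + getpid());
    }
}
```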
Without knowledge of your application internals it is difficult to suggest where to look. Some years back I spent considerable time optimising the Linux backend for just such a case (Raspberry Pi, IIRC) and the result was that PJC was as fast as RXTX, and pretty fast at that.
How are you using PJC, with the event system or blocking on read/write on your own threads?
@nyholku thanks for the quick response. Our application uses OSGi (Felix), Netty, and the PJC Netty adapter found here (https://github.com/steveturner/netty-transport-purejavacomm/tree/develop).
We actually found PJC and the Netty adapter to outperform RXTX in our application, and we clearly prefer the portability of PJC over RXTX. The wrapper implements an 'OIO' blocking transport. I am not familiar enough with the Netty internals to know whether swapping that for an NIO transport would buy much, but there might be some performance to be squeezed out of it there.
Hmm, so Netty simply does (I presume) basically a blocking read() or write() on the InputStream/OutputStream that PJC provides?
That should be about as efficient as it gets, as those are very thinly layered calls to the native blocking read/write calls; there must be something that has a higher priority than the Netty internal thread (I assume there is something like that) and consumes CPU.
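For reference, the blocking pattern being discussed amounts to something like the sketch below; the port name and parameters are made up for illustration, and this is not the thread's actual application code.

```java
import java.io.InputStream;
import purejavacomm.CommPortIdentifier;
import purejavacomm.SerialPort;

// Minimal sketch of the "blocking OIO" pattern: a dedicated thread sits in a
// blocking read() on the InputStream that PJC exposes. Port name and serial
// parameters below are illustrative placeholders.
public class BlockingReadSketch {
    public static void main(String[] args) throws Exception {
        SerialPort port = (SerialPort) CommPortIdentifier
                .getPortIdentifier("/dev/ttyS0")
                .open("BlockingReadSketch", 2000);
        port.setSerialPortParams(115200, SerialPort.DATABITS_8,
                SerialPort.STOPBITS_1, SerialPort.PARITY_NONE);

        InputStream in = port.getInputStream();
        byte[] buf = new byte[256];
        int n;
        // Each read() blocks in a thin wrapper over the native read() until at
        // least one byte arrives, so the thread consumes no CPU while waiting.
        while ((n = in.read(buf)) > 0) {
            System.out.println("received " + n + " bytes");
        }
        port.close();
    }
}
```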
You could set a breakpoint to stop everything and see all the threads that are running, and perhaps create a piece of code to dump them and their priorities, along the lines of the sketch below. A profiler like VisualVM might also help.
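A minimal sketch of such a dump, using only standard JDK calls:

```java
import java.util.Map;

// Dump every live thread with its priority, daemon flag and state, e.g. from a
// debug hook or a periodic timer.
public class ThreadDumpSketch {
    public static void dumpThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.printf("%-40s prio=%d daemon=%b state=%s%n",
                    t.getName(), t.getPriority(), t.isDaemon(), t.getState());
        }
    }

    public static void main(String[] args) {
        dumpThreads();
    }
}
```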
Hmm, so Netty simply does (I presume) basically a blocking read() or write() on the InputStream/OutputStream that PJC provides?
Yes, looking at the source, the moving parts do appear to be the Input/Output streams provided by PJC.
@nyholku thanks for the input. I will bark up that tree.
For the sake of debugging why our application worked well when running on a Mac and not so well on our custom Linux SBC, I updated the Linux implementation of jtermios as such..
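The actual jtermios change isn't shown here; purely as an illustration of this kind of timing instrumentation, measured at the stream layer rather than inside the Linux backend, a wrapper like the following could log how long each blocking read takes:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative only, not the jtermios patch from this thread. Wraps the
// InputStream obtained from PJC and logs how long each blocking
// read(byte[], int, int) call takes.
public class TimedInputStream extends FilterInputStream {
    public TimedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        long t0 = System.nanoTime();
        int n = super.read(b, off, len);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("read " + n + " bytes in " + ms + " ms");
        return n;
    }
}
```

A matching java.io.FilterOutputStream wrapper would time write() calls the same way.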
Comparing read and write times on the target hardware and on the dev machine:

      write    read
SBC   100 ms   134 ms
dev     1 ms     8 ms
Further, the write (complete) to full-response-received time:

SBC   191 ms
dev    49 ms
I have tinkered with all of the purejavacomm.X system properties and not seen a significant increase in performance.
Lastly, I modified the JTermiosDemo to talk to our attached serial device such that it outputs tx time, rx time, and total interaction time.
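The modified demo isn't reproduced here; a rough sketch of that kind of measurement, assuming a simple command/response device and streams obtained from a PJC SerialPort, could look like this:

```java
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch of timing one command/response exchange: tx time,
// rx time and total time. The streams would come from a PJC SerialPort;
// the command bytes and expected response length are placeholders.
public class RoundTripTiming {
    public static void measure(InputStream in, OutputStream out,
                               byte[] command, int expectedResponseLength) throws Exception {
        long start = System.nanoTime();

        out.write(command);
        out.flush();
        long txDone = System.nanoTime();

        byte[] response = new byte[expectedResponseLength];
        int got = 0;
        while (got < expectedResponseLength) {
            int n = in.read(response, got, expectedResponseLength - got);
            if (n < 0) break; // stream closed before the full response arrived
            got += n;
        }
        long rxDone = System.nanoTime();

        System.out.printf("tx %d ms, rx %d ms, total %d ms%n",
                (txDone - start) / 1_000_000,
                (rxDone - txDone) / 1_000_000,
                (rxDone - start) / 1_000_000);
    }
}
```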
On the dev machine a typical timing would be:
And typical timings on the SBC are as such:
The latter test shows that I can get near-workstation performance out of our SBC, so it likely isn't a pure lack of horsepower (with regard to JNA overhead) that is causing the slow performance. At this point my theory is that context-switching overhead is causing the performance drop on the target SBC, as our application is highly threaded. Any suggestions for things to look at?
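One Linux-specific way to test the context-switching theory (a suggestion for investigation, not something tried above) is to read the per-thread context-switch counters the kernel keeps under /proc; a sketch:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

// Linux-only sketch: print the voluntary/nonvoluntary context-switch counters
// for every thread of this JVM from /proc/self/task/<tid>/status. Rapidly
// growing nonvoluntary counts on the hot threads would support the
// context-switching theory.
public class CtxSwitchDump {
    public static void main(String[] args) throws Exception {
        File[] tasks = new File("/proc/self/task").listFiles();
        if (tasks == null) return; // not on Linux, or /proc unavailable
        for (File task : tasks) {
            Path status = task.toPath().resolve("status");
            String name = "?";
            StringBuilder ctxt = new StringBuilder();
            for (String line : Files.readAllLines(status)) {
                if (line.startsWith("Name:")) name = line.substring(5).trim();
                if (line.contains("ctxt_switches")) ctxt.append(' ').append(line.replace("\t", " "));
            }
            System.out.println(task.getName() + " (" + name + "):" + ctxt);
        }
    }
}
```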