jnr / jnr-unixsocket

UNIX domain sockets (AF_UNIX) for Java
Apache License 2.0

UnixServer example doesn't work on Mac with either nc or socat #82

Closed zonkhead closed 4 years ago

zonkhead commented 4 years ago

I'm trying to make a server using the UnixServer example as a starting point. On my Mac, when I send more than two buffers' worth of data to it, it just hangs and stops reading. Here are two example commands that should work (and do work on Linux).

➜ ~ cat .emacs | socat UNIX-CONNECT:/tmp/fubar.sock -
➜ ~ cat .emacs | nc -U /tmp/fubar.sock

All I'm doing is running the UnixServer class straight out of the box. I'm using jnr-unixsocket version 0.28.

The funny thing is that if I make the ByteBuffer smaller than 1024 (like 512), it hangs after just 1 buffer read. All buffer sizes work fine on Linux.

headius commented 4 years ago

Hmm ok seems like a problem with the UNIX socket subsystem... but only on Darwin?

Can you make a simple script or test that shows the problem?

zonkhead commented 4 years ago

You already have it. Your UnixServer class. Invoke it with the script lines above. Maybe both socat and netcat don't work well with unix sockets on Darwin. Probably not but I can't be certain.

headius commented 4 years ago

Ah I understand now. Will investigate.

headius commented 4 years ago

I did not have a socat command on my MacOS machine, so I assume you installed that separately.

With socat (from homebrew) it appears to write 1024 bytes and then hang here:

"main" #1 prio=5 os_prio=31 tid=0x00007fa192004000 nid=0x1a03 runnable [0x0000700000b5f000]
   java.lang.Thread.State: RUNNABLE
    at com.kenai.jffi.Foreign.invokeL6(Native Method)
    at com.kenai.jffi.Invoker.invokeL6(Invoker.java:455)
    at jnr.enxio.channels.Native$LibC$jnr$ffi$1.kevent(Unknown Source)
    at jnr.enxio.channels.KQSelector.poll(KQSelector.java:165)
    at jnr.enxio.channels.KQSelector.select(KQSelector.java:145)
    at jnr.unixsocket.example.UnixServer.main(UnixServer.java:47)

I don't see why it hangs here, but I did notice that socat sends data in 8192-byte chunks by default. If I change that to 1024-byte blocks, it gets further: 8 blocks successfully transit the server, and then the server exits with a "Broken pipe" error indicating the client has gone away.

For my test, I used a file that's 11423 bytes long, so I would expect to see that much data transit the server.

So two questions out of this:

headius commented 4 years ago

Suspecting this might be an interaction between jnr-unixsocket and socat, I thought I'd play with the UnixClient we also have in examples.

With only 9 bytes written, it works fine.

If I modify it to send 9000 bytes, with a loop to read everything using the same 1024-byte buffer, it gets stuck after two 1024-byte buffers have been filled.

At that point, the server is in the same place it is for socat, with the client stuck here:

"main" #1 prio=5 os_prio=31 tid=0x00007ff309805000 nid=0x2303 runnable [0x00007000011c7000]
   java.lang.Thread.State: RUNNABLE
    at com.kenai.jffi.Foreign.invokeN3O1(Native Method)
    at com.kenai.jffi.Invoker.invokeN3(Invoker.java:1061)
    at jnr.enxio.channels.Native$LibC$jnr$ffi$1.read(Unknown Source)
    at jnr.enxio.channels.Native.read(Native.java:115)
    at jnr.unixsocket.impl.Common.read(Common.java:51)
    at jnr.unixsocket.impl.AbstractNativeSocketChannel.read(AbstractNativeSocketChannel.java:72)
    at jnr.unixsocket.UnixSocketChannel.read(UnixSocketChannel.java:253)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59)
    - locked <0x000000076d9d4c78> (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    - locked <0x000000076daa2bd0> (a sun.nio.ch.ChannelInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    - locked <0x000000076daa2b40> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Reader.read(Reader.java:100)
    at jnr.unixsocket.example.UnixClient.main(UnixClient.java:55)

The client appears to be stuck reading from the server, while the server is stuck waiting for more data... even though we've dumped 9000 bytes on the wire!

Here's the patch for the client:

diff --git a/src/test/java/jnr/unixsocket/example/UnixClient.java b/src/test/java/jnr/unixsocket/example/UnixClient.java
index 3bdfc6c..aaafafc 100644
--- a/src/test/java/jnr/unixsocket/example/UnixClient.java
+++ b/src/test/java/jnr/unixsocket/example/UnixClient.java
@@ -42,6 +42,7 @@ public class UnixClient {
             }
         }
         String data = "blah blah";
+        for (int i = 0; i < 1000; i++) data += "blah blah";
         UnixSocketAddress address = new UnixSocketAddress(path);
         UnixSocketChannel channel = UnixSocketChannel.open(address);
         System.out.println("connected to " + channel.getRemoteSocketAddress());
@@ -51,17 +52,19 @@ public class UnixClient {

         InputStreamReader r = new InputStreamReader(Channels.newInputStream(channel));
         CharBuffer result = CharBuffer.allocate(1024);
-        r.read(result);
-        result.flip();
-        System.out.println("read from server: " + result.toString());
-        final int status;
-        if (!result.toString().equals(data)) {
-            System.out.println("ERROR: data mismatch");
-            status = -1;
-        } else {
-            System.out.println("SUCCESS");
-            status = 0;
+        while (r.read(result) > 0) {
+            result.flip();
+            System.out.println("read from server: " + result.toString());
+            result.clear();
         }
-        System.exit(status);
+//        final int status;
+//        if (!result.toString().equals(data)) {
+//            System.out.println("ERROR: data mismatch");
+//            status = -1;
+//        } else {
+//            System.out.println("SUCCESS");
+//            status = 0;
+//        }
+//        System.exit(status);
     }
 }

headius commented 4 years ago

Ok I think I have some answers. I'm not sure it's a bug, but it's an explanation of what we're seeing here.

Because the UnixClient seemed to also hang in a read, I suspected that the server was only seeing a partial view of the content. I modified the ServerActor to not just read 1024 bytes, but to read as many bytes as it can before getting a "0" return value.

The result is that the server successfully reads and writes all 9000 bytes from my modified client.

Here's the patch:

diff --git a/src/test/java/jnr/unixsocket/example/UnixServer.java b/src/test/java/jnr/unixsocket/example/UnixServer.java
index a70a924..787f4f6 100644
--- a/src/test/java/jnr/unixsocket/example/UnixServer.java
+++ b/src/test/java/jnr/unixsocket/example/UnixServer.java
@@ -104,16 +104,20 @@ public class UnixServer {
         public final boolean rxready() {
             try {
                 ByteBuffer buf = ByteBuffer.allocate(1024);
-                int n = channel.read(buf);
-                UnixSocketAddress remote = channel.getRemoteSocketAddress();
-                System.out.printf("Read in %d bytes from %s%n", n, remote);
+                int n;

-                if (n > 0) {
-                    buf.flip();
-                    channel.write(buf);
-                    return true;
-                } else if (n < 0) {
-                    return false;
+                while ((n = channel.read(buf)) > 0) {
+                    UnixSocketAddress remote = channel.getRemoteSocketAddress();
+                    System.out.printf("Read in %d bytes from %s%n", n, remote);
+
+                    if (n > 0) {
+                        buf.flip();
+                        channel.write(buf);
+                        buf.clear();
+//                        return true;
+                    } else if (n < 0) {
+                        return false;
+                    }
                 }

             } catch (IOException ex) {

This change also fixes the socat example; the file I pipe to it now completely transits the server. And just for completeness, I confirmed that your nc example also completes successfully.

I think what we're seeing here is a bad interaction between IO buffers (at either the JVM or kernel level) and the poll call used for IO select here. On the server side, it seems the poll for read does not report data still left "on the wire" once a read event has already fired for it. As a result, we eventually end up with some number of bytes "in limbo" and no poll events left to trigger the server to read those bytes. I don't think this constitutes a bug in jnr-unixsocket, since select, read, and write all just bottom out in the system's poll, read, and write native calls.

It's possible that we're not configuring the buffering for the unix domain socket file descriptor properly, but we would need to research that. We're not doing anything unusual when setting it up, so I would expect the basic unix socket to work properly with poll.

I will commit this change to UnixServer for you to test. I am not entirely satisfied with this as a "solution" so perhaps you can help me figure out why we're seeing this buffering behavior?

headius commented 4 years ago

With the UnixServer working properly now on Darwin, I'm going to close this issue.

From discussions and articles online, it appears this may be just one of the "quirks" of using poll across platforms. It does not appear that additional POLLIN events get triggered for unread data that happens to be lying around in a kernel buffer, so code that responds to a READ select should attempt to read as much data as is available before doing another select.
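
For anyone adapting the pattern outside UnixServer, here is a minimal sketch of "drain on readable" using only the JDK. It is not the jnr-unixsocket code: a java.nio.channels.Pipe stands in for the Unix socket as an assumption so the example runs without extra dependencies, and the class name DrainOnReadable and the helper drain() are hypothetical names made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class DrainOnReadable {
    // Hypothetical helper: keep reading from a NON-BLOCKING channel until
    // read() returns 0 (nothing available right now) or -1 (peer closed).
    // Returns the total number of bytes consumed in this pass.
    static int drain(ReadableByteChannel channel, ByteBuffer buf) throws IOException {
        int total = 0;
        int n;
        while ((n = channel.read(buf)) > 0) {
            total += n;
            // A real handler would flip() the buffer and consume or echo
            // the bytes here; this sketch just discards them.
            buf.clear();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a Pipe stands in for the Unix domain socket.
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);

        // Put 9000 bytes "on the wire" before the reader ever selects.
        ByteBuffer payload = ByteBuffer.wrap(new byte[9000]);
        while (payload.hasRemaining()) {
            pipe.sink().write(payload);
        }

        try (Selector selector = Selector.open()) {
            pipe.source().register(selector, SelectionKey.OP_READ);
            selector.select();               // wait for one readiness event
            selector.selectedKeys().clear();

            // Drain everything with a small buffer before selecting again.
            int total = drain(pipe.source(), ByteBuffer.allocate(1024));
            System.out.println("drained " + total + " bytes after one readiness event");
        }
        pipe.sink().close();
        pipe.source().close();
    }
}
```

The important detail is that the channel must be non-blocking; on a blocking channel the final read() would wait for more data instead of returning 0. The small 1024-byte buffer is kept on purpose to mirror the example in this issue: the loop, not the buffer size, is what ensures nothing is left behind.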

headius commented 4 years ago

Releasing today in 0.29.