kohlschutter / junixsocket

Unix Domain Sockets in Java 7 and newer (AF_UNIX), AF_TIPC, AF_VSOCK, and more
Apache License 2.0

Messages over 1MiB stop all communication for a NIO UDS #118

Closed TW-Goldencode closed 2 years ago

TW-Goldencode commented 2 years ago

Describe the bug: Sending a message larger than 1MiB from server to client results in the following infinite loop:

  1. The client blocks at read()
  2. The server can call write(), but it keeps returning 0, while the transferred byte count stays at exactly 1 * 1024 * 1024 bytes (sketched below)
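
A minimal sketch of this scenario (illustrative only; it assumes the junixsocket NIO entry points AFUNIXServerSocketChannel.open(), AFUNIXSocketChannel.open() and AFUNIXSocketAddress.of(File); the socket path and the 2 MiB message size are made up):

```java
import java.io.File;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

import org.newsclub.net.unix.AFUNIXServerSocketChannel;
import org.newsclub.net.unix.AFUNIXSocketAddress;
import org.newsclub.net.unix.AFUNIXSocketChannel;

public class LargeMessageRepro {
  private static final int MSG_SIZE = 2 * 1024 * 1024; // anything > 1 MiB

  public static void main(String[] args) throws Exception {
    File socketFile = new File("/tmp/repro.sock");
    socketFile.delete(); // remove a stale socket file from a previous run
    AFUNIXSocketAddress addr = AFUNIXSocketAddress.of(socketFile);

    Thread server = new Thread(() -> {
      try (AFUNIXServerSocketChannel serverChannel = AFUNIXServerSocketChannel.open()) {
        serverChannel.bind(addr);
        try (SocketChannel conn = serverChannel.accept()) {
          ByteBuffer msg = ByteBuffer.allocate(MSG_SIZE);
          while (msg.hasRemaining()) {
            // With the capped internal buffer, write() returned 0 forever
            // once exactly 1 * 1024 * 1024 bytes had been transferred.
            conn.write(msg);
          }
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    });
    server.start();

    Thread.sleep(100); // crude: give the server a moment to bind (sketch only)

    // Client side: blocks in read() while the server spins on write() == 0.
    try (AFUNIXSocketChannel client = AFUNIXSocketChannel.open()) {
      client.connect(addr);
      ByteBuffer in = ByteBuffer.allocate(MSG_SIZE);
      while (in.hasRemaining() && client.read(in) >= 0) {
        // keep reading until the full message has arrived
      }
    }
    server.join();
  }
}
```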

Environment:

Fix: The cause is: DATAGRAMPACKET_BUFFER_MAX_CAPACITY in https://github.com/kohlschutter/junixsocket/blob/main/junixsocket-common/src/main/java/org/newsclub/net/unix/AFCore.java is in effect. This seems strange, since it's not a datagram at all. The OS already has plenty of protection via sysctl net.core.wmem_max and sysctl net.core.rmem_max, and within these limits the socket buffers can be tuned with, for example, this.channel.setOption(java.net.StandardSocketOptions.SO_SNDBUF, 8388608); and this.channel.setOption(java.net.StandardSocketOptions.SO_RCVBUF, 8388608);
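
For reference, a short sketch of that tuning; the 8 MiB value is only an example and must stay within the kernel ceilings:

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.NetworkChannel;

final class BufferTuning {
  /**
   * Raises the per-socket send/receive buffers. The sizes must stay within
   * the kernel ceilings, which on Linux are inspected (or raised) outside
   * the JVM via `sysctl net.core.wmem_max` / `sysctl net.core.rmem_max`.
   */
  static void tuneBuffers(NetworkChannel channel, int bytes) throws IOException {
    channel.setOption(StandardSocketOptions.SO_SNDBUF, bytes); // send buffer
    channel.setOption(StandardSocketOptions.SO_RCVBUF, bytes); // receive buffer
  }
}

// Usage, matching the 8 MiB example above:
//   BufferTuning.tuneBuffers(channel, 8 * 1024 * 1024);
```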

I did a custom build which removed the artificial hardcoded limit. After that it works fine.

Please indicate if you would prefer a pull request for your fine library. I'd suggest using a system property, with 0 meaning unlimited. Do you have other ideas?

kohlschuetter commented 2 years ago

Hi @TW-Goldencode, thanks for reporting!

Please try the above commit. Let me know if you actually managed to get datagrams larger than 8 MB, or if that really is a good upper limit. Out of curiosity, can you explain for what you need these humungous datagrams, and how they perform compared to smaller ones?

Cheers, Christian

TW-Goldencode commented 2 years ago

@kohlschuetter Fantastic, thank you for this fix.

"Humungous": I agree, messages of this size should ideally be split. It performs fine, but unlimited size can lead to unexpected memory issues. We detected this issue on an internal IPC where we control both client and server; for that specific scenario, we can and will split the messages up. It still got us worried about situations outside of our control (when we talk to an external service such as PostgreSQL or Redis over IPC). Your fix will address that.

One thing might be improved: a warning, raise, and/or abend if the limit is reached. Right now it just 'hangs'. But don't address this on our account, the fix is good as-is.

I'll do a test next Friday and will get back to you. It would be great if you could create a release for this.

TW-Goldencode commented 2 years ago

@kohlschuetter Tested, it's all good. Would it be possible to create a 2.5.2 release for this?

Side notes: During the bootstrap, messages of up to around 25MiB were transmitted; the size is not capped. The only safe setting for us right now is -Dorg.newsclub.net.unix.thread-local-buffer.max-capacity=0. That works well with the patch, as expected. I applied the patch on top of 2.5.1, and there were no side effects for the build.
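
For completeness, a sketch of how that override is applied; the command-line form is what we use, and the programmatic form assumes the property has not been read yet when it is set:

```java
// On the JVM command line (what we use):
//   java -Dorg.newsclub.net.unix.thread-local-buffer.max-capacity=0 ...
//
// Or programmatically, before the first junixsocket channel is touched:
System.setProperty("org.newsclub.net.unix.thread-local-buffer.max-capacity", "0"); // 0 = unlimited
```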

Thanks again, we'll wait for release 2.5.2

kohlschuetter commented 2 years ago

Thanks for your feedback @TW-Goldencode.

I think it becomes clear that 8MB is not a realistic upper limit, since 25MB datagrams seem to work for you as well. I'm happy to see that the "max-capacity=0" approach works for you. However, I think we can do better than that:

Please try the latest changes on that branch (including commit bf9fb50). That change lowers the limit back to 1MB; however, it should work correctly (albeit perhaps a tad slower) for arbitrarily larger capacities.
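
A rough illustration (not necessarily how the commit implements it) of why a bounded working buffer can still serve arbitrarily large writes: the payload is copied and written in chunks, at the cost of one extra copy per chunk. `BOUNDED_CAP` and `writeChunked` are made-up names for this sketch:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

final class ChunkedWriteSketch {
  static final int BOUNDED_CAP = 1024 * 1024; // 1 MiB working buffer

  /** Writes as much of {@code src} as possible via a bounded scratch buffer. */
  static long writeChunked(ByteBuffer src, ByteBuffer scratch, WritableByteChannel ch)
      throws IOException {
    long total = 0;
    while (src.hasRemaining()) {
      scratch.clear();
      int n = Math.min(src.remaining(), scratch.remaining());
      ByteBuffer slice = src.duplicate();
      slice.limit(slice.position() + n);
      scratch.put(slice); // copy one chunk into the bounded buffer
      scratch.flip();
      int written = ch.write(scratch);
      src.position(src.position() + written);
      total += written;
      if (written < n) {
        break; // short write: let the caller retry once the socket drains
      }
    }
    return total;
  }
}
```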

Please let me know (after removing the max-capacity system property override from your VM config) how that new change performs compared to max-capacity=0.

Please (if possible) also test the scenario where you use direct byte buffers in the code that uses junixsocket (e.g., look for ByteBuffer.allocate and replace with ByteBuffer.allocateDirect).
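
For clarity, the substitution being asked about; the 2 MiB size is illustrative:

```java
import java.nio.ByteBuffer;

// Heap buffer: backed by a byte[] on the Java heap; native I/O may need an
// extra copy into a temporary direct buffer.
ByteBuffer heapBuf = ByteBuffer.allocate(2 * 1024 * 1024);

// Direct buffer: allocated outside the Java heap, so the native layer can
// read from / write to it without that intermediate copy.
ByteBuffer directBuf = ByteBuffer.allocateDirect(2 * 1024 * 1024);
```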

TW-Goldencode commented 2 years ago

@kohlschuetter I agree the new approach is far superior, thank you. The issue already involves a ByteBuffer.allocateDirect scenario; does that surprise you? I use a NIO channel, and the isDirect code path (from memory, I'm on mobile now) was entered. I didn't analyze the complete code path. I'll test on Monday and will get back to you.

TW-Goldencode commented 2 years ago

@kohlschuetter Note: It's possible allocateDirect was only used on the ServerSocket, and that's where the stream is fed, so there was plenty of write buffer space. The JVM also didn't cap the capacity under the hood (there's a JVM option for that, but the default suffices). More on Monday.
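
Presumably the JVM-level cap referred to here is the direct-memory limit; a hedged sketch, since the comment above doesn't name it explicitly:

```java
import java.nio.ByteBuffer;

// Total direct-buffer memory is capped by the JVM; if -XX:MaxDirectMemorySize
// is not set, the limit defaults to (roughly) the maximum heap size, which is
// why the default was sufficient here. To raise it explicitly:
//   java -XX:MaxDirectMemorySize=256m ...
ByteBuffer bootstrapBuf = ByteBuffer.allocateDirect(25 * 1024 * 1024); // counts against that cap
```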

TW-Goldencode commented 2 years ago

@kohlschuetter Tested https://github.com/kohlschutter/junixsocket/commit/bf9fb50eeeb2e48cdcafd8e64206384a5ce7a36b, and it fixes the issue (with the 1MiB default limit, no system property overrides). Much preferred; performance is good.

Thanks again, we'll wait for release 2.5.2

kohlschuetter commented 2 years ago

junixsocket 2.5.2 has been released. Please re-open if you encounter further issues. Thanks again for reporting and testing, @TW-Goldencode!

TW-Goldencode commented 2 years ago

Thank you @kohlschuetter, no issues yet. Upped our Gradle dependency to 2.5.2. Great stuff.