Closed: TW-Goldencode closed this issue 2 years ago
Hi @TW-Goldencode, thanks for reporting!
Please try the above commit. Let me know if you actually managed to get datagrams larger than 8 MB, or if that really is a good upper limit. Out of curiosity, can you explain what you need these humungous datagrams for, and how they perform compared to smaller ones?
Cheers, Christian
@kohlschuetter Fantastic, thank you for this fix. Regarding "humungous": I agree, messages of this size should ideally be split. Performance is fine, but unlimited size can lead to unexpected memory issues. We detected this issue on an internal IPC path where we control both client and server, so for that specific scenario we can and will split the messages up. It still got us worried about situations outside of our control (when we talk over IPC to an external service like PostgreSQL or Redis); your fix will address that.
One thing could still be improved: a warning, an exception, and/or an abend when the limit is reached; currently it just 'hangs'. But don't address this on our account, the fix is good as-is. I'll do a test next Friday and get back to you. It would be great if you could create a release for this.
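(A minimal sketch of the kind of message splitting mentioned above; `ChunkedWriter` and `CHUNK_SIZE` are hypothetical names, not part of junixsocket, and the chunk size is an arbitrary choice.)

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

final class ChunkedWriter {
    private static final int CHUNK_SIZE = 64 * 1024; // 64 KiB per write; arbitrary choice

    // Writes the payload in slices of at most CHUNK_SIZE bytes, so no single
    // write ever requires a multi-megabyte intermediate buffer.
    static void write(WritableByteChannel channel, byte[] payload) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        while (buf.hasRemaining()) {
            int originalLimit = buf.limit();
            buf.limit(Math.min(buf.position() + CHUNK_SIZE, originalLimit));
            while (buf.hasRemaining()) {
                channel.write(buf); // may write fewer bytes than requested
            }
            buf.limit(originalLimit);
        }
    }
}
```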
@kohlschuetter Tested, it's all good. Would it be possible to create a 2.5.2 release for this?
Side notes: During the bootstrap, messages of up to around 25MiB were transmitted. The size is not capped. The only safe way for us right now is -Dorg.newsclub.net.unix.thread-local-buffer.max-capacity=0 (see the sketch below). That works well with the patch, as expected. I applied the patch on top of 2.5.1, and there were no side effects for the build.
Thanks again, we'll wait for release 2.5.2
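(For reference, a minimal sketch of applying the override mentioned above programmatically rather than on the java command line; the assumption here is that junixsocket reads the property when the relevant class is first initialized, so it must be set before any junixsocket code runs.)

```java
public final class DisableBufferCap {
    public static void main(String[] args) {
        // Equivalent to passing
        //   -Dorg.newsclub.net.unix.thread-local-buffer.max-capacity=0
        // on the java command line; 0 disables the cap. Assumption: this must run
        // before any junixsocket class that reads the property is loaded.
        System.setProperty("org.newsclub.net.unix.thread-local-buffer.max-capacity", "0");

        // ... application code using junixsocket follows ...
    }
}
```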
Thanks for your feedback @TW-Goldencode.
I think it becomes clear that 8MB is not a realistic upper limit, since 25MB datagrams seem to work for you as well. I'm happy to see that the "max-capacity=0" approach works for you. However, I think we can do better than that:
Please try the latest changes on that branch (including commit bf9fb50). That change lowers the limit back to 1MB, but it should work correctly (albeit perhaps a tad slower) for arbitrarily large capacities.
Please let me know (after removing the max-capacity system property override from your VM config) how that new change performs compared to max-capacity=0.
Please (if possible) also test the scenario where you use direct byte buffers in the code that uses junixsocket (e.g., look for ByteBuffer.allocate and replace with ByteBuffer.allocateDirect).
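(For illustration, the swap being requested is just a change of allocator in the calling code; a minimal standalone sketch using plain java.nio, with no junixsocket-specific types assumed.)

```java
import java.nio.ByteBuffer;

public class BufferKinds {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] on the Java heap; native I/O may need
        // to copy its contents into native memory first.
        ByteBuffer heap = ByteBuffer.allocate(1024 * 1024);

        // Direct buffer: allocated outside the Java heap; native I/O can usually
        // use it without that extra copy.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024 * 1024);

        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
    }
}
```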
@kohlschuetter I agree the new approach is far superior, thank you. The issue already is a ByteBuffer.allocateDirect scenario; does that surprise you? I use a NIO channel. The isDirect flow (from memory, on mobile now) was entered. I didn't analyze the complete code path.
I'll test on Monday and will get back to you.
@kohlschuetter Note: It's possible allocateDirect was only used on the ServerSocket, and that's where the stream is fed, so there was plenty of write buffer space. The JVM also didn't cap the capacity under the hood (there's a JVM setting for that, but the default suffices). More on Monday.
@kohlschuetter Tested https://github.com/kohlschutter/junixsocket/commit/bf9fb50eeeb2e48cdcafd8e64206384a5ce7a36b; it fixes the issue (with the 1MiB default limit, no system variables). Much preferred; performance is good.
Thanks again, we'll wait for release 2.5.2
junixsocket 2.5.2 has been released. Please re-open if you encounter further issues. Thanks again for reporting and testing, @TW-Goldencode!
Thank you @kohlschuetter, no issues yet. Upped our Gradle dependency to 2.5.2. Great stuff.
Describe the bug: Sending a message larger than 1MiB from server to client results in the following infinite loop:
Environment:
Fix: The cause is that DATAGRAMPACKET_BUFFER_MAX_CAPACITY in https://github.com/kohlschutter/junixsocket/blob/main/junixsocket-common/src/main/java/org/newsclub/net/unix/AFCore.java is in effect. This seems strange, since it's not a datagram at all. The OS has plenty of protection via sysctl net.core.wmem_max and sysctl net.core.rmem_max, and within these limits the buffers can be tuned with, for example, this.channel.setOption(java.net.StandardSocketOptions.SO_SNDBUF, 8388608); and this.channel.setOption(java.net.StandardSocketOptions.SO_RCVBUF, 8388608); (see the sketch below).
I did a custom build which removed the artificial hardcoded limit. After that it works fine.
Please indicate if you would prefer a pull request for your fine library. I'd suggest using a system variable, with 0 meaning unlimited. Do you have other ideas?
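(A minimal sketch of the buffer tuning quoted above, assuming a connected SocketChannel; the helper name is illustrative, and the effective sizes are clamped by the kernel to the sysctl limits mentioned.)

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

final class SocketBufferTuning {
    // Requests 8 MiB (8388608-byte) send and receive buffers; the kernel clamps
    // the effective values to net.core.wmem_max / net.core.rmem_max.
    static void tune(SocketChannel channel) throws IOException {
        channel.setOption(StandardSocketOptions.SO_SNDBUF, 8 * 1024 * 1024);
        channel.setOption(StandardSocketOptions.SO_RCVBUF, 8 * 1024 * 1024);
    }
}
```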