real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
https://aeron.io
Apache License 2.0

Untethering Core Dump Fixed? #1675

Closed: dev2production closed this issue 3 weeks ago

dev2production commented 3 weeks ago

Hello, I noticed that the latest release (1.46.7) includes a bug fix described as "Prevent an untethered subscription re-joining the stream at an old position." Could this fix also resolve a core dump that can occur with untethered subscriptions? I was running a few tests against 1.46.4 on various OSes with untethered subscriptions, and the test application would sometimes crash with a core dump. Unfortunately, I cannot reproduce the crash reliably.

vyazelenko commented 3 weeks ago

@dev2production For a slow subscriber to segfault an Aeron process, all of the following conditions must be true:

If all of the above are true, then an attempt to read the next frame will segfault the JVM, because the read is performed from a log buffer that has already been freed.

dev2production commented 3 weeks ago

@vyazelenko Thank you very much for the detailed answer. A quick follow-up question: are you suggesting that the segfault you describe can still happen in the latest version (1.46.7) if all of those conditions are true?

vyazelenko commented 3 weeks ago

@dev2production What I described above will segfault the JVM when using the Java media driver in any Aeron version. This issue needs to be addressed in the application, i.e. ultimately the application should not block for an extended time inside a FragmentHandler#onFragment call. Tactically, breaking any one of the above conditions will also work (e.g. using fragmentLimit == 1).
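
To illustrate the advice about keeping the FragmentHandler#onFragment call short, here is a minimal sketch of one possible pattern: copy the payload out of the buffer and hand it to a worker thread, so the callback itself returns immediately. This is not the change from the commit mentioned in this thread, and it deliberately does not depend on Aeron. `SimpleFragmentHandler`, `HandOffHandler`, and the drop-when-full policy are hypothetical names and choices made up for the example. In a real application the handler would implement `io.aeron.logbuffer.FragmentHandler` and be passed to `Subscription#poll`.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FragmentHandOff {

    /** Hypothetical stand-in for the callback shape; not Aeron's real interface. */
    public interface SimpleFragmentHandler {
        void onFragment(byte[] buffer, int offset, int length);
    }

    /** Copies the payload and queues it, so the callback never blocks. */
    public static final class HandOffHandler implements SimpleFragmentHandler {
        private final BlockingQueue<byte[]> queue;
        private long dropped;

        public HandOffHandler(BlockingQueue<byte[]> queue) {
            this.queue = queue;
        }

        @Override
        public void onFragment(byte[] buffer, int offset, int length) {
            // Copy first: the source buffer is only borrowed for the duration of the call.
            byte[] copy = Arrays.copyOfRange(buffer, offset, offset + length);
            // offer() returns immediately; put() would block when the queue is full.
            if (!queue.offer(copy)) {
                dropped++; // assumed policy: drop and count rather than stall the poller
            }
        }

        public long dropped() {
            return dropped;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);
        HandOffHandler handler = new HandOffHandler(queue);

        // Slow work happens on a separate thread, away from the polling thread.
        Thread worker = new Thread(() -> {
            try {
                byte[] message = queue.take();
                Thread.sleep(100); // simulated slow processing
                System.out.println("processed " + message.length + " bytes");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        byte[] fakeLogBuffer = {10, 20, 30, 40, 50};
        handler.onFragment(fakeLogBuffer, 1, 3); // returns immediately
        worker.join();
    }
}
```

Whether dropping, back-pressuring, or something else is appropriate when the queue fills up depends on the application. The only point of the sketch is that the slow work is moved out of the callback.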

vyazelenko commented 3 weeks ago

@dev2production I've pushed a change (https://github.com/real-logic/aeron/commit/ede0c033caaebdbbc5322a4900402b72cd0f8632) that should prevent a segfault if a FragmentHandler blocks for a long time.

dev2production commented 3 weeks ago

@vyazelenko Fantastic. Really appreciate this change.