dev2production closed this 3 weeks ago
@dev2production In order to segfault the Aeron process with a slow subscriber, all of the following conditions must be true:
1. tether=false, or it is a multicast/MDC channel with non-min flow control.
2. A Subscription#poll/Image#poll call blocks until an Image goes unavailable (i.e. UnavailableImageHandler#onUnavailableImage is called) and stays blocked for at least aeron.client.resource.linger.duration (3 seconds by default) after this event. By that time the ClientConductor thread will have freed the underlying log buffer.
3. fragmentLimit > 1 and the blocking fragment is not the last fragment to be read.

If all of the above are true, then an attempt to read the next frame will segfault the JVM, because the read will be performed from the log buffer that was freed.
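The three conditions can be captured as a small predicate. This is a hypothetical model for illustration only. SegfaultPreconditions, the PollState record and its field names are not part of Aeron. Only the 3-second default linger and the conditions themselves come from the thread.

```java
import java.time.Duration;

/**
 * Hypothetical model of the preconditions listed above (not an Aeron API).
 * Returns true when a blocked poll could go on to read a log buffer that the
 * ClientConductor has already freed.
 */
public final class SegfaultPreconditions {
    // Default for aeron.client.resource.linger.duration, as stated in the thread.
    public static final Duration DEFAULT_LINGER = Duration.ofSeconds(3);

    public record PollState(
        boolean untetheredOrNonMinFlowControl, // tether=false, or multicast/MDC with non-min flow control
        Duration blockedAfterUnavailable,      // how long poll stays blocked after onUnavailableImage
        Duration linger,                       // aeron.client.resource.linger.duration
        int fragmentLimit,                     // limit passed to Subscription#poll / Image#poll
        boolean lastFragmentToRead) {          // is the blocking fragment the last one in this poll?
    }

    public static boolean mayReadFreedLogBuffer(PollState s) {
        // Condition 2: blocked for at least the linger duration, so the log buffer was freed.
        boolean logBufferFreed = s.blockedAfterUnavailable().compareTo(s.linger()) >= 0;
        // Condition 3: the poll will try to read another fragment after the blocking one returns.
        boolean readsAnotherFragment = s.fragmentLimit() > 1 && !s.lastFragmentToRead();
        // Condition 1 plus the two above must all hold.
        return s.untetheredOrNonMinFlowControl() && logBufferFreed && readsAnotherFragment;
    }
}
```

Breaking any single input, for example fragmentLimit of 1 or a block shorter than the linger, makes the predicate false.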
@vyazelenko Thank you very much for the detailed answer. A quick follow-up question: are you suggesting the segfault mentioned can still happen in the latest version (1.46.7) if all those conditions are true?
@dev2production What I described above will segfault the JVM when using the Java media driver, in any Aeron version. This issue needs to be addressed in the application, i.e. ultimately the application should not block for an excessively long time in a FragmentHandler#onFragment call. Tactically, breaking any of the above conditions will also work (e.g. using fragmentLimit == 1).
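One way an application might avoid blocking inside FragmentHandler#onFragment is to copy the fragment out of the log buffer and hand it to a worker thread, so the poll call returns quickly. The sketch below is hypothetical and dependency-free. java.nio.ByteBuffer stands in for Agrona's DirectBuffer, and HandOffHandler is not an Aeron class.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical hand-off handler (not an Aeron class). ByteBuffer stands in for
 * Agrona's DirectBuffer so the sketch has no dependencies.
 */
public final class HandOffHandler {
    private final BlockingQueue<byte[]> queue;

    public HandOffHandler(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called on the polling thread; mirrors onFragment(buffer, offset, length, header) minus the header. */
    public boolean onFragment(ByteBuffer buffer, int offset, int length) {
        // Copy the bytes out while the log buffer is still guaranteed to be mapped.
        byte[] copy = new byte[length];
        buffer.get(offset, copy, 0, length); // absolute bulk get (Java 13+), does not move position
        // Non-blocking offer: the polling thread never waits on slow downstream work.
        return queue.offer(copy);
    }

    /** Called on a worker thread; returns null if nothing is queued. */
    public byte[] poll() {
        return queue.poll();
    }
}
```

Because offer is non-blocking, a full queue drops the fragment in this sketch. A real application would need a back-pressure policy instead. One option is a ControlledFragmentHandler that returns ABORT, so that the position is not advanced and the fragment is delivered again on the next poll.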
@dev2production I've pushed a change (https://github.com/real-logic/aeron/commit/ede0c033caaebdbbc5322a4900402b72cd0f8632) that would prevent a segfault if FragmentHandler blocks for a long time.
@vyazelenko Fantastic. Really appreciate this change.
Hello, I noticed the latest release (1.46.7) includes a bug fix to "Prevent an untethered subscription re-joining the stream at an old position." Could this fix potentially solve a core dump that may occur with untethered subscriptions? I was running a few tests against 1.46.4 on various OSes with untethered subscriptions, and the test application would sometimes crash with a core dump. Unfortunately, I cannot reproduce this scenario consistently.