Closed alexey-romenskiy closed 4 days ago
@alexey-romenskiy Thanks for the bug report. It looks like there is an issue with the destination transport cleanup for an MDS subscription when it is being closed. That said, your setup looks very complicated. Can you please explain why you need to couple an MDC publication with an MDS subscription?
On another note: if you use a normal subscription, there is no issue. In Aeron, multiple subscriptions on the same channel/stream share a single receive endpoint (provided the same media driver is used), so multiple subscribers are allowed.
Here is a change to SubscriberMain to use a normal subscription:
```java
import io.aeron.Aeron;
import io.aeron.CommonContext;
import io.aeron.driver.MediaDriver;

public class SubscriberMain
{
    // --add-opens java.base/sun.nio.ch=ALL-UNNAMED for older Aeron versions
    public static void main(String[] args)
    {
        final var aeronDirectoryName = CommonContext.generateRandomDirName();
        final var localAddress = "127.0.0.1";
        final var remoteAddress = "127.0.0.2";
        final var remotePort = "41123";
        final var workaround = false;
        final var iterations = 10;

        // ResolvLogAgent is not an Aeron class; it is assumed to come from the reproducer project.
        ResolvLogAgent.attach();

        try (var driver = MediaDriver.launch(new MediaDriver.Context().aeronDirectoryName(aeronDirectoryName));
            var aeron = Aeron.connect(new Aeron.Context().aeronDirectoryName(aeronDirectoryName)))
        {
            for (var i = 0; i < iterations; i++)
            {
                // Open and immediately close a normal (non-MDS) subscription.
                try (var s = aeron.addSubscription("aeron:udp?endpoint=" + remoteAddress + ":" + remotePort, 1))
                {
                }
            }

            System.out.println("Waiting...");
            // Keep the process alive so the driver can be observed after all subscriptions are closed.
            while (true)
            {
                // empty
            }
        }
    }
}
```
P.S. We recommend updating your Aeron installation regularly, as we only provide support for the first year after a release, i.e. about 4 major releases.
You can use aeron-leak.zip to reproduce this behaviour.
What does that mean?
The Aeron driver does not release resources when a subscription with dynamically added destination(s) is closed. The array io.aeron.driver.media.DataTransportPoller#channelAndTransports grows indefinitely, so the driver keeps polling the leaked resources, consuming extra CPU and JVM heap and slowing down the processing of all data streams. Periodic host name and IP address re-resolution for the dead connections is one such redundant activity that we were able to capture.
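To make the growth pattern concrete, here is a minimal toy model in plain Java. It is not Aeron's code: the `Poller` class, its `register`/`cancel` methods, and `simulate` are hypothetical names invented for illustration, and the only thing taken from the report is the idea of an array of registered transports that is scanned on every poll. It sketches what happens when an entry is, or is not, removed as each subscription closes.

```java
import java.util.Arrays;

/** Toy model only: an illustration of the reported growth pattern, NOT Aeron's implementation. */
final class PollerLeakModel
{
    /** Hypothetical stand-in for a poller that holds an array of registered transports. */
    static final class Poller
    {
        private Object[] transports = new Object[0];

        /** Append a transport to the polled array. */
        void register(final Object transport)
        {
            transports = Arrays.copyOf(transports, transports.length + 1);
            transports[transports.length - 1] = transport;
        }

        /** Remove a transport from the polled array, if present. */
        void cancel(final Object transport)
        {
            for (int i = 0; i < transports.length; i++)
            {
                if (transports[i] == transport)
                {
                    final Object[] shrunk = new Object[transports.length - 1];
                    System.arraycopy(transports, 0, shrunk, 0, i);
                    System.arraycopy(transports, i + 1, shrunk, i, transports.length - i - 1);
                    transports = shrunk;
                    return;
                }
            }
        }

        /** Number of entries that would be visited on every poll. */
        int size()
        {
            return transports.length;
        }
    }

    /** Open and close `iterations` subscriptions, each with one dynamically added destination. */
    static int simulate(final int iterations, final boolean cleanupOnClose)
    {
        final Poller poller = new Poller();
        for (int i = 0; i < iterations; i++)
        {
            final Object transport = new Object(); // models the destination's transport
            poller.register(transport);

            // The subscription is closed here; the leak corresponds to skipping this cleanup.
            if (cleanupOnClose)
            {
                poller.cancel(transport);
            }
        }
        return poller.size();
    }

    public static void main(final String[] args)
    {
        System.out.println("entries without cleanup: " + simulate(10, false)); // prints 10
        System.out.println("entries with cleanup: " + simulate(10, true));     // prints 0
    }
}
```

In this model the number of polled entries grows linearly with the number of open/close cycles when cleanup is skipped, which is why the cost only becomes visible in long-running processes.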
Why is it important?
Some usage scenarios involve opening and closing subscriptions in large quantities over time, under heavy load and without restarting the application for months. Even a small leak may result in major service disruptions that are not tolerable in production.
Reproduced with Aeron 1.46.7 and 1.34.0, OpenJDK 17.