real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
https://aeron.io
Apache License 2.0

MDS destination transport is not removed from the transport list when Subscription is closed #1683

Closed alexey-romenskiy closed 4 days ago

alexey-romenskiy commented 2 weeks ago

You can use aeron-leak.zip to reproduce this behaviour.

  1. Run PublisherMain
  2. Run SubscriberMain
  3. You will see more and more "NAME_RESOLUTION_RESOLVE" messages printed to standard output indefinitely.
  4. Set workaround=true in SubscriberMain and re-run it.
  5. You will see only a few "NAME_RESOLUTION_RESOLVE" messages on startup and no more such messages later.
  6. Set workaround=false and iterations=1000 and re-run.
  7. You will see lots of "NAME_RESOLUTION_RESOLVE" messages appearing indefinitely.

What does that mean?

The Aeron media driver does not release resources when a subscription with dynamically added destination(s) is closed. The array io.aeron.driver.media.DataTransportPoller#channelAndTransports grows indefinitely, so the driver keeps polling the leaked transports, consuming extra CPU and JVM heap and slowing down the processing of all data streams. Periodic re-resolution of host names and IP addresses for the dead connections is one such redundant activity that we were able to capture.
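To make the growth pattern concrete, here is a toy model of the leak. It is only a sketch and not Aeron's code: Poller, Subscription, cleanUpOnClose and transportsAfter are hypothetical names invented for illustration, and the real DataTransportPoller works differently in detail. It assumes each subscription registers one destination transport with a shared poller, and shows what happens to the poller's list when close() does or does not unregister it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Aeron classes.
final class TransportLeakSketch
{
    /** Hypothetical stand-in for a driver-side poller that iterates over registered transports. */
    static final class Poller
    {
        private final List<String> transports = new ArrayList<>();

        void register(final String transport)
        {
            transports.add(transport);
        }

        void unregister(final String transport)
        {
            transports.remove(transport);
        }

        int size()
        {
            return transports.size();
        }
    }

    /** Hypothetical stand-in for a subscription with dynamically added destinations. */
    static final class Subscription implements AutoCloseable
    {
        private final Poller poller;
        private final boolean cleanUpOnClose;
        private final List<String> destinations = new ArrayList<>();

        Subscription(final Poller poller, final boolean cleanUpOnClose)
        {
            this.poller = poller;
            this.cleanUpOnClose = cleanUpOnClose;
        }

        void addDestination(final String destination)
        {
            destinations.add(destination);
            poller.register(destination);
        }

        @Override
        public void close()
        {
            // The leak in this model: without this step the poller keeps every transport forever.
            if (cleanUpOnClose)
            {
                for (final String destination : destinations)
                {
                    poller.unregister(destination);
                }
            }
            destinations.clear();
        }
    }

    /** Opens and closes a subscription per iteration and returns how many transports remain registered. */
    static int transportsAfter(final int iterations, final boolean cleanUpOnClose)
    {
        final Poller poller = new Poller();
        for (int i = 0; i < iterations; i++)
        {
            try (Subscription subscription = new Subscription(poller, cleanUpOnClose))
            {
                subscription.addDestination("destination-" + i);
            }
        }
        return poller.size();
    }

    public static void main(final String[] args)
    {
        System.out.println("without cleanup: " + transportsAfter(1000, false));
        System.out.println("with cleanup: " + transportsAfter(1000, true));
    }
}
```

In this model the number of registered transports equals the number of iterations when close() skips the cleanup, and drops back to zero when it unregisters them, which mirrors the difference we observe in the amount of name re-resolution activity.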

Why is it important?

Some usage scenarios involve opening and closing large numbers of subscriptions over time, under heavy load, without restarting the application for months. Even a small leak can lead to major service disruptions that are not tolerable in production.

Reproduced with Aeron 1.46.7 and 1.34.0, OpenJDK 17.

vyazelenko commented 1 week ago

@alexey-romenskiy Thanks for the bug report. It looks like there is an issue with the destination transport cleanup for an MDS subscription when it is being closed. That being said, your setup looks very complicated. Can you please explain why you need to couple an MDC publication with an MDS subscription?

On another note: if you use a normal subscription then there will be no issue. In Aeron, multiple subscriptions on the same channel/stream share a single receive endpoint (provided the same media driver is used), i.e. multiple subscribers are allowed. Here is a change to SubscriberMain to use a normal subscription:

import io.aeron.Aeron;
import io.aeron.CommonContext;
import io.aeron.driver.MediaDriver;

public class SubscriberMain
{
    // --add-opens java.base/sun.nio.ch=ALL-UNNAMED for older Aeron versions
    public static void main(String[] args)
    {
        final var aeronDirectoryName = CommonContext.generateRandomDirName();
        final var localAddress = "127.0.0.1"; // unused in this variant
        final var remoteAddress = "127.0.0.2";
        final var remotePort = "41123";
        final var workaround = false; // unused in this variant
        final var iterations = 10;

        // ResolvLogAgent is assumed to be the logging helper from the reproducer (aeron-leak.zip)
        ResolvLogAgent.attach();

        try (var driver = MediaDriver.launch(new MediaDriver.Context().aeronDirectoryName(aeronDirectoryName));
            var aeron = Aeron.connect(new Aeron.Context().aeronDirectoryName(aeronDirectoryName)))
        {
            for (var i = 0; i < iterations; i++)
            {
                // A normal (non-MDS) subscription, opened and closed immediately
                try (var s = aeron.addSubscription("aeron:udp?endpoint=" + remoteAddress + ":" + remotePort, 1))
                {
                }
            }

            System.out.println("Waiting...");

            // Keep the process alive so that any further name resolution activity can be observed
            while (true)
            {
                // empty
            }
        }
    }
}

P.S. We recommend updating your Aeron installations regularly, as we only provide support within the first year after release, i.e. every 4 major releases.

vyazelenko commented 4 days ago

Fixed in https://github.com/real-logic/aeron/commit/2aa1ab12acb3c6a73414a23169d54684baa28dac.