Federation links that fail to connect with a timeout leak direct connection and channel processes

The Federation plugin leaks processes when an upstream connection attempt fails because of an AMQP client timeout on the connecting (downstream) node.

On long-running nodes with uptimes of months or years, these leaked processes can eventually take the node down once it reaches its Erlang process limit.
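To put a rough number on "months", here is a back-of-the-envelope estimate in Python. It uses the figures reported in this description (approximately 12 leaked processes per failed attempt, one attempt per 60s client timeout). The process limit of 1,048,576 is an assumption about the node's configuration, and `days_until_limit` is a hypothetical helper for illustration, not part of RabbitMQ.

```python
# Assumption: the node's Erlang process limit; adjust to the actual `+P` setting.
PROCESS_LIMIT = 1_048_576
# From this description: approx. 12 Erlang processes leak per failed attempt.
LEAK_PER_ATTEMPT = 12
SECONDS_PER_DAY = 86_400


def days_until_limit(timeout_seconds, baseline_processes=0,
                     process_limit=PROCESS_LIMIT,
                     leak_per_attempt=LEAK_PER_ATTEMPT):
    """Estimate the days until leaked processes alone reach the limit.

    Assumes one failed connection attempt per timeout period and no
    delay between retries, which is a simplification.
    """
    attempts = (process_limit - baseline_processes) / leak_per_attempt
    return attempts * timeout_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for timeout in (60, 30, 10):
        print(f"timeout {timeout:>2}s: ~{days_until_limit(timeout):.1f} days")
```

Under these assumptions a single broken link with the default 60s timeout would take about two months to exhaust the limit, and the time shrinks proportionally as the timeout is lowered or as more links fail at once.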

This is very easy to reproduce:

1. Set up a federation link across 2 nodes
2. Add an iptables rule to block the downstream node from reaching the upstream
3. Wait and watch the process count and the downstream connection and channel counts rise every minute

On start-up, the federation link process makes an AMQP client call to connect to the upstream. With the upstream unreachable, that call times out after 60s (the default AMQP client call timeout) and throws an exception that currently goes uncaught. By then the link has already created a local downstream connection and channel, which it never closes. Each failed attempt therefore leaves behind one connection, one channel and approximately 12 Erlang processes, so with the default timeout all three counts rise once per minute. The lower the AMQP client timeout, the faster the Erlang process count approaches the node's process limit, which can ultimately crash the node completely.
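The pattern described above can be sketched in a few lines of Python. This is an illustrative model only: it is not the plugin's Erlang code and not necessarily how this PR implements the fix. Every name in it (`Downstream`, `UpstreamTimeout`, `start_link_leaky`, `start_link_fixed`, `run_attempts`) is hypothetical.

```python
class UpstreamTimeout(Exception):
    """Stands in for the AMQP client call timing out."""


class Downstream:
    """Counts local connections and channels that are currently open."""

    def __init__(self):
        self.connections = 0
        self.channels = 0

    def open(self):
        self.connections += 1
        self.channels += 1

    def close(self):
        self.connections -= 1
        self.channels -= 1


def connect_upstream():
    # The upstream is unreachable, so every attempt times out.
    raise UpstreamTimeout("no reply within the client call timeout")


def start_link_leaky(downstream):
    downstream.open()      # local connection and channel are created
    connect_upstream()     # raises; the local resources are never closed


def start_link_fixed(downstream):
    downstream.open()
    try:
        connect_upstream()
    except UpstreamTimeout:
        downstream.close()  # release local resources before giving up
        raise


def run_attempts(start_link, attempts):
    """Retry a failing link `attempts` times; return what is left open."""
    downstream = Downstream()
    for _ in range(attempts):
        try:
            start_link(downstream)
        except UpstreamTimeout:
            pass  # assumption: the link is simply retried after a failure
    return downstream.connections, downstream.channels


if __name__ == "__main__":
    print("leaky:", run_attempts(start_link_leaky, 5))
    print("fixed:", run_attempts(start_link_fixed, 5))
```

In the leaky variant the open counts grow by one per failed attempt, mirroring the one connection and one channel per minute seen on the node. In the variant that cleans up on the timeout path they stay at zero.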

This problem was also reported about a year ago: https://groups.google.com/g/rabbitmq-users/c/VmMnp2pIBvE/m/KEGnfIA8AgAJ

Timeouts in the Erlang AMQP client have been around for a while, so this issue has been present for a couple of years. Federation plugin users will probably need to upgrade.

In tests, the connection, channel and process leaks show up as follows:

1. Leaking processes (~25k)

2. Leaking connections (~2k)

3. Leaking channels (~2k)

Types of Changes

What types of changes does your code introduce to this project? Put an x in the boxes that apply

[x] Bugfix (non-breaking change which fixes issue #NNNN)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Documentation (correction or otherwise)
[ ] Cosmetics (whitespace, appearance)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

[x] I have read the CONTRIBUTING.md document
[x] I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
[ ] All tests pass locally with my changes
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have added necessary documentation (if appropriate)
[ ] Any dependent changes have been merged and published in related repositories


rabbitmq / rabbitmq-federation