rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

DNS Peer Discovery node cleanup #7520

Open womblep opened 1 year ago

womblep commented 1 year ago

This is an enhancement.

For DNS peer discovery, if cluster_formation.node_cleanup.only_log_warning = false, then check the DNS record again at each interval and remove peers that aren't in the record. If the DNS record can't be looked up, don't do anything. This should be resistant to peer failure or network partitioning, as the DNS record wouldn't change in those cases. The record would only be updated when a peer is replaced. It looks like a lot of the code is common; it probably just needs the forgetting code added to DNS peer discovery. I would try it myself, but I don't know Erlang at all.
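To make the proposed behaviour concrete, here is a minimal Python sketch of the decision logic only. It is not RabbitMQ code: the function name `nodes_to_forget`, the `resolve_seed_hosts` callback, and the node names are hypothetical, and treating an empty DNS answer the same as a failed lookup is an extra assumption that the request above does not state.

```python
from typing import Callable, Iterable, Set


def nodes_to_forget(
    cluster_members: Iterable[str],
    resolve_seed_hosts: Callable[[], Iterable[str]],
) -> Set[str]:
    """Return the cluster members that are absent from the DNS record.

    resolve_seed_hosts is a hypothetical callback that returns the node
    names currently listed in the DNS record, or raises OSError
    (socket.gaierror is a subclass) when the lookup fails.
    """
    try:
        discovered = set(resolve_seed_hosts())
    except OSError:
        # The record could not be looked up: do nothing this interval.
        return set()
    if not discovered:
        # Assumption: an empty answer is treated like a failed lookup,
        # so it can never cause every node to be removed.
        return set()
    return set(cluster_members) - discovered


if __name__ == "__main__":
    members = ["rabbit@node1", "rabbit@node2", "rabbit@node3"]
    # node3 was replaced and is no longer in the record.
    record = lambda: ["rabbit@node1", "rabbit@node2"]
    print(sorted(nodes_to_forget(members, record)))  # ['rabbit@node3']
```

A node that is merely unreachable stays in the cluster, because only the contents of the DNS record decide removal, not connectivity to the node itself.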

michaelklishin commented 1 year ago

Node removal is performed by rabbitmq_peer_discovery_common.

michaelklishin commented 1 year ago

DNS peer discovery is not a plugin, it is a core feature. Making it depend on a plugin therefore is not an option, and we see this automatic cleanup thing as dangerous.

I'd rather remove this feature from other plugins (it was originally introduced for AWS) or move DNS peer discovery into a plugin for 3.13.0 than fold something we consider dangerous into the core.

womblep commented 1 year ago

@michaelklishin are you saying that rabbitmq_peer_discovery_common is only used by plugins and therefore using it for DNS peer discovery would link the plugin code to a core feature?

If so, then moving DNS peer discovery to a plugin could be a good thing. I understand that in many cases peer removal is dangerous because of transient loss of connection, but I think DNS is one of the cases where it is reasonably safe. If the DNS record is considered the source of truth for the cluster, then a transient loss of connectivity doesn't change that.

However, if the feature gets removed from all plugins, then a possible enhancement would be to add a way to remove nodes via the HTTP API.

michaelklishin commented 1 year ago

A core feature cannot depend on a plugin, since such a plugin would have to be always enabled.