Open womblep opened 1 year ago
Node removal is performed by rabbitmq_peer_discovery_common.
DNS peer discovery is not a plugin, it is a core feature. Making it depend on a plugin therefore is not an option, and we see this automatic cleanup thing as dangerous.
I'd rather remove this feature from other plugins (it was originally introduced for AWS)
or move DNS peer discovery into a plugin for 3.13.0
than fold something we consider dangerous into the core.
@michaelklishin are you saying that rabbitmq_peer_discovery_common is only used by plugins and therefore using it for DNS peer discovery would link the plugin code to a core feature?
If so, then maybe moving DNS peer discovery to a plugin could be a good thing. I understand that in many cases peer removal is dangerous because of transient loss of connectivity, but I think DNS is one of the cases where it is reasonably safe. If the DNS record is considered the source of truth for the cluster, then a transient loss of connectivity doesn't change that.
However, if the feature gets removed from all plugins, then a possible enhancement would be to add a way to remove nodes via the HTTP API.
A core feature cannot depend on a plugin, since such a plugin would have to be always enabled.
This is an enhancement.
For DNS peer discovery, if cluster_formation.node_cleanup.only_log_warning = false, then check the DNS record again (each interval) and remove peers that aren't in the record. If the DNS record can't be looked up, then don't do anything. This should be resistant to peer failure or network partitioning, as the DNS record wouldn't change in those cases. Only when a peer is replaced would the record be updated. It looks like a lot of the code is common; it probably just needs the forgetting code added to DNS peer discovery. I would try, but I don't know Erlang at all.
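To make the proposal concrete, here is a rough sketch of the decision logic in Python (not Erlang, and not RabbitMQ code). The function name `plan_cleanup` and its arguments are made up for illustration. It assumes node names look like `rabbit@hostname`, that the DNS record yields hostnames, and that a failed lookup is passed in as `None`. Treating an empty record the same as a failed lookup is an extra safety assumption of mine, not part of the request above.

```python
from typing import List, Optional, Sequence, Tuple


def _host(name: str) -> str:
    """Normalise a hostname or a 'rabbit@host' node name to a bare lowercase host."""
    return name.split("@", 1)[-1].lower().rstrip(".")


def plan_cleanup(
    cluster_members: Sequence[str],
    dns_hosts: Optional[Sequence[str]],
    only_log_warning: bool,
) -> Tuple[List[str], List[str]]:
    """Return (stale_nodes, nodes_to_forget) for one cleanup interval.

    Hypothetical helper: cluster_members are node names such as
    'rabbit@host1.example', dns_hosts is the list of hosts read from the
    DNS record, or None if the lookup failed.
    """
    # Lookup failed: do nothing, as proposed in the issue.
    # Assumption: an empty record is treated like a failed lookup.
    if not dns_hosts:
        return [], []

    discovered = {_host(h) for h in dns_hosts}

    # A member is stale when its host is no longer listed in the record.
    stale = sorted(m for m in cluster_members if _host(m) not in discovered)

    # With only_log_warning = true, stale nodes are reported but not removed.
    to_forget = [] if only_log_warning else list(stale)
    return stale, to_forget


if __name__ == "__main__":
    members = ["rabbit@a.example", "rabbit@b.example"]
    print(plan_cleanup(members, ["a.example"], only_log_warning=False))
```

The point of the sketch is that the record, not node reachability, drives removal: a node that is down or partitioned but still listed in DNS is never selected.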