Use the Kubernetes Node status ExternalIP address for autodetecting the node-to-node mesh BGP peer IP

kskalski commented 2 years ago

When I set up a new node in a cloud environment, it commonly uses an internal IP like 10.0.0.58 even though it has been assigned, and is reachable by, a public IP. Connecting this node to a cluster outside that cloud environment then fails, because the internal IP cannot be reached from there; the public IP should be used instead.

The public IP is not visible among the network interfaces, though, so I suppose it's hard to auto-discover it from inside the node. However, the Kubernetes node's status contains the necessary information as the ExternalIP address field: https://kubernetes.io/docs/concepts/architecture/nodes/#addresses

I would like Calico to use that information (if available) to assign the peer IP in the BGP node-to-node mesh, either by default or when a dedicated autodetection method is selected, instead of guessing it from a source such as the list of network interfaces, which can be ambiguous or may not contain the right information at all.

Expected Behavior

Autodetection of IP address in calico/node queries kubernetes for current node's status and uses the information, e.g.

status:
  addresses:
    - type: ExternalIP
      address: 130.12.34.56

to assign projectcalico.org/IPv4Address and advertise it to BGP peers as ip to connect to
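A minimal sketch of that lookup, written in Python for brevity. The function name `first_address` is hypothetical, the InternalIP entry is added for illustration, and the node object is assumed to have the same shape as the JSON form of the status shown above:

```python
import json

# Node object trimmed to the fields used here; values mirror the example above.
NODE_JSON = """
{
  "status": {
    "addresses": [
      {"type": "InternalIP", "address": "10.0.0.58"},
      {"type": "ExternalIP", "address": "130.12.34.56"}
    ]
  }
}
"""


def first_address(node, addr_type):
    """Return the first status address of the given type, or None if absent."""
    for entry in node.get("status", {}).get("addresses", []):
        if entry.get("type") == addr_type:
            return entry.get("address")
    return None


node = json.loads(NODE_JSON)
print(first_address(node, "ExternalIP"))  # prints 130.12.34.56
```

Returning None when no ExternalIP entry exists leaves room for the caller to fall back to another detection method.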

Other nodes should then correctly establish a connection using the external IP set on the node, e.g.

./calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 130.12.34.56  | node-to-node mesh | up    | 03:46:41 | Established |

Current Behavior

calico/node reports in logs the autodetected ip from listing interfaces: [INFO][95] monitor-addresses/startup.go 713: Using autodetected IPv4 address on interface enp0s3: 10.0.0.58/24

This internal ip is set as value of projectcalico.org/IPv4Address annotation and BGP peers (being outside of that internal network) are unable to establish connection:

./calicoctl  node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+------------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+---------------+-------------------+-------+------------+-------------+
| 10.0.0.58     | node-to-node mesh | start | 14:15:41   | Connect     |

while status reported on the affected node is:

./calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+---------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |  INFO   |
+---------------+-------------------+-------+----------+---------+
| 51.12.34.56   | node-to-node mesh | start | 01:58:52 | Passive |

Possible Solution

Use the ExternalIP address discovered at runtime by querying Kubernetes.
This could be a new autodetection method selected by a new value for IP_AUTODETECTION_METHOD, though it could also be used by default (when the ExternalIP is set, which might not always be the case).
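The selection with a fallback could look roughly like the sketch below. It is an illustration of the idea, not Calico's implementation: the method names, the function name `choose_node_ip`, and its parameters are all hypothetical.

```python
def choose_node_ip(method, k8s_addresses, interface_ip):
    """Pick the IP to advertise to BGP peers.

    method        -- hypothetical autodetection method name
    k8s_addresses -- mapping of address type to address taken from the Node status
    interface_ip  -- address found by the existing interface-based detection
    """
    wanted = {
        "kubernetes-external-ip": "ExternalIP",  # hypothetical method name
        "kubernetes-internal-ip": "InternalIP",  # hypothetical method name
    }.get(method)
    if wanted and k8s_addresses.get(wanted):
        return k8s_addresses[wanted]
    # ExternalIP is not always set, so fall back to what the interfaces report.
    return interface_ip


addresses = {"InternalIP": "10.0.0.58", "ExternalIP": "130.12.34.56"}
print(choose_node_ip("kubernetes-external-ip", addresses, "10.0.0.58"))
print(choose_node_ip("kubernetes-external-ip", {}, "10.0.0.58"))
```

Keeping the interface-detected address as the fallback means a node without an ExternalIP behaves exactly as it does today.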

Steps to Reproduce (for bugs)

Create two nodes, at least one of which is in a private cloud and is assigned an internal IP; both nodes should have connectivity via their public IPs.
Set up Kubernetes with Calico and WireGuard enabled on those nodes.
Observe Calico detecting the internal IP and advertising it to the other node as the BGP peer IP.
The other node cannot reach the internal IP (it is in a different private cloud) and thus cannot connect to the peer.

Context

The projectcalico.org/IPv4Address annotation on the Kubernetes node can be modified manually, and doing so fixes node connectivity in my case (calico/node detects the change, then rebuilds and re-advertises the new address).
Right now I'm exploring a way to override the IP used by calico/node with the IP environment variable, but I need to find a way to extract it from runtime information so it can be used in the config.
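The manual workaround amounts to a patch on the Node's annotations. The sketch below only builds a JSON merge patch body; how it is sent to the API server is left out. The function name `annotation_patch` is hypothetical, and whether the value needs a prefix length (the log above shows 10.0.0.58/24) is an assumption that should be checked:

```python
import json

# Annotation key taken from the issue text.
ANNOTATION = "projectcalico.org/IPv4Address"


def annotation_patch(ip):
    """Build a JSON merge patch that sets the Calico IPv4 address annotation."""
    return json.dumps({"metadata": {"annotations": {ANNOTATION: ip}}})


print(annotation_patch("130.12.34.56"))
```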

Your Environment

Calico version - docker.io/calico/node:v3.21.1
Orchestrator version (e.g. kubernetes, mesos, rkt): k3s kubernetes version v1.22.3+k3s1
Operating System and version: linux, Ubuntu 20.04.3 LTS

caseydavenport commented 2 years ago

FYI this PR introduces something similar - a new autodetection method for KubernetesInternalIP - sounds like you want ExternalIP though?

kskalski commented 2 years ago

Yes, well, the point is for nodes to be able to reach each other, so depending on the setup you may be able to use, and prefer, the internal IP, but in my case of connecting nodes in different private clouds the only choice is the external IP.

I suppose the connectivity of other nodes to current node's either internal or external ip might be out of scope of current node's detection, so probably an option would be necessary to prefer either one.

BTW, which PR exactly are you referring to?

caseydavenport commented 2 years ago

> but in my case of connecting nodes in different private clouds the only choice is using external ip

Hm, this is a bit of an odd choice. I generally wouldn't recommend this, and would be curious to hear what your use case is.

Running a single cluster across multiple public clouds introduces new failure modes for that cluster. Most folks I speak to run separate clusters on different clouds for redundancy, which is what I'd recommend.

> BTW, which PR exactly are you referring to?

Whoops! This one: https://github.com/projectcalico/node/pull/1242

kskalski commented 2 years ago

Well, I know it is not the most popular or generally advised approach, but multi-cloud (mentioned in places like this https://cloud.netapp.com/blog/gcp-cvo-blg-multicloud-kubernetes-centralizing-multicloud-management?hs_amp=true) seems like a simpler way to build a distributed environment that avoids vendor lock-in, optimizes cost and spreads the failure domain across largely independent entities.

K8s, network virtualisation (as provided by Calico), distributed storage, etc. are generally the kind of abstractions that I prefer to work with instead of building new layers of replication, failover and resource scheduling that would be required in a multi-cluster approach. And I suspect they are still going to suffer from the kind of failure modes that I imagine can appear when you join resources connected over the public internet.

In my current setup I do have the control plane nodes inside a single cloud provider (tight latency requirements provide a strong argument here, though technically it still works in a wider layout with a few ms of extra latency), but I'm adding worker nodes from different providers and experimenting with adding some on-premise NATed devices (Raspberry Pi), treating k8s and the underlying network plane as the simplest way to cluster all my resources into a single-access pool.
