kubernetes-sigs / cloud-provider-equinix-metal

Kubernetes Cloud Provider for Equinix Metal (formerly Packet Cloud Controller Manager)
https://deploy.equinix.com/labs/cloud-provider-equinix-metal
Apache License 2.0
74 stars 27 forks source link

Add annotations to nodes for BGP Peering "Config" type #43

Closed c0dyhi11 closed 3 years ago

c0dyhi11 commented 4 years ago

Packet has two types of switches in their fleet today. (With possibly more to come) These types today are Juniper and Arista. These switches require different BGP peering options. In order for MetalLB to know the peering options, we'll need to add annotations to the nodes to specify which options are needed for peering.

The API that is hit is: https://api.packet.net/devices/<Instance_Id>/bgp/neighbors The result looks like this:

{
  "bgp_neighbors": [
    {
      "address_family": 4,
      "customer_as": 65000,
      "customer_ip": "10.99.2.129",
      "md5_enabled": false,
      "md5_password": null,
      "multihop": false,
      "peer_as": 65530,
      "peer_ips": [
        "10.99.2.128"
      ],
      "routes_in": [
        {
          "route": "10.99.2.128/25",
          "exact": false
        }
      ],
      "routes_out": []
    }
  ],
  "device": {
    "href": "/devices/a9660769-d216-4bab-9068-8e5763e7f962"
  }
}
c0dyhi11 commented 4 years ago

@deitch @vielmetti ^^

vielmetti commented 4 years ago

The packngo dependency is noted at https://github.com/packethost/packngo/issues/176

vielmetti commented 4 years ago

Note the API docs at https://www.packet.com/developers/api/bgp/#bgp-getBgpNeighborData - which could use more detail in support of this implementation.

deitch commented 4 years ago

packngo has it merged in. The PR for getting annotations support into metallb is open at https://github.com/metallb/metallb/pull/593, and has the annotations supported listed here:

    MyASN    string `yaml:"my-asn"`
    ASN      string `yaml:"peer-asn"`
    Addr     string `yaml:"peer-address"`
    Port     string `yaml:"peer-port"`
    HoldTime string `yaml:"hold-time"`
    RouterID string `yaml:"router-id"

Which of those do we need to add annotations for that are new? We already add annotations for MyASN, ASN, Addr. Is it just the rest there?

deitch commented 4 years ago

Update: we depend on metallb for how its speaker does BGP session setup and announcements. They currently do not have support for enabling multihop. The BGP message code is here

enkelprifti98 commented 4 years ago

@pereztr5 mentioned that BGP sessions between MetalLB and the ToR router should work without the need to specify multihop.

One more thing to note about Arista switches is that they require static routes to be added in addition to the BGP session + loopback interface configuration. It should be done like this:

ip route add $peer-IP via $PRIVATE-IPV4-GATEWAY-address dev bond0
ip route add $peer-IP via $PRIVATE-IPV4-GATEWAY-address dev bond0

Peering IPs can be retrieve from the api.packet.net/devices/<instance_id>/bgp/neighbors API endpoint or metadata service though i did run into an instance today where the BGP neighbors object was completely missing from the metadata.

It was also noted that adding a static route for Juniper switches should not bring any issues. This is not required for Juniper switches but makes automation easier.

enkelprifti98 commented 4 years ago

We also discovered that Calico does not allow IPs in the range of link-local address space (169.254.0.0/16 (169.254.0.0 – 169.254. 255.255)) which the Arista peering IPs fall under so the BGP session does not work between Calico and the ToR as done in this repo:

https://github.com/packet-labs/kubernetes-bgp

deitch commented 4 years ago

We switched ccm to using the appropriate endpoint in #62

However, we are keeping this open, since it still doesn't work with new switches, e.g. in ny5. See this comment, will repaste below.

deitch commented 4 years ago

(reposted from #62 to continue the discussion here)

The advertised routes do not show up in the Packet dashboard, and nothing gets through (obviously).

The Packet API endpoint for getting packet info succeeded. Here is what I got for one of them (ignoring the irrelevant pointers to other fields):

    {
        AddressFamily: 4,
        CustomerAs: 65000,
        CustomerIP: "10.66.29.1",
        Md5Enabled: false,
        Md5Password: "",
        Multihop: true,
        PeerAs: 65530,
        PeerIps: []string len: 2, cap: 4, [
            "169.254.255.1",
            "169.254.255.2",
        ],
        RoutesIn: []github.com/packethost/packngo.BGPRoute len: 5, cap: 6, [
            (*"github.com/packethost/packngo.BGPRoute")(0xc000b14090),
            (*"github.com/packethost/packngo.BGPRoute")(0xc000b140a8),
            (*"github.com/packethost/packngo.BGPRoute")(0xc000b140c0),
            (*"github.com/packethost/packngo.BGPRoute")(0xc000b140d8),
            (*"github.com/packethost/packngo.BGPRoute")(0xc000b140f0),
        ],
        RoutesOut: []github.com/packethost/packngo.BGPRoute len: 0, cap: 0, [],
}

I set the announcer to use just the first address 169.254.255.1, which should be enough? The metallb config is pretty much the same on all nodes:

    peers:
    - my-asn: 65000
      peer-asn: 65530
      peer-address: 169.254.255.1
      peer-port: 0
      hold-time: ""
      router-id: ""

Here are the logs from the metallb speaker:

{"caller":"bgp_controller.go:232","event":"updatedAdvertisements","ip":"136.144.50.182","msg":"making advertisements using BGP","numAds":1,"pool":"default/my-nginx","protocol":"bgp","service":"default/my-nginx","ts":"2020-07-14T10:31:49.493215005Z"}
{"caller":"main.go:246","event":"serviceAnnounced","ip":"136.144.50.182","msg":"service has IP, announcing","pool":"default/my-nginx","protocol":"bgp","service":"default/my-nginx","ts":"2020-07-14T10:31:49.493362204Z"}
{"caller":"main.go:249","event":"endUpdate","msg":"end of service update","service":"default/my-nginx","ts":"2020-07-14T10:31:49.493510105Z"}
{"caller":"bgp.go:58","error":"dial \"169.254.255.1:179\": timeout","localASN":65000,"msg":"failed to connect to peer","op":"connect","peer":"169.254.255.1:179","peerASN":65530,"ts":"2020-07-14T10:32:03.287379901Z"}
{"caller":"bgp.go:58","error":"dial \"169.254.255.1:179\": timeout","localASN":65000,"msg":"failed to connect to peer","op":"connect","peer":"169.254.255.1:179","peerASN":65530,"ts":"2020-07-14T10:32:21.297377411Z"}
{"caller":"bgp.go:58","error":"dial \"169.254.255.1:179\": timeout","localASN":65000,"msg":"failed to connect to peer","op":"connect","peer":"169.254.255.1:179","peerASN":65530,"ts":"2020-07-14T10:32:47.307370212Z"}

@c0dyhi11 @enkelprifti98, what are we missing?

enkelprifti98 commented 4 years ago

Luca mentioned to me that using only one of the peer IPs means you're announcing to only one of the ToR so you don't get redundancy.

Regarding the error from the BGP speaker log, did you add a static route?

ip route add $peer-IP via $PRIVATE-IPV4-GATEWAY-address dev bond0

deitch commented 4 years ago

so you don't get redundancy.

Understood. More than happy to change it around to work with both (redundancy) once we can get it working with at least one. :-)

did you add a static route?

No, I did not. I didn't understand that is necessary. That would complicate things. That needs to be done on each host? Is that because it is using the 169.254.x.x address?

enkelprifti98 commented 4 years ago

Correct it has to be done on any host doing a BGP session. I don't fully know the reasoning behind but I would assume that's why.

When I was testing yesterday I was not learning/announcing any routes until I added the static route as suggested by Tony and the NetOps team.

deitch commented 4 years ago

Let me see if I get this. In original DCs (or those with our original switches), we needed to:

  1. Derive the gateway
  2. Use the gateway as the BGP peer

In the new ones, we:

  1. Derive the gateway
  2. Use the BGP API endpoint to get the peer
  3. Use the peer
  4. Add a route to the local routing table that sets the previously-derived gateway as the route to the peer

Is that right? Did it just get significantly more complicated?

enkelprifti98 commented 4 years ago

Yep that sounds correct to me. That's why I have been very against it :)

deitch commented 4 years ago

@enkelprifti98 we have no way to do this right now, and I don't see an easy way forward.

The ccm runs in a Deployment with a single instance running, as it should be, as it is a control loop (or, in dev mode, locally on your desktop connected to a cluster). The only thing running on each node is a DaemonSet of the metallb speakers, which:

That is it. It is a very lightweight process. It doesn't even update the IP into the localhost to accept them (kube-proxy does that, as it uses IPs that are on the Service).

Adding specific routes on the host just for the upstream BGP peers is so unique that it is unlikely to get accepted upstream into metallb's speaker; I doubt anyone else using metallb is doing it that way.

I see several ways to do this, all of which are complicated:

I don't really see a way forward.

Is there any reason we use that non-routable IP such that it requires changes on host, instead of a known, public or even private address?

jhead-slg commented 4 years ago

Though the setup is different than what you are doing with MetalLB here. When I setup a MinIO cluster with Bird2 in NY5 I did not have to add any static route to get BGP working… Below is some debug output.

root@minio-a1.core.ewr2.pkt.dev [~]
# lsb_release -d
Description:    Ubuntu 20.04 LTS

root@minio-a1.core.ewr2.pkt.dev [~]
# bird --version
BIRD version 2.0.7

root@minio-a1.core.ewr2.pkt.dev [~]
# ip route
default via 136.144.50.218 dev bond0 onlink
10.66.17.0/31 dev bond0 proto kernel scope link src 10.66.17.1
136.144.50.218/31 dev bond0 proto kernel scope link src 136.144.50.219
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

root@minio-a1.core.ewr2.pkt.dev [~]
# cat /etc/bird/bird.conf
log syslog all;

router id from "bond0" 10.0.0.0/8;

protocol device {
  scan time 10;
}

protocol direct {
  ipv4;
  interface "lo";
}

protocol kernel {
  merge paths;
  persist;
  scan time 20;
  ipv4 {
    import none;
    export none;
  };
}

protocol bgp neighbor_v4_1 {
  local as 65000;
  graceful restart;
  multihop;
  neighbor 169.254.255.1 as 65530;
  password "x";
  ipv4 {
    export filter {accept;};
    import filter {reject;};
  };
}

protocol bgp neighbor_v4_2 {
  local as 65000;
  graceful restart;
  multihop;
  neighbor 169.254.255.2 as 65530;
  password "x";
  ipv4 {
    export filter {accept;};
    import filter {reject;};
  };
}
enkelprifti98 commented 4 years ago

@deitch Justin's comment is very interesting as it might explain the theory of "it depends on the BGP speaker to properly use the default route"

When I was doing my testing in NY5, I was using bird instead of bird2 and ubuntu 18.04 if the OS has any effect on this behavior...

Justin is using Ubuntu 20.04 with bird2 and it's working for him without adding the static routes.

deitch commented 4 years ago

Yup @jhead-slg ; it is because you have this line:

router id from "bond0" 10.0.0.0/8;

That (I believe) explicitly sets the source route, so it is using the private IP.

Now that Tim explained how this works (not 2.5 hrs ago), as long as the announcer can be configured to:

then it will work correctly.

bird has both of those; metallb has one but not the other. We are working on getting the other in place. But we do see a road forward now.

deitch commented 4 years ago

See also this metallb issue

deitch commented 3 years ago

This issue can be closed.

There is nothing ccm can do about the actually using the src IP. That depends on metallb or kube-vip or something else consuming it.