Similar to #1420 @caseydavenport
Also, after removing calico-node (192.168.0.170), the BGP peer info on node 192.168.0.229 has been removed from /etc/calico/confd/config/bird.cfg, but it is not cleaned up in node status.
/ # cat /etc/calico/confd/config/bird.cfg
function apply_communities ()
{
}
# Generated by confd
include "bird_aggr.cfg";
include "bird_ipam.cfg";
router id 192.168.0.229;
# Configure synchronization between routing tables and kernel.
protocol kernel {
learn; # Learn all alien routes from the kernel
persist; # Don't remove routes on bird shutdown
scan time 2; # Scan kernel routing table every 2 seconds
import all;
export filter calico_kernel_programming; # Default is export none
graceful restart; # Turn on graceful restart to reduce potential flaps in
# routes when reloading BIRD configuration. With a full
# automatic mesh, there is no way to prevent BGP from
# flapping since multiple nodes update their BGP
# configuration at the same time, GR is not guaranteed to
# work correctly in this scenario.
}
# Watch interface up/down events.
protocol device {
debug all;
scan time 2; # Scan interfaces every 2 seconds
}
protocol direct {
debug all;
interface -"cali*", -"kube-ipvs*", "*"; # Exclude cali* and kube-ipvs* but
# include everything else. In
# IPVS-mode, kube-proxy creates a
# kube-ipvs0 interface. We exclude
# kube-ipvs0 because this interface
# gets an address for every in use
# cluster IP. We use static routes
# for when we legitimately want to
# export cluster IPs.
}
# Template for all BGP clients
template bgp bgp_template {
debug all;
description "Connection to BGP peer";
local as 64512;
multihop;
gateway recursive; # This should be the default, but just in case.
import all; # Import all routes, since we don't know what the upstream
# topology is and therefore have to trust the ToR/RR.
export filter calico_export_to_bgp_peers; # Only want to export routes for workloads.
source address 192.168.0.229; # The local address we use for the TCP connection
add paths on;
graceful restart; # See comment in kernel section about graceful restart.
connect delay time 2;
connect retry time 5;
error wait time 5,30;
}
# ------------- Node-to-node mesh -------------
# Node-to-node mesh disabled
# ------------- Global peers -------------
# No global peers configured.
# ------------- Node-specific peers -------------
# For peer /host/192.168.0.229/peer_v4/192.168.0.216
protocol bgp Node_192_168_0_216 from bgp_template {
neighbor 192.168.0.216 as 64512;
}
# For peer /host/192.168.0.229/peer_v4/192.168.0.229
# Skipping ourselves (192.168.0.229)
# For peer /host/192.168.0.229/peer_v4/192.168.0.233
protocol bgp Node_192_168_0_233 from bgp_template {
neighbor 192.168.0.233 as 64512;
}
/ #
/ # exit
# DATASTORE_TYPE=kubernetes KUBECONFIG=./kubeconfig.json ./calicoctl-linux-arm64 node status
Calico process is running.
IPv4 BGP status
+---------------+---------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+---------------+-------+----------+--------------------------------+
| 192.168.0.170 | node specific | start | 21:42:38 | Active Socket: Connection |
| | | | | refused |
| 192.168.0.216 | node specific | up | 21:28:27 | Established |
| 192.168.0.233 | node specific | up | 21:39:53 | Established |
+---------------+---------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
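For comparison, bird's own view of its peers can be queried directly from inside the calico-node container. This is a minimal sketch; the birdcl client and the control socket path are assumptions based on a typical calico-node image:
/ # birdcl -s /var/run/calico/bird.ctl show protocols    # socket path is an assumption
If the removed peer (192.168.0.170) still appears there, bird itself has not reloaded its configuration even though bird.cfg was regenerated.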
When I restart calico-node (192.168.0.229), the node status returns to normal:
# DATASTORE_TYPE=kubernetes KUBECONFIG=./kubeconfig.json ./calicoctl-linux-arm64 node status
Calico process is running.
IPv4 BGP status
+---------------+---------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+---------------+-------+----------+-------------+
| 192.168.0.216 | node specific | up | 21:51:23 | Established |
| 192.168.0.233 | node specific | up | 21:51:23 | Established |
+---------------+---------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
The bird confd template configuration:
/ # cat /etc/calico/confd/conf.d/bird.toml
[template]
src = "bird.cfg.template"
dest = "/etc/calico/confd/config/bird.cfg"
prefix = "/calico/bgp/v1"
keys = [
"/host",
"/global",
]
check_cmd = "bird -p -c {{.src}}"
reload_cmd = "sv hup bird || true"
But running the command sv hup bird || true fails:
fail: bird: unable to change to service directory: file does not exist
When I use kill -1 $bird-pid instead, bird receives the SIGHUP signal successfully.
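A minimal sketch of that manual reload from inside the container, assuming pidof is available in the image (the variable name is illustrative):
/ # BIRD_PID=$(pidof bird)     # assumption: pidof exists in the calico-node image
/ # kill -HUP "$BIRD_PID"      # SIGHUP (signal 1) tells bird to re-read bird.cfg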
https://github.com/projectcalico/node/pull/652 has fixed it
Expected Behavior
calico-node should not need a restart to accept a new BGP peer.
Current Behavior
When a new node is added, the existing calico-node does not accept the new BGP peer until it is restarted.
Possible Solution
Restarting the existing calico-node successfully accepts the new BGP peer.
Steps to Reproduce (for bugs)
We use a route reflector to build the Calico cluster in Kubernetes.
calico-node-bkr2q (192.168.0.152) is a new calico-node.
./calicoctl-linux-arm64 get node
shows the 192.168.0.152 node, but
./calicoctl-linux-arm64 node status
does not show a peer for 192.168.0.152, even though the 192.168.0.152 peer info is present in
/etc/calico/confd/config/bird.cfg
The logs of the existing calico-node are shown below:
After I exec into the existing calico-node and restart bird, the new peer becomes Established.
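For reference, a sketch of that workaround from outside the pod; the pod name is a placeholder and it assumes pidof is present in the calico-node image:
# kubectl -n kube-system exec <calico-node-pod> -c calico-node -- sh -c 'kill -HUP $(pidof bird)'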
The new x86 node:
Context
Your Environment