Intercommunicate pods from 2 different kubernetes clusters

felipejfc commented 7 years ago

Let's say I have two Kubernetes clusters with the following configs:

cluster 1: node cidr: 172.21.0.0/16, non-masquerade-cidr: 100.68.0.0/14

cluster 2: node cidr: 172.19.0.0/16, non-masquerade-cidr: 100.64.0.0/14

Would it be possible (with Calico) for pods in cluster 1 to communicate with pods in cluster 2? I wonder if a route reflector plus BGP peers could help in this situation.

The two clusters are located in different AWS VPCs connected by a VPN.

regards

ozdanborne commented 7 years ago

@felipejfc That's a pretty broad topic and there can be a lot of ways to tackle it. There are a few ways to grant network connectivity to calico networked pods across multiple clusters. However, I should warn you that NetworkPolicy definitions and pod labels are not shared across clusters. So you'll be very limited when using network policy in one cluster to permit or deny traffic from pods in another cluster.

felipejfc commented 7 years ago

@ozdanborne not having network policy should not be a problem in my use case, can you point me to a way for making it work?

thanks!

caseydavenport commented 7 years ago

@felipejfc that should essentially work, though it depends a bit on what your requirements are. As @ozdanborne said, there are a few ways to go about this.

You could, for example, run a single Calico network across the two clusters and things should "just work" - this has its own caveats w.r.t splitting the control plane across multiple geographic locations, if your VPCs aren't in the same region.

For two separate clusters, you could probably set up BGP peering across the two VPCs so that Calico instances in VPC1 know about pod IPs in VPC2 (haven't tried this myself though)

As @ozdanborne said, the tricky bit is getting policy to work in that scenario.

tmjd commented 7 years ago

@felipejfc You may want to check out a conversation in the CalicoUsers Slack from someone else who was doing something similar. Here is a link to where it was reported working: https://calicousers.slack.com/archives/C0BC91S9F/p1499329726691940, scroll back a bit to see some of what they did.

ozdanborne commented 7 years ago

@felipejfc Closing as we haven't heard from you in some time. Do reopen if you still need assistance.

njain1121 commented 6 years ago

I am trying to find a solution for a similar situation. Does anyone have more ideas about communication between pods in two different k8s clusters?

I have an ODL application on two k8s clusters. The application has its pods in each cluster, and they can talk to each other within the same cluster (using Weave Net). However, I also need these pods to communicate with the pods of the other cluster (running the same ODL app), as I want to span one ODL cluster across two k8s clusters. Is this possible? Has anyone worked on such a scenario?

Any pointers are appreciated. Thanks.

kerk1v commented 6 years ago

Hello,

It seems the Slack at https://calicousers.slack.com/archives/C0BC91S9F/p1499329726691940 is restricted to @tigera.io users only. Can anyone who has access help out and post the findings here?

I would like to use something similar to connect cassandra nodes on two different K8s clusters. Any ideas?

Thanks in advance.

tmjd commented 6 years ago

You can get an invite to calicousers slack by going to http://slack.projectcalico.org/

kerk1v commented 6 years ago

@tmjd thanks, worked but it's too far in the history to retrieve from a free plan. Can anyone else help with this?

karstensi commented 6 years ago

I am interested in this as well

tmjd commented 6 years ago

To the best of my knowledge the way the user in Slack reported doing it was something like Casey suggested in https://github.com/projectcalico/calico/issues/908#issuecomment-314912831 with BGP peering the two clusters and I think additionally it was necessary to add a disabled IP Pool to each cluster for the 'other' cluster's IP Pool.

Note: You must have different IP pool subnets because they must not conflict. Also this does not share any policy or labels between clusters meaning the traffic from the 'other' cluster will need to be handled policy wise like it is external incoming traffic.
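The "pools must not conflict" requirement can be checked before any peering is configured. The following is a minimal sketch using Python's standard `ipaddress` module. The pool names are hypothetical, and treating the two non-masquerade CIDRs from the original question as the pod ranges is an assumption made only for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(pools):
    """Return sorted name pairs whose CIDRs overlap (empty list means no conflict)."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in pools.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical pool names; CIDRs taken from the original question in this thread.
pools = {
    "cluster1-pods": "100.68.0.0/14",
    "cluster2-pods": "100.64.0.0/14",
}

conflicts = find_overlaps(pools)
print("conflicting pools:", conflicts)
```

An empty result means the ranges are disjoint and each can be added to the other cluster as a disabled pool.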

karstensi commented 6 years ago

I ended up with this:

1) set up a bird RR on nodes external to the clusters
2) create disabled CIDRs of the opposite cluster in each cluster
3) handle the deployments in both clusters via tooling
4) handle the networkpolicies in both clusters via tooling

gyliu513 commented 6 years ago

I have same question at https://github.com/projectcalico/calico/issues/1936

After I set up two Kubernetes clusters with a Route Reflector, I found that after the first cluster starts up and connects to the RR, it works fine. Then, when the second cluster comes up, the info in the RR etcd is cleared. Does anyone have the same issue? Thanks.

karstensi commented 6 years ago

I have set up a separate bird route reflector and configured it via external orchestration. I have not used the Calico route reflector.

gyliu513 commented 6 years ago

@karstensi are there any detailed steps you can share with me? I would love to follow your steps for some tests, thanks!

gyliu513 commented 6 years ago

@karstensi Also what is the difference between bird route reflector and calico route reflector?

karstensi commented 6 years ago

Sure. The source for much of the below is here: https://docs.projectcalico.org/v3.1/usage/routereflector/bird-rr-config

First you need one or more pod networks for your cluster. For example, we have two clusters:

c0: AS 64513, podCIDR 10.0.1.0/24
c1: AS 64514, podCIDR 10.0.2.0/24

Both values can be set in your Calico deployment YAML.

Then you need to set up your route reflector, which can be of any kind. I used bird as I have used it before and because Calico uses bird already.

Snippet from bird.conf. You need an entry in your bird.conf for every node in your Kubernetes clusters.

log syslog { debug, trace, info, remote, warning, error, auth, fatal, bug };
log stderr all;

router id <ip-of-your-route-reflector>;

filter import_kernel {
  if ( net != 0.0.0.0/0 ) then {
    accept;
  }
  reject;
}

debug protocols all;

protocol device {
  scan time 2;
}

protocol bgp <node-name> {
  description "<node-ip>";
  local as 64513;               # should be the AS set on your cluster
  neighbor <node-ip> as 64513;  # should be the AS set on your cluster
  multihop;
  rr client;
  graceful restart;
  import all;
  export all;
}

protocol bgp <node-name-in-cluster-c1> {
  description "<ip-of-node>";
  local as 64514;                  # should be the AS set on your cluster
  neighbor <ip-of-node> as 64514;  # should be the AS set on your cluster
  multihop;
  rr client;
  graceful restart;
  import all;
  export all;
}
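Since every node in both clusters needs its own `protocol bgp` entry, the stanzas can be generated rather than written by hand. Below is a minimal sketch that renders the same options as the snippet above. The node names and the 192.0.2.x addresses are hypothetical placeholders, and replacing non-alphanumeric characters in the protocol name with underscores is a cautious assumption about which names BIRD accepts.

```python
import re


def bgp_client_stanza(node_name, node_ip, asn):
    """Render one 'protocol bgp' block for a route-reflector client."""
    # Assumption: keep the protocol name to letters, digits and underscores.
    proto_name = re.sub(r"\W", "_", node_name)
    return "\n".join([
        f"protocol bgp {proto_name} {{",
        f'  description "{node_ip}";',
        f"  local as {asn};",
        f"  neighbor {node_ip} as {asn};",
        "  multihop;",
        "  rr client;",
        "  graceful restart;",
        "  import all;",
        "  export all;",
        "}",
    ])


# Hypothetical inventory: (node name, node IP, AS of the cluster the node belongs to).
nodes = [
    ("c0-node-1", "192.0.2.11", 64513),
    ("c1-node-1", "192.0.2.21", 64514),
]

bird_peers = "\n\n".join(bgp_client_stanza(*n) for n in nodes)
print(bird_peers)
```

The output is meant to be appended to the hand-written part of bird.conf (logging, router id, filters) and reviewed before use.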

Then on to calico. On any node with calicoctl installed you can execute the steps below.

# Get the current bgpconfig settings
$ calicoctl get bgpconfig -o yaml > bgp.yaml

# Set nodeToNodeMeshEnabled to false
$ vim bgp.yaml

# Replace the current bgpconfig settings
$ calicoctl replace -f bgp.yaml

For each of your clusters do

$ cat << EOF | calicoctl create -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: bgppeer-global-1
spec:
  peerIP: <route-reflector-ip>
  asNumber: 64513 # your cluster AS
EOF

The last step is to make each cluster aware of the other cluster's network. In the example above we have c0 with 10.0.1.0/24 and c1 with 10.0.2.0/24, so on c0:

cat << EOF | calicoctl create -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: c1-pool
spec:
  cidr: 10.0.2.0/24
  disabled: true # if not disabled, Calico will start using this range in the cluster and you will run into problems
EOF

and on c1

cat << EOF | calicoctl create -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: c0-pool
spec:
  cidr: 10.0.1.0/24
  disabled: true # if not disabled, Calico will start using this range in the cluster and you will run into problems
EOF
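With more than two clusters, the "disabled pool for every other cluster" step becomes repetitive. The following sketch shows one way to derive those manifests from a single table of pod CIDRs. The helper names are hypothetical; the manifest fields are the ones used in the IPPool examples above, and the output is intended to be piped into `calicoctl create -f -` on the matching cluster.

```python
def disabled_pool_manifest(name, cidr):
    """Render an IPPool manifest that is known to Calico but never allocated from."""
    return (
        "apiVersion: projectcalico.org/v3\n"
        "kind: IPPool\n"
        "metadata:\n"
        f"  name: {name}\n"
        "spec:\n"
        f"  cidr: {cidr}\n"
        "  disabled: true\n"
    )


def remote_pools(clusters, local):
    """Map pool name -> CIDR for every cluster except the local one."""
    return {f"{name}-pool": cidr for name, cidr in clusters.items() if name != local}


# Cluster names and pod CIDRs from the example in this comment.
clusters = {"c0": "10.0.1.0/24", "c1": "10.0.2.0/24"}

for local in clusters:
    for pool_name, cidr in remote_pools(clusters, local).items():
        print(f"# create on {local}")
        print(disabled_pool_manifest(pool_name, cidr))
```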

hope that helps

gyliu513 commented 6 years ago

@karstensi thanks, so how many RRs do you have? Are you sharing one RR between the two clusters, or does every cluster have its own RR?

karstensi commented 6 years ago

The above configuration has one. If you want several RRs acting as a cluster, you have to define rr cluster id in your bird.conf.

karstensi commented 6 years ago

The question of how many RRs you need is more a discussion about the failure domain you have or want to have.

gyliu513 commented 6 years ago

Cool, thanks @karstensi, let me try this with the Calico route reflector. I will append detailed test steps here later if I have any issues ;-)

gyliu513 commented 6 years ago

@karstensi I found that the kube-controller for Calico always removes the Calico nodes of my other cluster.

I have two clusters:

Cluster1

9.111.255.152 gyliu-ubuntu-3
9.111.255.155 gyliu-ubuntu-2
9.111.255.77 gyliu-ubuntu-1

Cluster2

9.111.255.21 gyliu-icp-1.novalocal  gyliu-icp-1
9.111.255.46 gyliu-icp-2.novalocal  gyliu-icp-2
9.111.255.91 gyliu-icp-3.novalocal  gyliu-icp-3

After Cluster1 was configured successfully, the RR works well.

root@gyliu-ubuntu-1:~/cases/calico# calicoctl get nodes
NAME
gyliu-ubuntu-1
gyliu-ubuntu-2
gyliu-ubuntu-3

After I add Cluster2, the Calico policy controller in Cluster2 cleans up all Calico nodes of Cluster1.

root@gyliu-icp-1:~/cases/calico# calicoctl get nodes
NAME
gyliu-icp-1
gyliu-icp-2
gyliu-icp-3

The following is the log from calico policy controller in Cluster2, it is removing calico node of Cluster1.

MHostKey(host=gyliu-ubuntu-1) rev=""
2018-04-26 07:56:16.112 [INFO][1] etcdv3.go 289: Delete transaction failed due resource not existing etcdv3-etcdKey="/calico/resources/v3/projectcalico.org/bgpconfigurations/node.gyliu-ubuntu-1" model-etcdKey=BGPConfiguration(node.gyliu-ubuntu-1) rev=""
2018-04-26 07:56:16.123 [INFO][1] node_controller.go 241: Deleting node from Calico datastore. node="gyliu-ubuntu-2"
2018-04-26 07:56:16.135 [INFO][1] ipam.go 311: Releasing IP addresses: [10.1.38.4 10.1.38.3 10.1.38.5 10.1.38.6]
2018-04-26 07:56:16.136 [INFO][1] ipam_block.go 208: Deleting attributes: [2 1 3 4]
2018-04-26 07:56:16.147 [INFO][1] ipam.go 830: Decremented handle 'k8s-pod-network.6f7328d5b25662dfa28a7ed7f7b113633372d128da9db1fcce5e7007a5029224' by 1
2018-04-26 07:56:16.170 [INFO][1] ipam.go 830: Decremented handle 'k8s-pod-network.6118ecb56fb051ee7ed5b6afe6c11f78995e6d12fe2fc2863c2b6d1b656e0043' by 1
2018-04-26 07:56:16.221 [INFO][1] ipam.go 830: Decremented handle 'k8s-pod-network.ad16a1f916a886e07c5369a97e65efd126c043c4d2ca95c71b53089aa351569a' by 1
2018-04-26 07:56:16.244 [INFO][1] ipam.go 830: Decremented handle 'k8s-pod-network.d2d38e41df3fbd7942d47d920c833f1018aeec2d88aec20bec413fab536788f9' by 1
2018-04-26 07:56:16.388 [INFO][1] etcdv3.go 289: Delete transaction failed due resource not existing etcdv3-etcdKey="/calico/ipam/v2/host/gyliu-ubuntu-2" model-etcdKey=IPAMHostKey(host=gyliu-ubuntu-2) rev=""
2018-04-26 07:56:16.438 [INFO][1] etcdv3.go 289: Delete transaction failed due resource not existing etcdv3-etcdKey="/calico/resources/v3/projectcalico.org/bgpconfigurations/node.gyliu-ubuntu-2" model-etcdKey=BGPConfiguration(node.gyliu-ubuntu-2) rev=""
2018-04-26 07:56:16.444 [INFO][1] node_controller.go 241: Deleting node from Calico datastore. node="gyliu-ubuntu-3"
2018-04-26 07:56:16.449 [INFO][1] ipam.go 311: Releasing IP addresses: [10.1.33.195 10.1.33.196 10.1.33.197]
2018-04-26 07:56:16.450 [INFO][1] ipam_block.go 208: Deleting attributes: [1 2 3]
2018-04-26 07:56:16.467 [INFO][1] ipam.go 830: Decremented handle 'k8s-pod-network.2711dcf2d95859d3ff94eab50f8fdaab327ee5796358d2dd95bb5dfdebf040ef' by 1
2018-04-26 07:56:16.478 [INFO][1] ipam.go 830: Decremented handle 'k8s-pod-network.3a34112dd68189f48c0f9fa4fecd8b2a325c433dea609d9ac2d43cd95f0fd6a2'

@karstensi did you have any special configuration for your calico policy controller?

This is my deploy topology, two kubernetes clusters sharing one etcd and RR.

gyliu513 commented 6 years ago

@karstensi it seems the two Calico policy controllers conflict with each other?

karstensi commented 6 years ago

Ok, but that is not the architecture that i described. Mine looks like this.

When you are using the Calico route reflectors, they keep state in etcd. I have not used them as I do not know what they do with it. Also, it seems from the documentation that the Calico route reflector lacks calicoctl integration.

gyliu513 commented 6 years ago

@karstensi do you have the Calico kube controller? I think you should, based on the documents here: https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/integration

On my side, the Calico kube controller is causing trouble: each cluster has a kube controller, each one tries to clean up the data in etcd, and this makes the etcd data keep changing.

From below, you can see I have one calico kube controller and three calico nodes.

root@gyliu-ubuntu-1:~# kubectl get pods -n kube-system | grep calico
calico-kube-controllers-7cc9dfb5c7-lbckq                  1/1       Running   0          1h
calico-node-24n56                                         2/2       Running   0          1h
calico-node-6j9q6                                         2/2       Running   0          1h
calico-node-7z6z6                                         2/2       Running   0          1h

I think we have same topology

gyliu513 commented 6 years ago

@karstensi how many BGPConfigurations do you have? The BGPConfiguration also includes asNumber, and each cluster should have a different asNumber, so I may need two BGPConfigurations, but I found that I cannot create another BGPConfiguration with its own asNumber if I already have a global one.

On Cluster1

root@gyliu-ubuntu-1:~/cases/calico# calicoctl get bgpconfig -oyaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    creationTimestamp: 2018-04-27T06:36:43Z
    name: default
    resourceVersion: "16195"
    uid: 59797f00-49e5-11e8-8580-fa163e0ee8bb
  spec:
    asNumber: 4567
    nodeToNodeMeshEnabled: false
kind: BGPConfigurationList
metadata:
  resourceVersion: "16378"

On Cluster2, I want to set another BGPConfiguration for Cluster2 with different asNumber.

[root@gyliu-icp-1 calico]# cat bgpconfig.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default-istio
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 4569

Create failed for BGPConfiguration.

[root@gyliu-icp-1 calico]# calicoctl create -f ./bgpconfig.yaml
Failed to create 'BGPConfiguration' resource: error with the following fields:
-  BGPConfiguration.Spec.NodeToNodeMeshEnabled (Cannot set nodeToNodeMeshEnabled on a non default BGP Configuration.)
-  BGPConfiguration.Spec.ASNumber (Cannot set ASNumber on a non default BGP Configuration.)

Any comments for this?

tmjd commented 6 years ago

@gyliu513 I think a big problem with the setup you are suggesting is that you have one etcd and one route reflector (I'm assuming you are using the calico route-reflector image) that you are trying to share between 2 clusters. I don't think a setup like that is possible. I don't think it is reasonable to share an etcd instance/cluster between 2 separate kubernetes clusters.

To do the cross cluster setup I think you would need:

- Two independent clusters
  - no sharing etcd
  - they will need different 'active' IP Pool subnets for cross cluster communication
  - each cluster will need the other cluster's IP Pool but disabled (this causes bird to distribute the routes it receives from the other cluster)
- Separate manually configured route reflectors (like using Bird manually configured)
- Peering set up between the two route reflectors

gyliu513 commented 6 years ago

Thanks @tmjd , really very helpful!

More questions:

1) I was using the calico/node container, so can I still manually configure an RR? The document at https://docs.projectcalico.org/v3.1/usage/routereflector/bird-rr-config says: "For a container-based deployment, using the calico/node container, check out the Calico BIRD route reflector container."

2) For setting up peering between the two route reflectors, I did not find any document explaining how to peer two RRs. The only document on setting up peers is https://docs.projectcalico.org/v3.1/usage/configuration/bgp , but it does not mention peering between two RRs.

gyliu513 commented 6 years ago

@tmjd I think that we are not able to use Calico Router Reflector V3.0 configure this due to the following two issues:

Will try calico 2.6 with below topology

karstensi commented 6 years ago

As described above in my topology I am not using the calico rr docker image but run a bird instance on a VM.

I have 1 BGPConfiguration per cluster.

tmjd commented 6 years ago

@gyliu513 Yes due to the 2 issues you linked it is not possible to use the containerized calico/routereflector. I am suggesting to set up your own route reflector with bird. I understand that the docs suggest checking out the containerized version but it is not required to use the containerized version.

"I did not get any document telling me how to set up peering between two RR"

Yes there is not currently any documentation in the calico docs for doing that. It is not Calico specific configuration and is only required in more advanced setups so there has not been any documentation created for that.

gyliu513 commented 6 years ago

Thanks @tmjd

Now I want to set up such a cluster with Calico 2.6.6 and RR 0.4.2, which supports RR mesh, but I found that RR 0.4.2 does not work, see https://github.com/projectcalico/calico/issues/1948 , so:

- Can you help check this issue and tell me what is wrong with my configuration?
- I think that I can use the topology from https://github.com/projectcalico/calico/issues/908#issuecomment-385160303 with Calico 2.6.6 and RR 0.4.2 so that pods in different clusters can communicate with each other, right?

tmjd commented 6 years ago

I do not think you can do what you are trying to do with the containerized route reflector. You are trying to do something that is a known limitation of that containerized RR. Basically, in the setup you are trying, RR1 would be an 'edge' router for RR2 and RR2 would be an 'edge' router for RR1. And as the linked document says, there is no way to set up that type of route. The basic problem is that there is no way to set up an external peering when using the Calico containerized RR.

I think there are maybe 2 options you have here.

1) Set up and configure bird manually. Calico does not provide an RR that can do the peering necessary for 2 clusters to have BGP peering.
2) I think that you could add each node in cluster 1 as a Global BGP Peer in cluster 2 and each node in cluster 2 as a Global BGP Peer in cluster 1. This would effectively create a node-to-node mesh over all the clusters. I will stress again that I think this may work, but I have not tried it and there could be something obvious that I am missing. I would not suggest a setup like this for production, but if you are just trying to do a POC it may do what you need.
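The second option, adding every node of the other cluster as a global BGP peer, was described as untried, so the following is only a sketch of what generating those peers could look like. It reuses the BGPPeer fields shown earlier in the thread (`peerIP`, `asNumber`), the Cluster2 node addresses posted above, and the AS number 4569 that was intended for Cluster2. The resource naming scheme is hypothetical.

```python
def global_peer_manifests(remote_nodes, remote_asn, prefix):
    """Render one global BGPPeer per node of the other cluster, as a multi-document YAML string."""
    docs = []
    for hostname, ip in sorted(remote_nodes.items()):
        docs.append(
            "apiVersion: projectcalico.org/v3\n"
            "kind: BGPPeer\n"
            "metadata:\n"
            f"  name: {prefix}-{hostname}\n"   # hypothetical naming scheme
            "spec:\n"
            f"  peerIP: {ip}\n"
            f"  asNumber: {remote_asn}\n"
        )
    return "---\n".join(docs)


# Cluster2 nodes as listed earlier in this thread; to be created on Cluster1.
cluster2_nodes = {
    "gyliu-icp-1": "9.111.255.21",
    "gyliu-icp-2": "9.111.255.46",
    "gyliu-icp-3": "9.111.255.91",
}

print(global_peer_manifests(cluster2_nodes, 4569, "cluster2"))
```

The mirror-image set, Cluster1 nodes with Cluster1's AS number, would then be created on Cluster2.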

gyliu513 commented 6 years ago

Thanks @tmjd

I followed the steps at https://docs.projectcalico.org/v3.1/usage/routereflector/bird-rr-config , and it works fine: pods in the two clusters can communicate with each other.

gyliu513 commented 6 years ago

/cc @sdake

gyliu513 commented 6 years ago

A simpler way is to use the node-to-node mesh and add ip route rules from cluster2 to cluster1 and from cluster1 to cluster2. This also lets pods communicate across clusters.

Commands like this:

ip route add 30.1.44.192/26 via 9.111.255.92
ip route add 30.1.255.192/26 via 9.111.255.203
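These static routes have to be repeated for every remote address block, so they are a natural candidate for generation. The sketch below assumes that each `via` address is a node in the other cluster that hosts the given pod block, which the comment does not state explicitly. The function name is hypothetical, and the two entries are the ones from the commands above.

```python
import ipaddress


def route_commands(block_to_node):
    """Build 'ip route add' commands, validating each block and next hop first."""
    commands = []
    for block, node_ip in block_to_node.items():
        net = ipaddress.ip_network(block)    # raises ValueError on a malformed block
        hop = ipaddress.ip_address(node_ip)  # raises ValueError on a malformed address
        commands.append(f"ip route add {net} via {hop}")
    return commands


# Remote pod block -> next-hop address, taken from the commands in this comment.
remote_blocks = {
    "30.1.44.192/26": "9.111.255.92",
    "30.1.255.192/26": "9.111.255.203",
}

for cmd in route_commands(remote_blocks):
    print(cmd)
```

Routes added this way are static, so they would need to be regenerated whenever blocks are assigned to different nodes.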

FYI @sdake @karstensi

tmjd commented 6 years ago

I think since there is a workable solution here (actually 2 now I think) I'm going to close this issue. If anyone objects please speak up and I will reopen it.

sara4dev commented 6 years ago

Does this work if the cluster nodes are in two different subnets?

gyliu513 commented 6 years ago

@saravanakumar-periyasamy yes, as long as the hosts in the two subnets can reach each other by IP address.

sara4dev commented 6 years ago

@gyliu513 - do you have to enable any cross-subnet specific settings in bird/calico-node? For some reason we are not able to get it working with the above pattern - https://github.com/projectcalico/calico/issues/908#issuecomment-386496313

gyliu513 commented 6 years ago

My VMs are not cross-subnet. Can you please first test whether it works with VMs in one subnet before you test cross-subnet? This can help us identify the issue you are having @saravanakumar-periyasamy

sara4dev commented 6 years ago

Yes, that's what we are doing now. We are going to test it in one subnet, @gyliu513, and will update our results here.

sara4dev commented 6 years ago

@gyliu513 - do you have to disable IPIP in calico?

gyliu513 commented 6 years ago

@saravanakumar-periyasamy I enabled IPIP because I was deploying my env in OpenStack, but I think this should not have much impact.

manishrajkarnikar commented 6 years ago

@gyliu513 how do you get around this iptables rule that Felix inserts?

-A cali-INPUT -p ipv4 -m comment --comment "cali:8U47CsIYs8dEG5nH" -m comment --comment "Drop IPIP packets from non-Calico hosts" -j DROP

It looks like Calico drops packets from unknown hosts in IPIP mode.

Basically, @saravanakumar-periyasamy and I have a 2 cluster setup with routing and peering all working. We see that a packet sent from one cluster reaches its destination in the other, only to be dropped by this iptables rule.

gyliu513 commented 6 years ago

My rule is as this

-A cali-INPUT -p ipencap -m comment --comment "cali:JhfQUFFJ2v0jbipF" -m comment --comment "Drop IPIP packets from non-Calico hosts" -j DROP

manishrajkarnikar commented 6 years ago

@gyliu513 we were finally able to make the 2 clusters talk to each other. We had to make a couple of changes:

- set FELIX_CHAININSERTMODE to append in the Felix config in both clusters
- insert a rule that overrides the above mentioned iptables rule in both clusters

Apparently, when using IPIP, Calico drops all packets from hosts that are not in the calicoctl get nodes list. The above iptables rule checks against an ipset populated from this list. There is no mechanism to add to this node list.

Not sure how it worked for you. Maybe this rule was not there in an earlier version of Calico.

gyliu513 commented 6 years ago

@manishrajkarnikar I was actually using calico v3.0.4 which works fine with ipip enabled.

KevDBG commented 6 years ago

Hello @gyliu513, I'm really interested in your integration. I would like to do the same thing: connect pods between K8s clusters (on AWS, between VPCs).

On which instance did you set up the Bird route reflector? It would be great if you could provide some technical implementation details ^^. Thanks!

gyliu513 commented 6 years ago

@Zophren I set up the bird route reflector on a separate node which was not in my k8s cluster, just following the document here: https://docs.projectcalico.org/v3.1/usage/routereflector/bird-rr-config

Besides, if you are using the Calico node-to-node mesh, you can follow the steps at https://medium.com/ibm-cloud/multi-cluster-support-for-service-mesh-with-ibm-cloud-private-d7d791f9b778 to configure it; check the section "Config Pod Communication Cross IBM Cloud Private Clusters".

projectcalico / calico

Intercommunicate pods from 2 different kubernetes clusters #908