projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Feature request to skip adding ip routes for monitoring interfaces #5252

Open SaranBalaji90 opened 4 years ago

SaranBalaji90 commented 4 years ago

I installed Calico on my Kubernetes cluster and set FELIX_INTERFACEPREFIX to eni, along with other configuration. I would like to discuss the following operations that calico-node performs:

  1. It adds routes to the main routing table for all interfaces matching the prefix (and again whenever a network policy involving these interfaces changes).
  2. If a route is missing for some reason, Calico fixes it by adding the route again.

I was able to stop the resync process (2) by setting FELIX_ROUTEREFRESHINTERVAL to 0, but I can't find any way to turn off (1). I would like to hear others' opinions on whether it would be useful to have this in an upcoming release.

Context

We create VLAN devices on the worker nodes, each with its own route table that has an entry for the host veth using that VLAN. This approach doesn't require the host veth route to be present in the main routing table. But running Calico on these worker nodes adds routes for such host veths to the main routing table, which our use case does not need.

So I'm wondering: given that there is a feature to turn off resync for IP routes, can we also add a feature to turn off adding routes to the main table?

caseydavenport commented 4 years ago

I think both of those are "working as expected" - Calico ensures that routes exist to interfaces that it thinks it is managing.

I'm not sure I really understand your use case - why do you want a vlan on the host rather than using standard routing?

SaranBalaji90 commented 4 years ago

@caseydavenport thanks for your response, and sorry for the late reply; I somehow missed it. To give you some context: in the Amazon EKS CNI plugin we added a feature (https://github.com/aws/amazon-vpc-cni-k8s/pull/1125) where every pod gets its own ENI, which can have different security groups (allowing inbound/outbound access to other resources) attached to it. To support this feature we had to create a VLAN device for each pod. Each VLAN device has its own routing table, and ip rules are configured so that traffic flowing into and out of the VLAN device uses that routing table. The host veth end of the pod is attached to this routing table. Therefore, if any other pod on the same or a different node wants to talk to a pod backed by a VLAN device, the traffic leaves the instance on its respective interface, comes back in through the VLAN device, and reaches the host veth of the pod.

For this reason, we don't want routes for the host veth ends of pods using VLAN devices in the main routing table. But if we install Calico, it adds the host veth to the main routing table (since the eni prefix is the same), which makes the traffic flow within the instance and bypass the AWS security group permissions.

So I'm wondering if we can add a flag to stop adding routes to the main routing table and leave the responsibility for managing these routes to Calico users. If we can achieve this, we can extend Calico network policy support to pods using VLAN devices as well.

So my ask here is: given that there is a feature to turn off resync for IP routes, can we also add a feature to turn off adding routes to the main routing table?

sc-juho commented 4 years ago

@SaranBalaji90 would you be willing to provide a PR? What would be the quickest way to get Calico supported again with the AWS security groups for pods feature?

hetpats commented 4 years ago

@SaranBalaji90 Hi, is there a way to block EC2 metadata access using the SG CNI route in the current version (without using an iptables block on EC2)? Our biggest use case is to block traffic from all pods to the instance metadata, except for kube-system pods like cloud-watch and fluentd, since these pods need metadata access; we want to block the others.

SaranBalaji90 commented 3 years ago

@hetpats sorry for not noticing these comments earlier. We documented a few approaches here: https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html#restrict-ec2-credential-access. One way could be to use hostNetwork for cloud-watch and fluentd and set HttpPutResponseHopLimit to 1 so that other pods on the cluster do not have access to the metadata.

song-jiang commented 2 years ago

@SaranBalaji90 Sorry for the late reply. I'm looking into this issue, and it seems that for a newly created pod with a security group policy, the routing is programmed in a separate routing table only. Calico does not program routes because the pod has a vlan-prefixed host veth name. Do you know whether a workaround was added to VPC CNI to use a different prefix for pods with an SG policy?

Version of my AWS VPC CNI:

amazon-k8s-cni-init:v1.10.1-eksbuild.1
amazon-k8s-cni:v1.10.1-eksbuild.1

IP of the pod with SG policy: 192.168.36.173
IP of the pod without SG policy: 192.168.63.103

[root@ip-192-168-55-130 /]# ip rule
10: from all iif vlan.eth.1 lookup 101
10: from all iif vlan64e481f8118 lookup 101
20: from all lookup local
512:    from all to 192.168.32.58 lookup main
512:    from all to 192.168.63.103 lookup main
1024:   from all fwmark 0x80/0x80 lookup main
32766:  from all lookup main
32767:  from all lookup default

[root@ip-192-168-55-130 /]# ip route show table 101
default via 192.168.32.1 dev vlan.eth.1 
192.168.32.1 dev vlan.eth.1 scope link 
192.168.36.173 dev vlan64e481f8118 scope link 

[root@ip-192-168-55-130 /]# ip route
default via 192.168.32.1 dev eth0 
169.254.169.254 dev eth0 
192.168.32.0/19 dev eth0 proto kernel scope link src 192.168.55.130 
192.168.32.58 dev eni261a58870b4 scope link 
192.168.63.103 dev eni85a057d71d7 scope link 
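
To make the output above easier to follow, here is a rough Python sketch of how these rules and tables combine. It is a simplified model, not kernel or Calico code: the `local` table is left empty because it is not shown above, the fwmark rule is omitted on the assumption that packets are unmarked, and gateways are ignored. The names `Route`, `Rule`, `longest_match` and `resolve` are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str  # destination prefix; "0.0.0.0/0" stands in for "default"
    dev: str


@dataclass(frozen=True)
class Rule:
    priority: int
    table: str
    iif: Optional[str] = None  # match only packets received on this interface
    to: Optional[str] = None   # match only this destination address


# Rules and routes copied from the output above (fwmark rule omitted,
# "local" table assumed empty for the purposes of this sketch).
RULES = [
    Rule(10, "101", iif="vlan.eth.1"),
    Rule(10, "101", iif="vlan64e481f8118"),
    Rule(20, "local"),
    Rule(512, "main", to="192.168.32.58"),
    Rule(512, "main", to="192.168.63.103"),
    Rule(32766, "main"),
]
TABLES = {
    "101": [
        Route("0.0.0.0/0", "vlan.eth.1"),
        Route("192.168.32.1/32", "vlan.eth.1"),
        Route("192.168.36.173/32", "vlan64e481f8118"),
    ],
    "main": [
        Route("0.0.0.0/0", "eth0"),
        Route("169.254.169.254/32", "eth0"),
        Route("192.168.32.0/19", "eth0"),
        Route("192.168.32.58/32", "eni261a58870b4"),
        Route("192.168.63.103/32", "eni85a057d71d7"),
    ],
}


def longest_match(routes, dst):
    """Return the most specific route covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    hits = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    return max(hits, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
               default=None)


def resolve(dst, iif, tables=TABLES, rules=RULES):
    """Walk the rules in priority order; the first table with a route wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.iif is not None and rule.iif != iif:
            continue
        if rule.to is not None and rule.to != dst:
            continue
        route = longest_match(tables.get(rule.table, []), dst)
        if route is not None:
            return rule.table, route.dev
    return None


# Traffic arriving on the VLAN device reaches the SG pod through table 101.
print(resolve("192.168.36.173", iif="vlan.eth.1"))
# Traffic from another pod on the node misses the iif rules and leaves via eth0.
print(resolve("192.168.36.173", iif="eni85a057d71d7"))

# If a /32 for the SG pod were also present in the main table, the same
# traffic would be delivered locally instead of leaving the instance.
with_extra = dict(TABLES, main=TABLES["main"]
                  + [Route("192.168.36.173/32", "vlan64e481f8118")])
print(resolve("192.168.36.173", iif="eni85a057d71d7", tables=with_extra))
```

The last call models the situation the original request wants to avoid: a host veth route in the main table short-circuits the path through the VLAN device.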

cc @caseydavenport

srini-ram commented 2 years ago

@song-jiang - The SGPP feature was introduced in AWS VPC CNI 1.7.x, and the host veth name has always had the vlan prefix. In your testing, were you able to get SGPP + Calico network policies enforced on same pod/node? Customers have requested that network policy be available on SGPP ENIs, and this is one of the top-priority issues to be resolved. Please let me know if there is anything we could do from our end to root-cause this.

cc @SaranBalaji90

song-jiang commented 2 years ago

@srini-ram Thanks for reaching out!

were you able to get SGPP + Calico network policies enforced on same pod/node

Currently it is impossible because Calico policies apply to enixxxx interfaces only. Is it possible to modify AWS VPC CNI to use the eni prefix across the board? It would be difficult for Calico to support two interface prefixes.

I can start running some tests if we have a single interface prefix. AFAIU, SGPP sets up a VLAN trunk interface and no Linux bridge is involved. The packet paths (in terms of iptables filters) should be supported by Calico, but I need to confirm that by running some tests.

srini-ram commented 2 years ago

@song-jiang - Thanks for your response. There are two issues here that need discussion.

1/ First issue - AWS VPC CNI supports regular interfaces (which use the enixxx prefix) and trunk interfaces (which use the vlan prefix). The routes for vlan eth interfaces are programmed in a separate routing table (e.g., table 101). With Calico installed on the node, routes for vlan eth interfaces in the separate routing table (e.g., table 101) were getting copied over to the main routing table via the resync process, which breaks the data path for the SGPP feature. #5252 was filed to track this as we couldn't find a workaround. From your sample output listed above, this doesn't seem to be case any more. Can you please confirm this ?

2/ Second issue - On trunk interfaces (which use the vlan eth prefix), Calico network policy doesn't work today because of the interface prefix convention used by AWS VPC CNI. I think you are suggesting using the enixxx prefix instead of the vlanxxx prefix so that Calico policy can be supported on trunk interfaces just as it is on regular interfaces. We could definitely look into that, but can we achieve this without copying routes from the non-default routing table (e.g., table 101) into the main routing table (related to point 1/)?

song-jiang commented 2 years ago

@srini-ram 1/ First issue

From your sample output listed above, this doesn't seem to be case any more. Can you please confirm this ?

Calico reinstates routes for interfaces managed by Calico (those with the eni prefix). Since vlan eth uses the vlan prefix, its routes will not be programmed into the main table by Calico. The sample output above illustrates the result.
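
As a rough illustration of that selection, here is a minimal Python sketch that applies a single `eni` prefix to the interface names seen in the sample output. The function name `is_calico_workload_interface` is hypothetical, and a plain string-prefix check is a simplification of whatever matching Felix actually performs.

```python
def is_calico_workload_interface(name: str, prefix: str = "eni") -> bool:
    """Simplified stand-in: an interface is managed if its name starts with the prefix."""
    return name.startswith(prefix)


# Interface names taken from the `ip rule` / `ip route` output in this thread.
interfaces = [
    "eth0",
    "eni261a58870b4",
    "eni85a057d71d7",
    "vlan.eth.1",
    "vlan64e481f8118",
]

managed = [i for i in interfaces if is_calico_workload_interface(i)]
ignored = [i for i in interfaces if not is_calico_workload_interface(i)]

print("managed:", managed)
print("ignored:", ignored)
```

Under this model only the two eni interfaces are candidates for routes in the main table, which matches the `ip route` output shown earlier.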

2/ Second issue: I'm thinking a possible solution could be:

  1. AWS VPC CNI uses the eni prefix for any network interface that is directly connected to a pod on the host side.

    • Pods without SGPP. Interface name: enixxxxx. Local routes are monitored by Calico and programmed into the main table.
    • Pods with SGPP. Interface name: enixxxx. Local routes are managed by AWS VPC CNI. Calico should be able to distinguish them from pods without SGPP by checking pod annotations. Pods with SGPP will have an annotation
      vpc.amazonaws.com/pod-eni: '[{"eniId":"eni-01d9ee176464f22d6","ifAddress":"0a:cd:43:36:11:bd","privateIp":"192.168.76.48","vlanId":1,"subnetCidr":"192.168.64.0/19"}]'
    • Trunk network interface. Interface name: vlan.eth.xxx. The per-node trunk network interface should avoid using the eni prefix so that Calico will ignore it.
  2. Calico skips programming local routes for any interface owned by a pod with SGPP.
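
A minimal Python sketch of the annotation check in this proposal, assuming the annotation value is the JSON list shown above. The helper names `sgpp_enis` and `should_program_local_route` and the interface name `eni0123456789a` are hypothetical; this is not existing Calico code.

```python
import json

SGPP_ANNOTATION = "vpc.amazonaws.com/pod-eni"


def sgpp_enis(annotations: dict) -> list:
    """Return the ENI entries from the SGPP annotation, or [] if absent or malformed."""
    raw = annotations.get(SGPP_ANNOTATION)
    if not raw:
        return []
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []


def should_program_local_route(ifname: str, annotations: dict, prefix: str = "eni") -> bool:
    """Program a main-table route only for prefixed interfaces of pods without SGPP."""
    return ifname.startswith(prefix) and not sgpp_enis(annotations)


# Annotation value copied from the comment above.
sgpp_pod = {
    SGPP_ANNOTATION: '[{"eniId":"eni-01d9ee176464f22d6","ifAddress":"0a:cd:43:36:11:bd",'
                     '"privateIp":"192.168.76.48","vlanId":1,"subnetCidr":"192.168.64.0/19"}]'
}
plain_pod = {}

print(sgpp_enis(sgpp_pod)[0]["privateIp"])
print(should_program_local_route("eni85a057d71d7", plain_pod))  # pod without SGPP
print(should_program_local_route("eni0123456789a", sgpp_pod))   # hypothetical SGPP veth
```

Treating a malformed annotation as "no SGPP" is an arbitrary choice in this sketch; a real implementation might prefer to fail closed and skip route programming instead.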

WDYT? cc @caseydavenport @SaranBalaji90

song-jiang commented 2 years ago

@srini-ram Maybe a better approach is for AWS VPC CNI to manage routes for all pods, regardless of their SGPP status. We can add a flag to Calico that disables pod routing entirely. It makes more sense that pod routing is managed by a single control plane.

caseydavenport commented 2 years ago

Linking related issue: https://github.com/projectcalico/calico/issues/5247

song-jiang commented 2 years ago

@srini-ram @SaranBalaji90 Are you happy with this solution?

  1. Calico has a configuration option to stop writing to routing tables. Enable this option (which restarts every calico-node) when SGPP is enabled, or have it enabled by default with AWS VPC CNI.
    Please check https://github.com/projectcalico/calico/pull/5454 and let us know whether that PR has what you expected.

  2. AWS VPC CNI uses a single prefix, eni, for pod veths.
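
To illustrate point 1, here is a small Python model of a route resync step guarded by such an option. The class `RouteReconciler` and the field `route_sync_disabled` are hypothetical names chosen for this sketch; they are not taken from the PR, and the model only adds or repairs routes (it never removes any).

```python
from dataclasses import dataclass, field


@dataclass
class RouteReconciler:
    # Hypothetical flag standing in for the proposed "stop writing routes" option.
    route_sync_disabled: bool = False
    # Modelled main table: destination IP -> device.
    main_table: dict = field(default_factory=dict)

    def resync(self, desired: dict) -> list:
        """Add or repair routes; return the destinations that were (re)programmed."""
        if self.route_sync_disabled:
            return []  # leave the table entirely to another owner, e.g. the CNI
        changed = [ip for ip, dev in desired.items() if self.main_table.get(ip) != dev]
        for ip in changed:
            self.main_table[ip] = desired[ip]
        return changed


# Workload routes taken from the `ip route` output earlier in the thread.
desired = {
    "192.168.32.58": "eni261a58870b4",
    "192.168.63.103": "eni85a057d71d7",
}

default_mode = RouteReconciler()
print(default_mode.resync(desired))   # both routes are programmed
del default_mode.main_table["192.168.32.58"]
print(default_mode.resync(desired))   # the missing route is repaired

policy_only = RouteReconciler(route_sync_disabled=True)
print(policy_only.resync(desired), policy_only.main_table)  # nothing is written
```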

srini-ram commented 2 years ago

@song-jiang - Apologies for the delay in response.

  1. Disabling route sync (via flag) solves the original problem stated in this issue (#5252). With this change, Calico policy should work fine on regular veth interfaces that don't have SGPP attached, alongside other SGPP pods. Thanks for addressing this.

  2. Currently with AWS VPC CNI, certain NAT rules are applied based on the 'eni' prefix. These shouldn't be applied to SGPP traffic, hence the 'vlan' prefix was chosen. Reverting to the 'eni' prefix might require more careful thought (e.g., CNI downgrades). I will come back to this thread soon with more analysis. But this needs to be fixed to enable Calico policy for SGPP pods.

song-jiang commented 2 years ago

@srini-ram No worries. Thanks for your input. We will continue on point 1. Please LMK once you have more information on point 2.

song-jiang commented 2 years ago

@srini-ram Any update on point 2?

M00nF1sh commented 2 years ago

@song-jiang I have done some testing and was able to make Calico network policy work with SGPP pods with the changes below.

  1. point 1 above.
  2. point 2 above.
  3. Disable the rpfilter rule added by Calico (by removing this line: https://github.com/projectcalico/calico/blob/master/felix/rules/static.go#L1067). This is required because netfilter's rpfilter seems to set the input iif to loopback when doing the reverse route lookup, which differs from the Linux kernel's own behavior. Our SGPP pod routing rule is based on the input iif, so Calico's rpfilter rule drops the packets. (I'm not familiar with kernel development and am not sure whether rpfilter is designed to use the input iif, but I feel the differing rpfilter behavior is a kernel bug or missing feature.)

For point 3 above, can Calico add a flag to disable the rule for AWS? (I'm also wondering why Calico adds this additional rp_filter rule, given that the kernel already applies rp_filter on these host-end veths.)
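
The effect described in point 3 can be sketched in Python using the rules and routes from the sample output earlier in the thread. This models only the behaviour as described in this comment (a strict reverse-path check whose reverse lookup uses either the real input interface or loopback); it has not been checked against kernel or netfilter source, and the names `route_dev` and `strict_rpf_accepts` are illustrative.

```python
import ipaddress

# (iif selector or None, table), in priority order; reduced from the `ip rule` output.
RULES = [("vlan64e481f8118", "101"), (None, "main")]
TABLES = {
    "101": {"192.168.36.173/32": "vlan64e481f8118", "0.0.0.0/0": "vlan.eth.1"},
    "main": {"192.168.32.0/19": "eth0", "0.0.0.0/0": "eth0"},
}


def route_dev(dst: str, iif: str):
    """Return the device of the most specific route in the first matching table."""
    addr = ipaddress.ip_address(dst)
    for rule_iif, table in RULES:
        if rule_iif is not None and rule_iif != iif:
            continue
        hits = [n for n in map(ipaddress.ip_network, TABLES[table]) if addr in n]
        if hits:
            best = max(hits, key=lambda n: n.prefixlen)
            return TABLES[table][str(best)]
    return None


def strict_rpf_accepts(src: str, arrived_on: str, lookup_iif: str) -> bool:
    """Accept only if the route back to src points at the arrival interface."""
    return route_dev(src, lookup_iif) == arrived_on


# Packet from the SGPP pod (192.168.36.173) arriving on its host veth.
# Reverse lookup using the real input interface: the iif rule selects table 101.
print(strict_rpf_accepts("192.168.36.173", "vlan64e481f8118", lookup_iif="vlan64e481f8118"))
# Reverse lookup with iif set to loopback: the iif rule is skipped, main says eth0.
print(strict_rpf_accepts("192.168.36.173", "vlan64e481f8118", lookup_iif="lo"))
```

In this model the second check fails because the reverse route resolves to eth0 rather than the veth, which is the packet drop described above.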