projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Outgoing packets do not follow configuration on Source IP #7855

Closed ElysaSrc closed 11 months ago

ElysaSrc commented 1 year ago

I am observing an incorrect choice of source IP for outgoing traffic on my Kubernetes nodes.

I have a node that has multiple IP addresses on a single interface:

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:da:fa:da:fa:da brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 1.2.3.4/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet 1.2.3.5/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet 1.2.3.6/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet 1.2.3.7/32 metric 100 scope global dynamic ens3
       valid_lft 84172sec preferred_lft 84172sec

If I display the node configuration using calicoctl I see this:

spec:
  addresses:
  - address: 10.100.0.253/16
    type: CalicoNodeIP
  - address: 10.100.0.253
    type: InternalIP
  - address: 1.2.3.7
    type: ExternalIP
  bgp:
    ipv4Address: 10.100.0.253/16
  ipv4VXLANTunnelAddr: 10.42.9.192
  orchRefs:
  - nodeName: noprod-node-2
    orchestrator: k8s

Here is the default route (using ip route) of the given node:

default via 1.2.3.1 dev ens3 proto dhcp src 1.2.3.7 metric 100
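For reference, the kernel's choice of source address for a given destination can be checked directly with ip route get; the output below is illustrative, based on the default route above:

```shell
# Ask the kernel which source IP it would use to reach an external host.
ip route get 8.8.8.8
# Illustrative output for the route shown above:
#   8.8.8.8 via 1.2.3.1 dev ens3 src 1.2.3.7
```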

Expected Behavior

I expect all traffic coming out of the node described to be seen as coming from 1.2.3.7, as that is the source IP of the default route.

Current Behavior

Executing a curl ifconfig.me on the node directly displays the right outgoing IP address (1.2.3.7 in the example given at the beginning). Executing a curl ifconfig.me in a container always displays the first IP of the interface carrying all the addresses (so it displays 1.2.3.4 in this specific case).

Possible Solution

I have the feeling that, when the outgoing interface is selected, the first entry of an array of available addresses on that interface is used. Maybe it should check the host's default route or the ExternalIP specifically?

Steps to Reproduce (for bugs)

  1. Deploy a node with multiple public IPs on a single interface
  2. Set an ExternalIP that is not the first in the list of IPs attached to this interface
  3. curl a service that displays the outgoing address
  4. See that it does not match.

Context

This issue affects me because I need to be certain of the outgoing IP of my containers, since I have to whitelist my IPs with other providers that interact with my systems.

Your Environment

sridhartigera commented 1 year ago

@caseydavenport your thoughts on this please.

caseydavenport commented 1 year ago

Calico for the most part isn't involved at this layer of host networking. The only time Calico is involved in source IP selection for traffic from a node is when encapsulation is enabled and the traffic is destined to another pod in the cluster; there we need to make sure the right IP is chosen so that the return traffic can be routed via the tunnel.

I expect all traffic coming out of the node described to be seen as coming from 1.2.3.7

I think this is a wrong assumption if taken literally to mean all traffic:

NAT doesn't seem to follow the same IP address selection rules that routing does. However, the NAT address can be configured per-node using FelixConfiguration's natOutgoingAddress field, although I confess it's not the most convenient config model.
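For concreteness, a per-node override might look like the sketch below. This assumes the node name from the report above; per-node Felix settings live in a FelixConfiguration resource named node.<nodeName>:

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  # Per-node override: "node." followed by the Kubernetes node name.
  name: node.noprod-node-2
spec:
  # SNAT outgoing pod traffic to this address instead of letting the
  # kernel pick the first address on the interface.
  natOutgoingAddress: 1.2.3.7
```

Apply it with calicoctl apply -f; it only affects Felix on the named node.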

ElysaSrc commented 1 year ago

Hello @caseydavenport, thanks for your input; it helped me understand (and fix) the behavior I was facing. Thanks a lot for this.

Is there any reason in Felix's code to use the first address of the interface instead of the one that matches the default route and/or the external IP? Is this default behavior expected for some reason?

cyclinder commented 1 year ago

@ElysaSrc Do you want the source IP address to be 1.2.3.7 when the pod accesses the external world?

ElysaSrc commented 1 year ago

@ElysaSrc Do you want the source IP address to be 1.2.3.7 when the pod accesses the external world?

This is indeed the behavior I would expect, given the configured ExternalIP and the host's routing table. The solution given earlier works to override the default behavior.

cyclinder commented 1 year ago

As @caseydavenport mentioned, when pods within a cluster communicate, the source address is the pod's IP address. When a pod accesses the external world, the source IP is rewritten to the node's IP, but the CNI (Calico) has no control over which of the node's IPs is used; the rewrite is implemented through iptables MASQUERADE.
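Conceptually, the NAT rules look like the sketch below (illustrative, not Calico's actual chain layout; 10.42.0.0/16 stands in for the example cluster pool). With MASQUERADE, the kernel picks the source address itself, which is why the first IP on the interface was used; Felix's natOutgoingAddress switches this to an explicit SNAT:

```shell
# Masquerade pod traffic leaving the cluster; the kernel chooses the source IP.
iptables -t nat -A POSTROUTING -s 10.42.0.0/16 ! -d 10.42.0.0/16 -j MASQUERADE

# With natOutgoingAddress set, an explicit source address is used instead:
iptables -t nat -A POSTROUTING -s 10.42.0.0/16 ! -d 10.42.0.0/16 -j SNAT --to-source 1.2.3.7
```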

I think you may need a project like egress gateway (https://github.com/spidernet-io/egressgateway), which can help you route pods' traffic to external destinations through a specified gateway node, with the source IP being your expected EIP.

ElysaSrc commented 1 year ago

I do understand that it is not Calico directly; @caseydavenport explained it to me earlier.

Applying a Felix-specific configuration to override natOutgoingAddress works well, as I mentioned earlier, using a configuration scoped to each node with this setup.

I needed to use the IP of my node's interface, so your solution is not the right fit for the issue I had.

cyclinder commented 1 year ago

Okay, but note that if you use the egress gateway project, the EIP can also come from the node's IP.

mazdakn commented 1 year ago

@ElysaSrc closing the issue, but feel free to re-open if you feel there is more to add here.

ElysaSrc commented 1 year ago

I have uncovered a potential side effect of setting natOutgoingAddress: it makes all traffic go through the interface that carries that IP, even when I use internal addresses (on the private network used for node communication). It forces the traffic through the public interface, which breaks internal communication between nodes.

I tried adding iptablesNATOutgoingInterfaceFilter to scope it to the public interface only, but it still does not work as intended, and traffic seems to come from the public interface when it reaches other nodes.

Is there any setting I can add to prevent this behavior from happening?
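For reference, the two settings discussed here combine in the per-node FelixConfiguration roughly as follows (a sketch; the node and interface names are assumed from the report above):

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: node.noprod-node-2
spec:
  # SNAT pod egress traffic to the public address...
  natOutgoingAddress: 1.2.3.7
  # ...but only match the NAT rule on the public interface, so traffic on
  # the private inter-node network keeps its original source address.
  iptablesNATOutgoingInterfaceFilter: ens3
```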

Wizmll commented 1 year ago

@cyclinder Hi, I've been reading about EgressGateway since yesterday (after I saw your comment on this issue). Do you know if we can define multiple IP addresses (an IP address for each namespace) when configuring this EgressGateway? Thank you

lou-lan commented 1 year ago

@cyclinder Hi, I've been reading about EgressGateway since yesterday (after I saw your comment on this issue). Do you know if we can define multiple IP addresses (an IP address for each namespace) when configuring this EgressGateway? Thank you

EgressGateway can use multiple Egress IPs. You can use EgressPolicy and EgressClusterPolicy to control the egress policy; EgressPolicy is a namespace-scoped setting.

cyclinder commented 1 year ago

@MalikaML Maybe the docs are outdated, but you can try it by following the guide. Or can you give a simple policy or gateway YAML, @lou-lan?

lou-lan commented 1 year ago

@MalikaML Maybe the docs are outdated, but you can try it by following the guide. Or can you give a simple policy or gateway YAML, @lou-lan?

Sure, here is an example:

Install

If you want to try it out, you can install the latest version.

wget https://github.com/spidernet-io/egressgateway/blob/github_pages/charts/egressgateway-0.3.0-rc1.tgz
helm install egressgateway egressgateway-0.3.0-rc1.tgz

Create EgressGateway CR

kubectl get nodes
NAME           STATUS   ROLES           AGE    VERSION
workstation1   Ready    control-plane   116d   v1.27.3
workstation2   Ready    <none>          116d   v1.27.3
workstation3   Ready    <none>          116d   v1.27.3

Choose a node as the egress node; this node will hold the Egress IP.

kubectl label node workstation3 egress=true

apiVersion: egressgateway.spidernet.io/v1beta1
kind: EgressGateway
metadata:
  name: egw1
spec:
  clusterDefault: true
  ippools:
    ipv4:
    - 10.6.1.55
    - 10.6.1.56               # Egress IP
    ipv6:
    - fd00::55
    - fd00::56                # Egress IP
  nodeSelector:
    selector:
      matchLabels:
        egress: "true"      # This means that nodes with egress=true will serve as traffic egress nodes.

Create an EgressPolicy to match the Pods that require Egress

Create test pod

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mock-app
  namespace: default
  labels:
    app: mock-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mock-app
  template:
    metadata:
      labels:
        app: mock-app
    spec:
      nodeName: workstation2
      terminationGracePeriodSeconds: 1
      containers:
        - args:
            - "86400"
          command:
            - sleep
          image: registry.cn-shanghai.aliyuncs.com/loulan-public/tools:alpine
          imagePullPolicy: IfNotPresent
          name: sleep-container
          resources: {}

Create EgressPolicy

apiVersion: egressgateway.spidernet.io/v1beta1
kind: EgressPolicy
metadata:
  name: policy1
  namespace: default            # namespace-scoped
spec:
  appliedTo:
    podSelector:
      matchLabels:
        app: mock-app
  egressGatewayName: egw1
  egressIP:
    allocatorPolicy: default         # Automatically assign an Egress IP; you can also specify one manually.

Test

~ kubectl exec -it mock-app-kgslh sh
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if32: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1430 qdisc noqueue state UP qlen 1000
    link/ether 4a:e2:c6:30:32:a1 brd ff:ff:ff:ff:ff:ff
    inet 10.21.52.94/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd00:21::d0bb:cef5:64f2:295e/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::48e2:c6ff:fe30:32a1/64 scope link
       valid_lft forever preferred_lft forever
~ kubectl exec -it mock-app-kgslh sh
/ # curl 10.6.1.92:8080
Remote IP: 10.6.1.55:33082

You can see that the source IP seen by the external service is the configured Egress IP (10.6.1.55).

Wizmll commented 1 year ago

@cyclinder thank you for your response! @lou-lan thank you!! I appreciate your explanation of EgressGateway, and the tests were insightful.

I've been considering implementing it in our Kubernetes cluster, and I have a specific use case I'd like your insights on. In our cluster, each employee has a dedicated namespace with IP addresses assigned using MetalLB, and services within these namespaces are also exposed with external IP addresses via MetalLB. I'm wondering if EgressGateway can be used to track which specific IP address (belonging to a namespace/pod) was behind an outgoing request to the internet. Essentially, I aim to identify the source IP when an employee communicates outside the cluster from their namespace. Is this something that EgressGateway can facilitate? I appreciate your insights and guidance on this matter.

Wizmll commented 1 year ago

@lou-lan Also, do I need to create it beforehand, and could you guide me on how to configure it? I've been searching online, but I haven't found a clear answer yet. Thank you once again

lou-lan commented 1 year ago

@lou-lan Also, do I need to create it beforehand, and could you guide me on how to configure it? I've been searching online, but I haven't found a clear answer yet. Thank you once again

@MalikaML

part 1 - think

EgressGateway can assign an Egress IP to a specific namespace/pod (in other words, it SNATs the internet-bound traffic of a Pod, or of all Pods in a namespace, to a specific source IP).

Let's take a step-by-step look at whether EgressGateway can solve your actual scenario.

First, let's look at how namespace/pod internet traffic is converted to an Egress IP: when EgressPolicy.spec.destSubnet is not set, the default behavior of EgressPolicy is to SNAT all traffic (except traffic staying within the cluster) to the Egress IP. EgressGateway automatically discovers the CIDRs within the cluster, so it can easily identify external traffic. You can also manually specify that only traffic to destSubnet is converted to the Egress IP.

apiVersion: egressgateway.spidernet.io/v1beta1
kind: EgressPolicy
metadata:
  namespace: "default"
  name: "policy-test"
spec:
  ...
  destSubnet:
  - "1.1.1.1/32"

At this point, it seems that EgressGateway can solve the problem in your scenario. However, note that EgressGateway currently does not record access requests. If you are monitoring on a network device (such as a Cisco switch), it is easy to see this traffic by its source IP.

part 2 - put into effect

  1. Install EgressGateway
  2. Create EgressGateway CR
  3. Maybe you can use this feature: Namespace Default EgressGateway
  4. Create EgressPolicy to match Pods

part 3 - introspect

You should use EgressGateway when you have the above requirements. If you simply want to restrict internet traffic without identifying it, you should use k8s NetworkPolicy.

Wizmll commented 1 year ago

@lou-lan Thank you so much for your detailed explanation! I truly appreciate your help. I'll implement the solution and circle back to let you know how it's working :)