projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/

FELIX_NATOUTGOINGADDRESS and IPv6 #3275

Open duylong opened 4 years ago

duylong commented 4 years ago

Hi,

I have a problem when trying to configure Calico to use FELIX_NATOUTGOINGADDRESS.

I'm running a dual-stack IPv4/IPv6 Kubernetes cluster. When I use this configuration in my calico.yaml deployment:

            - name: FELIX_NATOUTGOINGADDRESS
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP

... I get this error:

2020-02-24 14:12:52.093 [WARNING][1558] table.go 1191: Failed to execute ip(6)tables-restore command error=exit status 2 errorOutput="ip6tables-restore v1.8.2 (legacy): Bad IP address \"192.168.10.100\"\n\nError occurred at line: 2\nTry `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.\n" input="*nat\n-R cali-nat-outgoing 1 -m comment --comment \"cali:ltvx3bX8efbq2000\" -m set --match-set cali60masq-ipam-pools src -m set ! --match-set cali60all-ipam-pools dst --out-interface bond+ --jump SNAT --to-source 192.168.10.100\nCOMMIT\n" ipVersion=0x6 output="" table="nat"

How should FELIX_NATOUTGOINGADDRESS be configured in IPv6 mode?

spikecurtis commented 4 years ago

@duylong there is an issue with this config option in dual-stack mode. We don't have a way to specify both an IPv4 address and an IPv6 address. What's happening is that status.hostIP resolves to an IPv4 address, which ip6tables rightly rejects.

As a workaround, you could remove this config option entirely and Calico should default to using the address of the interface the packet is forwarded out of.

duylong commented 4 years ago

I have several interfaces on my server, in different VLANs. Without filtering, and with MASQUERADE enabled by default, containers get full access to all the routes, which I wanted to avoid. A single egress IP does not suit me very well. In your opinion, is it possible to MASQUERADE based on the IP of the service?

spikecurtis commented 4 years ago

If I understand correctly, you are attempting to use the choice of source IP in NAT as a way to limit which networks pods can send traffic to. A more robust solution would be to define a GlobalNetworkSet that represents the networks pods are allowed to access and then use Calico policy on pod egress to limit them to just those networks.
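
For illustration, a minimal sketch of that approach (the set name, label, and CIDRs here are placeholders, not values from this thread):

    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkSet
    metadata:
      name: allowed-egress-nets
      labels:
        role: allowed-egress
    spec:
      nets:
        - 192.168.10.0/24
        - 2001:db8:10::/64
    ---
    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkPolicy
    metadata:
      name: pods-egress-allowlist
    spec:
      # Selects all Kubernetes pods; tighten this selector as needed.
      selector: projectcalico.org/orchestrator == 'k8s'
      types:
        - Egress
      egress:
        - action: Allow
          destination:
            # Matches the GlobalNetworkSet labelled above.
            selector: role == 'allowed-egress'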

It's not possible to choose different source IPs during MASQUERADE for different pods at this time.

duylong commented 4 years ago

I wanted to avoid defining egress network rules, but I think I will have to.

Maybe we could add the source to the pods' default route. If we turn off MASQUERADE, we should get the source IP, right? If so, I don't know whom we could ask for this feature.

spikecurtis commented 4 years ago

If we turn off MASQUERADE, we should get the source IP, right?

When you say "source IP" do you mean the pod's source IP or the node's source IP? You can tell Calico not to MASQUERADE by setting natOutgoing to false on the IPPool. This will cause the packets to have the pod's IP address as their source; obviously, for this to work, the networks you are sending traffic to will need to know how to route packets back, e.g. by peering over BGP.
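
For reference, disabling outgoing NAT on a pool looks roughly like this (the pool name and CIDR are placeholders):

    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: default-ipv4-ippool
    spec:
      cidr: 10.65.0.0/16
      # Pod IPs are used as-is on egress; no SNAT to the node address.
      natOutgoing: false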

duylong commented 4 years ago

"source IP" = "source in the default route". For example:

default via fe80::ecee:eeff:feee:eeee dev eth0 src SERVICE_IP

I don't know if that works. I will look at BGP to see if it can solve my problem, but I would like to keep things simple for now (without an additional layer).

spikecurtis commented 4 years ago

Ah, ok. Turning off MASQUERADE will not help. If we aren't doing any NAT, then the source IP in the default route only applies to packets from the host itself (not pods, unless they are host-networked).

duylong commented 4 years ago

For my problem, I finally used a double attachment with Multus, which solves my source IP issue. Now I have two default routes, one created by Calico and one by my network, so it is not yet optimal.
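
For context, a Multus secondary attachment is usually declared with a NetworkAttachmentDefinition and referenced from the pod; a minimal sketch (the attachment name, macvlan master, and IPAM type are assumptions, not the exact setup used here):

    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: vlan-egress
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": "bond0",
        "ipam": { "type": "dhcp" }
      }'

... and on the pod, an annotation requesting the second interface:

    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: vlan-egress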

sriramy commented 2 years ago

Is this still a feature that is worth adding to calico?

Today we rely on the datacenter's edge router to perform SNAT for egress traffic towards external networks, but my stakeholders are interested in having a consistent egress IP for outbound connections from pods without the need for an edge router. See: https://github.com/projectcalico/calico/issues/5549

I did some prototyping on this (https://github.com/projectcalico/calico/compare/master...sriramy:calico:snat), and it doesn't seem difficult to implement for the Linux dataplane; I guess more work will be needed for eBPF. I am still a newbie to this code base, so please advise whether this is a good idea and whether I can create a PR.

tomastigera commented 2 years ago

I do not think there would be much needed on the eBPF side; MASQ is done in iptables anyway. The best way to figure it out is to have FV test coverage for the feature. We run the tests in both iptables and eBPF mode.

sriramy commented 2 years ago

It took me a while to write the FV tests and get them to pass, but I now have something working. ~But I still have some problems with one of the tests, where ingress traffic from an external server gets dropped by eBPF. The logs seem to indicate that the flow is not whitelisted, but I am unable to understand where the problem is. The same test with the etcd datastore works fine. https://pastebin.com/Qk91D5fA~ The eBPF tests are not successful, since the traffic coming back from the external server to the workload on felix is dropped at eth0 ingress. Here are the logs: https://pastebin.com/aqzqYDRr

My test setup is as follows: traffic starts at the workload inside felix (10.65.0.10), gets served by ext-server, and comes back to eth0 on felix, where it gets redirected back out on eth0.

        ext-server                                              felix
┌────────────────────────┐                        ┌────────────────────────────────┐
│                        │                        │                                │
│ ┌────────────────────┐ │                        │ ┌────────────────────────────┐ │
│ │ tcp:8055           │ │                        │ │                            │ │
│ │                    │ │                        │ │                            │ │
│ │ 10.66.0.20         │ │                        │ │                 10.65.0.10 │ │
│ └┬────┬──────────────┘ │                        │ └────────────────────┬──────┬┘ │
│  │veth│                │                        │                      │ veth │  │
│  └┬──▲┘                │                        │                      ├──────┤  │
│   │  │                 │                        │                      │tc-bpf│  │
│   │  │                 │                        │                      └┬─────┘  │
│   │  │                 │                        │                       │        │
│   │  │                 │                        │                       │        │
│   │  │                 │                        │                       │        │
│   │  │              ┌──┼──┐                  ┌──┼──┬──────┐    ┌────────┼─────┐  │
│   │  └──────────────┤  │  ◄──────────────────┤  │  │      ◄────┼────────┘     │  │
│   │                 │  │  │                  │  │  │      │    │ SNAT to      │  │
│   └─────────────────►  │  ├──────────────────┼──┼──┼──┐   │    │ 10.65.0.110  │  │
│                     │  │  │                  │  │  │  │   │    └──────────────┘  │
│                     │  │  │           xxx────┼──┼──┼──┘   │                      │
│                     │  │  │                  │  │  │      │                      │
│                     │ eth0│                  │eth0 │tc-bpf│                      │
│                     └──┬──┘                  └──┬──┴──────┘                      │
│                        │                        │                                │
└────────────────────────┘                        └────────────────────────────────┘

sriramy commented 2 years ago

Anyhow, I need help understanding the best way to design a solution for the original problem.

sriramy commented 2 years ago

Ok, the problem was that the FV tests didn't assign the egress SNAT IP on the node. Once I configured it on eth0 on felix, it does work (my bad for not realizing that). The problem appeared only on the eBPF dataplane, since it does a FIB lookup inside tc-eBPF before the iptables reverse-SNAT rules are called. On the Linux dataplane, the iptables PREROUTING hooks are called before the FIB lookup happens.

It looks like the current solution will not work for multi-node clusters. On a single-node cluster, the external server knows definitively where to route packets addressed to the egress SNAT IP. But on a multi-node cluster, the operator will have to choose a node to assign the egress SNAT IP to, and also make sure all traffic from pods/workloads is routed via this "chosen" node.