kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] conflict, when two vpcs with different subnets, two nat gateways ( an external subnet ) #4566

Open cybercoder opened 1 month ago

cybercoder commented 1 month ago

Kube-OVN Version

v1.12.25

Kubernetes Version

v1.30.4+k3s1

Operation-system/Kernel Version

Ubuntu 20.04.6 LTS 5.4.0-196-generic

Description

According to this documentation, a custom VPC with a NAT gateway works properly.

However, the EIP and DNAT rules conflict as soon as a second VPC and its NAT gateway start running.

Steps To Reproduce

Create the external (non-OVN) subnet and its NetworkAttachmentDefinition:

apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: external1
spec:
  protocol: IPv4
  provider: external1.kube-system
  cidrBlock: 172.17.88.0/24
  gateway: 172.17.88.1  # IP address of the physical gateway
  excludeIps:
  - 172.17.88.1..172.17.88.100
---
# multus thin
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: external1
  namespace: kube-system
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "ens224",
      "mode": "bridge",
      "ipam": {
        "type": "kube-ovn",
        "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
        "provider": "external1.kube-system"
      }
    }'
---

VPC, Internal subnet, Nat Gateway:

kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
  name: roya-vpc-1
spec:
  namespaces:
  - roya
---
kind: Subnet
apiVersion: kubeovn.io/v1
metadata:
  name: roya-subnet1
spec:
  vpc: roya-vpc-1
  cidrBlock: 10.0.1.0/24
  gateway: 10.0.1.254
  protocol: IPv4
  namespaces:
    - roya
---
kind: VpcNatGateway
apiVersion: kubeovn.io/v1
metadata:
  name: roya-gw
spec:
  vpc: roya-vpc-1
  subnet: roya-subnet1
  lanIp: 10.0.1.254
  externalSubnets:
    - external1

Now the EIP and the Pod:

kind: IptablesEIP
apiVersion: kubeovn.io/v1
metadata:
  name: roya-alpine-eip
spec:
  natGwDp: roya-gw
  externalSubnet: external1
---
apiVersion: v1
kind: Pod
metadata:
  name: nginxalpine
  namespace: roya
  annotations:
    ovn.kubernetes.io/logical_router: roya-vpc-1
    ovn.kubernetes.io/logical_switch: roya-subnet1
    ovn.kubernetes.io/snat: roya-alpine-eip # does not seem to work with a custom VPC
    ovn.kubernetes.io/eip: roya-alpine-eip # does not seem to work with a custom VPC
spec:
  containers:
  - name: alpine
    image: nginx:alpine

A custom VPC needs explicit SNAT/DNAT rules (the eip and snat annotations do not seem to work the way they do in the default VPC):

kind: IptablesDnatRule
apiVersion: kubeovn.io/v1
metadata:
  name: roya-nginxalpine-dnat
spec:
  eip: roya-alpine-eip
  internalIp: 10.0.1.2
  externalPort: "80"
  protocol: tcp
  internalPort: "80"
---
kind: IptablesSnatRule
apiVersion: kubeovn.io/v1
metadata:
  name: roya-nginxalpine-snat
spec:
  eip: roya-alpine-eip
  internalCIDR: 10.0.1.0/24

Now it works: curl against the EIP returns the Nginx page.

Current Behavior

When we clone these configs for a second VPC (everything except the external subnet and the NAD), curl to the first Pod times out. The DNAT rules or the gateways seem to conflict.
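
For illustration, the cloned resources for the second VPC might look like the sketch below. The names roya-vpc-2, roya2, roya-subnet2, roya-gw-2 and roya-alpine-eip-2, and the 10.0.2.0/24 range, are assumptions chosen for this example and do not appear in the original report; only the shared external1 subnet is taken from it.

```yaml
# Hypothetical second VPC mirroring the first; all names and the CIDR are assumed.
kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
  name: roya-vpc-2
spec:
  namespaces:
  - roya2
---
kind: Subnet
apiVersion: kubeovn.io/v1
metadata:
  name: roya-subnet2
spec:
  vpc: roya-vpc-2
  cidrBlock: 10.0.2.0/24
  gateway: 10.0.2.254
  protocol: IPv4
  namespaces:
    - roya2
---
kind: VpcNatGateway
apiVersion: kubeovn.io/v1
metadata:
  name: roya-gw-2
spec:
  vpc: roya-vpc-2
  subnet: roya-subnet2
  lanIp: 10.0.2.254  # mirrors the first gateway, where lanIp equals the subnet gateway
  externalSubnets:
    - external1      # same external subnet as the first NAT gateway
---
kind: IptablesEIP
apiVersion: kubeovn.io/v1
metadata:
  name: roya-alpine-eip-2
spec:
  natGwDp: roya-gw-2
  externalSubnet: external1
```

The Pod, IptablesDnatRule and IptablesSnatRule would be cloned the same way, pointing at the second EIP and the second subnet's addresses.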

Expected Behavior

Real isolation: the second VPC and its NAT gateway should not affect traffic to the first.

dosubot[bot] commented 1 month ago

Hey there, @cybercoder! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

I found two similar closed issues that might be relevant to your problem:

  1. The VPC gateway's DNAT and SNAT cannot use the same EIP - This issue was closed by a pull request: Iptables nat support share eip on May 16, 2023 [1].
  2. Use lr-policy to implement eip and snat - This issue was referenced by another issue: Pod with SNAT or EIP Enabled Unable Connect to kube-dns [2].

To resolve the conflict between EIP and DNAT when running a second VPC with a NAT gateway in Kube-OVN v1.12.25 on Kubernetes v1.30.4+k3s1, you need to ensure that the EIP (Elastic IP) is not in use by any DNAT (Destination NAT) rules before proceeding. The provided code includes validation hooks that check for such conflicts.

Here is the relevant part of the code that handles the validation for DNAT rules:

func (v *ValidatingHook) ValidateIptablesDnat(ctx context.Context, dnat *ovnv1.IptablesDnatRule) error {
    if dnat.Spec.EIP == "" {
        return errors.New("parameter \"eip\" cannot be empty")
    }
    eip := &ovnv1.IptablesEIP{}
    key := types.NamespacedName{Name: dnat.Spec.EIP}
    if err := v.cache.Get(ctx, key, eip); err != nil {
        return err
    }

    if dnat.Spec.ExternalPort == "" {
        return errors.New("parameter \"externalPort\" cannot be empty")
    }

    if dnat.Spec.InternalPort == "" {
        return errors.New("parameter \"internalPort\" cannot be empty")
    }

    if port, err := strconv.Atoi(dnat.Spec.ExternalPort); err != nil {
        errMsg := fmt.Errorf("failed to parse externalPort %s: %w", dnat.Spec.ExternalPort, err)
        return errMsg
    } else if port < 0 || port > 65535 {
        err := fmt.Errorf("externalPort %s is not a valid port", dnat.Spec.ExternalPort)
        return err
    }

    if port, err := strconv.Atoi(dnat.Spec.InternalPort); err != nil {
        errMsg := fmt.Errorf("failed to parse internalIP %s: %w", dnat.Spec.InternalPort, err)
        return errMsg
    } else if port < 0 || port > 65535 {
        err := fmt.Errorf("internalIP %s is not a valid port", dnat.Spec.InternalPort)
        return err
    }

    if net.ParseIP(dnat.Spec.InternalIP) == nil {
        err := fmt.Errorf("internalIP %s is not a valid ip", dnat.Spec.InternalIP)
        return err
    }

    if !strings.EqualFold(dnat.Spec.Protocol, "tcp") &&
        !strings.EqualFold(dnat.Spec.Protocol, "udp") {
        err := fmt.Errorf("invalid iptable protocol: %s,supported params: \"tcp\", \"udp\"", dnat.Spec.Protocol)
        return err
    }

    return nil
}

This function validates the DNAT rule by checking that the EIP is specified and that the referenced IptablesEIP object exists. It also ensures that the external and internal ports are within the valid range (0 to 65535) and that the internal IP is a valid IP address. Additionally, it checks that the protocol is either TCP or UDP [3].

To resolve the conflict, you should ensure that:

  1. The EIP specified in the DNAT rule is not already in use by another DNAT rule.
  2. The EIP is correctly configured and not empty.
  3. The ports and IP addresses specified in the DNAT rule are valid.

Additionally, you can use the provided nat-gateway.sh script to manage your NAT gateway configurations. This script includes functions to add and delete EIPs, DNAT, and SNAT rules, ensuring that the NAT gateway is properly initialized and configured. For example, to add an EIP, you can use the add_eip function, and to add a DNAT rule, you can use the add_dnat function. This script helps automate the setup and management of NAT rules, reducing the risk of conflicts [4].

Here is an example of how to use the script to add an EIP and a DNAT rule:

# Initialize the NAT gateway
./nat-gateway.sh init

# Add an EIP
./nat-gateway.sh eip-add 192.168.1.100/24

# Add a DNAT rule
./nat-gateway.sh dnat-add 192.168.1.100,10.0.0.1,80,8080,tcp

Furthermore, the test/e2e/iptables-vpc-nat-gw/e2e_test.go file in Kube-OVN provides additional context on how the VPC NAT Gateway is managed. The setupVpcNatGwTestEnvironment function ensures that the VPC NAT Gateway is properly created or updated, and the cleanVpcNatGwTestEnvironment function cleans up the NAT Gateway, including setting up QoS policies if specified [5].

By following these steps and using the provided script, you can effectively manage your NAT gateway and resolve conflicts between EIP and DNAT.

zhangzujian commented 1 month ago

kind: Subnet
apiVersion: kubeovn.io/v1
metadata:
  name: roya-subnet1
spec:
  vpc: roya-vpc-1
  cidrBlock: 10.0.1.0/24
  gateway: 10.0.1.254
  protocol: IPv4
  namespaces:
    - roya
---
kind: VpcNatGateway
apiVersion: kubeovn.io/v1
metadata:
  name: roya-gw
spec:
  vpc: roya-vpc-1
  subnet: roya-subnet1
  lanIp: 10.0.1.254
  externalSubnets:
    - external1

It seems the .spec.lanIp field of the VPC NAT gateway conflicts with the subnet gateway: both are set to 10.0.1.254. Please change either the NAT gateway's lanIp or the subnet's gateway and try again.
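
One way to apply this suggestion is sketched below: the subnet keeps 10.0.1.254 as its gateway and the NAT gateway moves to a different address. The value 10.0.1.253 is an assumption; any address inside 10.0.1.0/24 that is neither the subnet gateway nor already allocated (for example, not 10.0.1.2, which the DNAT rule targets) should serve.

```yaml
kind: VpcNatGateway
apiVersion: kubeovn.io/v1
metadata:
  name: roya-gw
spec:
  vpc: roya-vpc-1
  subnet: roya-subnet1
  lanIp: 10.0.1.253  # assumed free address; must differ from the subnet gateway 10.0.1.254
  externalSubnets:
    - external1
```

The alternative is to leave lanIp unchanged and edit the gateway field of roya-subnet1 instead. Either way, the two values must end up different.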