kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] Cilium and VPC-DNS #4201

Closed CiraciNicolo closed 4 months ago

CiraciNicolo commented 5 months ago

Kube-OVN Version

1.12.17

Kubernetes Version

v1.29.5+k3s1

Operation-system/Kernel Version

"Ubuntu 22.04.4 LTS" Linux host 5.15.0-112-generic #122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Description

Using Kube-OVN in CNI chaining mode with Cilium, and therefore with Kube-OVN's load balancer disabled, renders the vpc-dns feature unusable, since it is explicitly disabled here

Steps To Reproduce

  1. Install Cilium and then KubeOVN as reported here
  2. Create a VPC
  3. Enable VPC-DNS

Current Behavior

VPC-DNS is not working: with the OVN LB disabled the VPC-DNS pods are not scheduled, and re-enabling the LB schedules the pods, but the service is not reachable via its VIP.

Expected Behavior

VPC-DNS should work

CiraciNicolo commented 5 months ago

For the sake of completeness, these are the configurations of both Cilium and Kube-OVN. The cluster is a single k3s node, since this is a POC.

  1. K3S systemd
    ExecStart=/usr/local/bin/k3s \
    server \
    --disable=servicelb \
    --disable=traefik \
    --disable=metrics-server \
    --flannel-backend=none \
    --disable-kube-proxy \
    --disable-network-policy \
    --disable-helm-controller \
    --disable-cloud-controller \
    --cluster-cidr=10.69.0.0/16 \
    --service-cidr=10.96.0.0/12 \
  2. Chaining configuration
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cni-configuration
    data:
      cni-config: |-
        {
          "name": "generic-veth",
          "cniVersion": "0.3.1",
          "plugins": [
            {
              "type": "kube-ovn",
              "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
            },
            {
              "type": "portmap",
              "snat": true,
              "capabilities": {
                "portMappings": true
              }
            },
            {
              "type": "cilium-cni",
              "chaining-mode": "generic-veth"
            }
          ]
        }
  3. Cilium helm values
    cluster:
      name: root
      id: 0
    cni:
      chainingMode: generic-veth
      chainingTarget: kube-ovn
      customConf: true
      configMap: cni-configuration
    devices: "eth+ ovn0" ## https://github.com/kubeovn/kube-ovn/issues/4089#issue-2317593927
    enableIPv4Masquerade: false
    enableIdentityMark: false
    kubeProxyReplacement: true
    hubble:
      relay:
        enabled: true
      ui:
        enabled: true
    ipam:
      mode: cluster-pool
      operator:
        clusterPoolIPv4PodCIDRList: 10.69.0.0/16
    ipv4:
      enabled: true
    ipv6:
      enabled: false
    k8sServiceHost: 172.16.150.111
    k8sServicePort: 6443
    operator:
      replicas: 1
    routingMode: "native"
    sessionAffinity: true
    socketLB:
      hostNamespaceOnly: true ## https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#socket-loadbalancer-bypass-in-pod-namespace
    version: 1.15.6
  4. KubeOVN helm values
    global:
      registry:
        address: docker.elmec.com/proxy-cache/kubeovn
      images:
        kubeovn:
          tag: v1.12.17
    cni_conf:
      CNI_CONFIG_PRIORITY: "10"
    func:
      ENABLE_NP: false
      ENABLE_TPROXY: true
    ipv4:
      POD_CIDR: "10.69.0.0/16"
      POD_GATEWAY: "10.69.0.1"
      SVC_CIDR: "10.96.0.0/12"
      JOIN_CIDR: "100.69.0.0/16"
      PINGER_EXTERNAL_ADDRESS: "1.1.1.1"
CiraciNicolo commented 5 months ago

The LB is created; this is the output of ovn-nbctl lb-list:

61dd9fec-032a-4a32-a7b6-d3959c688652    vpc-alpha-tcp-lo    tcp        10.96.0.10:53           10.100.0.2:53
                                                            tcp        10.96.0.10:9153         10.100.0.2:9153
2feecfab-8c8d-431c-857a-37ee4ea94085    vpc-alpha-udp-lo    udp        10.96.0.10:53           10.100.0.2:53

Also, I did not add a VPC NAT Gateway. Is a gateway needed for VPC DNS?

bobz965 commented 5 months ago

VPC DNS does not need a VPC NAT gateway. VPC DNS runs its own CoreDNS deployment, in the same way the VPC NAT gateway runs its own workload.

Please refer to the doc: https://kubeovn.github.io/docs/v1.13.x/en/advance/vpc-internal-dns/?h=vpc
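For reference, enabling the feature boils down to a vpc-dns-config ConfigMap plus a VpcDns CR. This is only a sketch following the linked doc; the VIP, NAD and resource names below are illustrative (full manifests appear later in this thread):

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  enable-vpc-dns: "true"            # feature switch for the vpc-dns controller
  coredns-vip: 10.96.0.10           # VIP exposed inside the VPC via a SwitchLBRule
  nad-name: ovn-nad                 # NetworkAttachmentDefinition for the second NIC
  nad-provider: ovn-nad.default.ovn
---
apiVersion: kubeovn.io/v1
kind: VpcDns
metadata:
  name: alpha
spec:
  vpc: alpha
  subnet: alpha-default
  replicas: 1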

CiraciNicolo commented 5 months ago

Hi! OK, thanks for the clarification about the NAT GW. Anyway, I cannot resolve DNS inside the VPC:

Simple DNS resolution

root@c4i-bastion:/home/ubuntu# kubectl get pod -n alpha dnsutils -o wide
NAME       READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
dnsutils   1/1     Running   0          3m54s   10.100.0.6   c4i-bastion   <none>           <none>
root@c4i-bastion:/home/ubuntu# kubectl get slr
NAME            VIP          PORT(S)                  SERVICE                         AGE
vpc-dns-alpha   10.96.0.10   53/UDP,53/TCP,9153/TCP   kube-system/slr-vpc-dns-alpha   15h
root@c4i-bastion:/home/ubuntu# kubectl exec -tn alpha dnsutils -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10
;; connection timed out; no servers could be reached

command terminated with exit code 1

tcpdump output

root@c4i-bastion:~# tcpdump -i any host 10.100.0.6
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
09:19:34.441612 f57883dadcf1_h P   IP 10.100.0.6.60218 > 10.96.0.10.domain: 23493+ A? kubernetes.default.svc.cluster.local.alpha.svc.cluster.local. (78)
09:19:39.441801 f57883dadcf1_h P   IP 10.100.0.6.60218 > 10.96.0.10.domain: 23493+ A? kubernetes.default.svc.cluster.local.alpha.svc.cluster.local. (78)
09:19:39.645206 f57883dadcf1_h P   ARP, Request who-has 10.100.0.1 tell 10.100.0.6, length 28
09:19:39.645903 f57883dadcf1_h Out ARP, Reply 10.100.0.1 is-at 0a:90:08:c5:d5:d9 (oui Unknown), length 28
09:19:44.442086 f57883dadcf1_h P   IP 10.100.0.6.60218 > 10.96.0.10.domain: 23493+ A? kubernetes.default.svc.cluster.local.alpha.svc.cluster.local. (78)
CiraciNicolo commented 5 months ago

Inspecting the traffic with ovs-tcpdump, I see that it goes towards 10.69.0.3, which is the Pod IP of the standalone CoreDNS. So it seems the SLR is not applied.
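For anyone following along, the capture was along these lines (a sketch; the port name is just the pod's host-side OVS port from the tcpdump output above):

# capture DNS traffic leaving the pod on its host-side OVS port
ovs-tcpdump -i f57883dadcf1_h -nn udp port 53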

bobz965 commented 5 months ago

Please attach your vpc-dns configmap and the CoreDNS deployment pod spec.

CiraciNicolo commented 5 months ago

I don't think the issue is the deployment, because if I use nslookup specifying the IP of the VPC-DNS pod, everything works fine:

root@c4i-bastion:/home/ubuntu# kubectl get pod -n kube-system vpc-dns-alpha-5b5c864c98-jnp2w -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
vpc-dns-alpha-5b5c864c98-jnp2w   1/1     Running   0          6h50m   10.100.0.7   c4i-bastion   <none>           <none>
root@c4i-bastion:/home/ubuntu# kubectl exec -tn alpha dnsutils -- nslookup kubernetes.default.svc.cluster.local 10.100.0.7
Server:     10.100.0.7
Address:    10.100.0.7#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

root@c4i-bastion:/home/ubuntu# kubectl exec -tn alpha dnsutils -- nslookup google.it 10.100.0.7
Server:     10.100.0.7
Address:    10.100.0.7#53

Name:   google.it
Address: 142.250.180.131

Anyway, this is the VPC-DNS CR and VPC-DNS deployment:

root@c4i-bastion:/home/ubuntu# kubectl get vpc-dnses.kubeovn.io alpha -o yaml
apiVersion: kubeovn.io/v1
kind: VpcDns
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubeovn.io/v1","kind":"VpcDns","metadata":{"annotations":{},"name":"alpha"},"spec":{"replicas":1,"subnet":"alpha-default","vpc":"alpha"}}
  creationTimestamp: "2024-06-21T08:04:24Z"
  generation: 1
  name: alpha
  resourceVersion: "62483"
  uid: 59819688-2c4a-4fe6-a5d0-c7a249fe0635
spec:
  replicas: 1
  subnet: alpha-default
  vpc: alpha
status:
  active: true
root@c4i-bastion:/home/ubuntu# kubectl get pod -n kube-system vpc-dns-alpha-5b5c864c98-jnp2w -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "generic-veth",
          "interface": "eth0",
          "ips": [
              "10.100.0.7"
          ],
          "mac": "8a:65:e3:36:1f:b4",
          "default": true,
          "dns": {},
          "gateway": [
              "10.100.0.1"
          ]
      },{
          "name": "default/ovn-nad",
          "interface": "net1",
          "ips": [
              "10.69.0.15"
          ],
          "mac": "6e:54:9e:49:7f:99",
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks: default/ovn-nad
    ovn-nad.default.ovn.kubernetes.io/allocated: "true"
    ovn-nad.default.ovn.kubernetes.io/cidr: 10.69.0.0/16
    ovn-nad.default.ovn.kubernetes.io/gateway: 10.69.0.1
    ovn-nad.default.ovn.kubernetes.io/ip_address: 10.69.0.15
    ovn-nad.default.ovn.kubernetes.io/logical_router: ovn-cluster
    ovn-nad.default.ovn.kubernetes.io/logical_switch: ovn-default
    ovn-nad.default.ovn.kubernetes.io/mac_address: 6e:54:9e:49:7f:99
    ovn-nad.default.ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn-nad.default.ovn.kubernetes.io/routed: "true"
    ovn.kubernetes.io/allocated: "true"
    ovn.kubernetes.io/cidr: 10.100.0.0/24
    ovn.kubernetes.io/gateway: 10.100.0.1
    ovn.kubernetes.io/ip_address: 10.100.0.7
    ovn.kubernetes.io/logical_router: alpha
    ovn.kubernetes.io/logical_switch: alpha-default
    ovn.kubernetes.io/mac_address: 8a:65:e3:36:1f:b4
    ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn.kubernetes.io/routed: "true"
  creationTimestamp: "2024-06-21T08:04:24Z"
  generateName: vpc-dns-alpha-5b5c864c98-
  labels:
    k8s-app: vpc-dns-alpha
    pod-template-hash: 5b5c864c98
  name: vpc-dns-alpha-5b5c864c98-jnp2w
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: vpc-dns-alpha-5b5c864c98
    uid: fa50730d-c305-4aee-b4fc-4cf992a82a28
  resourceVersion: "62526"
  uid: dd3eeeb2-80fd-4230-9d4e-eac5bed14d7a
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: k8s-app
              operator: In
              values:
              - vpc-dns-alpha
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - -conf
    - /etc/coredns/Corefile
    image: rancher/mirrored-coredns-coredns:1.10.1
    imagePullPolicy: IfNotPresent
    name: coredns
    ports:
    - containerPort: 53
      name: dns
      protocol: UDP
    - containerPort: 53
      name: dns-tcp
      protocol: TCP
    - containerPort: 9153
      name: metrics
      protocol: TCP
    resources:
      limits:
        memory: 170Mi
      requests:
        cpu: 100m
        memory: 70Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_BIND_SERVICE
        drop:
        - all
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/coredns
      name: config-volume
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-njmws
      readOnly: true
  dnsPolicy: Default
  enableServiceLinks: true
  initContainers:
  - command:
    - sh
    - -c
    - ip -4 route add 10.96.0.1 via 10.69.0.1 dev net1;ip -4 route add 172.16.150.10
      via 10.69.0.1 dev net1;
    image: docker.elmec.com/proxy-cache/kubeovn/vpc-nat-gateway:v1.12.17
    imagePullPolicy: IfNotPresent
    name: init-route
    resources: {}
    securityContext:
      allowPrivilegeEscalation: true
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-njmws
      readOnly: true
  nodeName: c4i-bastion
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: vpc-dns
  serviceAccountName: vpc-dns
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      items:
      - key: Corefile
        path: Corefile
      name: vpc-dns-corefile
    name: config-volume
  - name: kube-api-access-njmws
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-06-21T08:04:27Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-06-21T08:04:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-06-21T08:04:28Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-06-21T08:04:28Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-06-21T08:04:24Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://9c5fdf5a195cda078cca2e1708e11911a4a51b8ad1d9f3d0be2c3347b8ea7827
    image: docker.io/rancher/mirrored-coredns-coredns:1.10.1
    imageID: docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
    lastState: {}
    name: coredns
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-06-21T08:04:27Z"
  hostIP: 172.16.150.111
  hostIPs:
  - ip: 172.16.150.111
  initContainerStatuses:
  - containerID: containerd://0467fff8b3547d4e928dee54161232b512cfb8112dd2d377e8c12198443c5fb4
    image: docker.elmec.com/proxy-cache/kubeovn/vpc-nat-gateway:v1.12.17
    imageID: docker.elmec.com/proxy-cache/kubeovn/vpc-nat-gateway@sha256:3065824836ae3d7d9e16f2265a23dfd983b9052b51acfb65e0a1b02c4a1e20a0
    lastState: {}
    name: init-route
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://0467fff8b3547d4e928dee54161232b512cfb8112dd2d377e8c12198443c5fb4
        exitCode: 0
        finishedAt: "2024-06-21T08:04:26Z"
        reason: Completed
        startedAt: "2024-06-21T08:04:26Z"
  phase: Running
  podIP: 10.100.0.7
  podIPs:
  - ip: 10.100.0.7
  qosClass: Burstable
  startTime: "2024-06-21T08:04:24Z"
bobz965 commented 5 months ago

in your info:


61dd9fec-032a-4a32-a7b6-d3959c688652    vpc-alpha-tcp-lo    tcp        10.96.0.10:53           10.100.0.2:53
                                                            tcp        10.96.0.10:9153         10.100.0.2:9153
2feecfab-8c8d-431c-857a-37ee4ea94085    vpc-alpha-udp-lo    udp        10.96.0.10:53           10.100.0.2:53

root@c4i-bastion:/home/ubuntu# kubectl get pod -n kube-system vpc-dns-alpha-5b5c864c98-jnp2w -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
vpc-dns-alpha-5b5c864c98-jnp2w   1/1     Running   0          6h50m   10.100.0.7   c4i-bastion   <none>           <none>
root@c4i-bastion:/home/ubuntu# kubectl exec -tn alpha dnsutils -- nslookup kubernetes.default.svc.cluster.local 10.100.0.7
Server:     10.100.0.7
Address:    10.100.0.7#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

root@c4i-bastion:/home/ubuntu# kubectl exec -tn alpha dnsutils -- nslookup google.it 10.100.0.7
Server:     10.100.0.7
Address:    10.100.0.7#53

Name:   google.it
Address: 142.250.180.131

Is the vpc dns deployment pod IP 10.100.0.7?


In your custom vpc, shouldn't the LB backend 10.96.0.10:53 -> 10.100.0.2:53 be 10.96.0.10:53 -> 10.100.0.7:53?
CiraciNicolo commented 5 months ago

Yes, the VPC DNS pod is addressed at 10.100.0.7. I have no idea what happened, but now the LB is correct; still, DNS resolution does not work:

61dd9fec-032a-4a32-a7b6-d3959c688652    vpc-alpha-tcp-lo    tcp        10.96.0.10:53           10.100.0.7:53
                                                            tcp        10.96.0.10:9153         10.100.0.7:9153
2feecfab-8c8d-431c-857a-37ee4ea94085    vpc-alpha-udp-lo    udp        10.96.0.10:53           10.100.0.7:53
CiraciNicolo commented 5 months ago

Do you have any further advice? Load balancing for "normal" services works: as you can see, I can spin up an nginx deployment and reach it via its service, and the nginx svc is present in the lb-list output.

root@c4i-bastion:/home/ubuntu# kubectl -n alpha get svc
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.104.73.178   <none>        80/TCP    179m
root@c4i-bastion:/home/ubuntu# kubectl -n alpha get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
dnsutils                 1/1     Running   0          3h10m   10.169.0.2   c4i-bastion   <none>           <none>
nginx-7854ff8877-mwztv   1/1     Running   0          179m    10.169.0.5   c4i-bastion   <none>           <none>
curl                     1/1     Running   0          121m    10.169.0.6   c4i-bastion   <none>           <none>
root@c4i-bastion:/home/ubuntu# kubectl -n kube-system get pod vpc-dns-alpha-5f8755bf9d-cqvzk -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
vpc-dns-alpha-5f8755bf9d-cqvzk   1/1     Running   0          11m   10.169.0.7   c4i-bastion   <none>           <none>
root@c4i-bastion:/home/ubuntu# kubectl exec -itn alpha curl -- nslookup nginx.alpha.svc.cluster.local 10.169.0.7
Server:     10.169.0.7
Address:    10.169.0.7:53

Name:   nginx.alpha.svc.cluster.local
Address: 10.104.73.178

root@c4i-bastion:/home/ubuntu# kubectl exec -itn alpha curl -- curl 10.104.73.178:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@c4i-bastion:/home/ubuntu# kubectl ko nbctl lb-list
UUID                                    LB                  PROTO      VIP                     IPs
f206f992-dc25-4fad-be32-c89ff36676e9    cluster-tcp-load    tcp        10.100.69.65:6641       172.16.150.111:6641
                                                            tcp        10.101.48.228:443       172.16.150.111:4244
                                                            tcp        10.102.239.103:10665    172.16.150.111:10665
                                                            tcp        10.106.219.145:8080     10.69.0.4:8080
                                                            tcp        10.96.0.1:443           172.16.150.111:6443
                                                            tcp        10.96.166.21:80         10.69.0.14:4245
                                                            tcp        10.98.131.242:80        10.69.0.13:8081
                                                            tcp        10.99.43.117:10660      172.16.150.111:10660
                                                            tcp        10.99.7.58:6643         172.16.150.111:6643
                                                            tcp        10.99.73.136:10661      172.16.150.111:10661
                                                            tcp        10.99.94.62:6642        172.16.150.111:6642
4c311f4f-21ba-4605-a053-794f632a4b29    vpc-alpha-tcp-lo    tcp        10.104.73.178:80        10.169.0.5:80
                                                            tcp        10.96.0.10:53           10.169.0.7:53
                                                            tcp        10.96.0.10:9153         10.169.0.7:9153
da2dc500-a1b9-4f32-8de1-87cf714438b0    vpc-alpha-udp-lo    udp        10.96.0.10:53           10.169.0.7:53
bobz965 commented 5 months ago

[image]

Do these LBs work from inside the pod?

And how about these services, from inside the pod?

[image]

zhangzujian commented 5 months ago
ipam:
  operator:
    clusterPoolIPv4PodCIDRList: 10.69.0.0/16

clusterPoolIPv4PodCIDRList should be a different CIDR.

The default cluster/pod cidr in Kube-OVN is 10.16.0.0/16, and the default join CIDR is 100.64.0.0/16, so the clusterPoolIPv4PodCIDRList value used in Makefile#L834 is 100.65.0.0/16.

Please change this value and try again.
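Concretely, something like this in the Cilium values (a sketch; 100.65.0.0/16 here is just an example of a CIDR that does not collide with the Kube-OVN ranges):

ipam:
  mode: cluster-pool
  operator:
    # pick any CIDR not used by Kube-OVN (POD_CIDR 10.69.0.0/16, JOIN_CIDR 100.69.0.0/16, SVC_CIDR 10.96.0.0/12)
    clusterPoolIPv4PodCIDRList: 100.65.0.0/16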

zhangzujian commented 5 months ago

FYI, I cannot reproduce this problem in master/v1.12.18. The DNS works well:

$ kubectl get subnet
NAME             PROVIDER              VPC           PROTOCOL   CIDR               PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS          U2OINTERCONNECTIONIP
join             ovn                   ovn-cluster   IPv4       100.64.0.0/16      false     false   false     distributed   1        65532         0        0             ["100.64.0.1"]
ovn-default      ovn                   ovn-cluster   IPv4       10.16.0.0/16       false     true    true      distributed   3        65530         0        0             ["10.16.0.1"]
s1               ovn                   vpc1          IPv4       99.99.99.0/24      false     false   false     distributed   2        251           0        0             ["99.99.99.1"]
vpc-dns-subnet   ovn-nad.default.ovn   ovn-cluster   IPv4       100.100.100.0/24   false     false   false     distributed   0        253           0        0             ["100.100.100.1"]
$ kubectl -n kube-system get pod vpc-dns-dns1-759b54bc4f-s9l6t -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE                     NOMINATED NODE   READINESS GATES
vpc-dns-dns1-759b54bc4f-s9l6t   1/1     Running   0          14m   99.99.99.4   kube-ovn-control-plane   <none>           <none>
$ kubectl get po -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP           NODE                     NOMINATED NODE   READINESS GATES
kubeovn-ksc5v   1/1     Running   0          15m   99.99.99.2   kube-ovn-control-plane   <none>           <none>
$ kubectl exec kubeovn-ksc5v -- nslookup kubernetes.default.svc.cluster.local. 99.99.99.4
Server:         99.99.99.4
Address:        99.99.99.4#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1
$ kubectl exec kubeovn-ksc5v -- nslookup kubernetes.default.svc.cluster.local. 10.96.0.3
Server:         10.96.0.3
Address:        10.96.0.3#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1
CiraciNicolo commented 4 months ago

@zhangzujian Can you explain a bit more why you created a new subnet for the VPC-DNS? Is it just for the sake of example?

I replicated the documentation, but now I'm getting "network not ready after 9 ping to gateway" errors. What could be the problem? If I do NOT use a custom VPC, everything works.

zhangzujian commented 4 months ago

Can you explain a bit more why you created a new subnet for the VPC-DNS? Is it just for the sake of example?

It's just an example. It's ok to use the default subnet ovn-default.

I replicated the documentation, but now I'm getting "network not ready after 9 ping to gateway" errors.

Could you please provide more details about the subnets, pod events and resources you created?
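For example, the output of the following would help (standard kubectl queries; names in angle brackets are placeholders):

kubectl get vpc
kubectl get subnet
kubectl -n <namespace> describe pod <pod>              # pod events, including CNI errors
kubectl -n kube-system logs <kube-ovn-cni-pod> --tail=100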

CiraciNicolo commented 4 months ago

You can find the manifests and command below:

---
apiVersion: v1
kind: Namespace
metadata:
  name: alpha
---
apiVersion: v1
kind: Namespace
metadata:
  name: beta
---
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: alpha
spec:
  namespaces:
    - alpha
---
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: beta
spec:
  namespaces:
    - beta
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: alpha
spec:
  vpc: alpha
  cidrBlock: 10.69.0.0/16
  protocol: IPv4
  namespaces:
    - alpha
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: beta
spec:
  vpc: beta
  cidrBlock: 10.69.0.0/16
  protocol: IPv4
  namespaces:
    - beta

Run a workload:

kubectl run ubuntu --image=docker.elmec.com/proxy-cache/library/ubuntu -n alpha -- sleep infinity

Errors:

kube-system/kube-ovn-cni-8s6fq[cni-server]: I0705 17:23:15.138774  201921 server.go:65] [2024-07-05T17:23:15+02:00] Incoming HTTP/1.1 POST /api/v1/add request
kube-system/kube-ovn-cni-8s6fq[cni-server]: I0705 17:23:15.138979  201921 handler.go:82] add port request: {kube-ovn ubuntu alpha 86f2fe9db8316a3c462878088dc6ec914996820a495aaec149654c9b326475fe /var/run/netns/cni-beaf7243-e810-8222-c0bf-bb8dc576e7f8 eth0 ovn [] {[]  [] []}    }
kube-system/kube-ovn-cni-8s6fq[cni-server]: I0705 17:23:15.150751  201921 handler.go:290] create container interface eth0 mac e2:75:a4:eb:f3:8b, ip 10.69.0.2/16, cidr 10.69.0.0/16, gw 10.69.0.1, custom routes []
kube-system/kube-ovn-cni-8s6fq[cni-server]: I0705 17:23:15.318371  201921 arp.go:213] announce arp address nic eth0 , ip 10.69.0.2, with mac e2:75:a4:eb:f3:8b
kube-system/kube-ovn-cni-8s6fq[cni-server]: I0705 17:23:15.341129  201921 ovs_linux.go:955] add ip address 10.69.0.2/16 to eth0
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:18.342676  201921 ovs.go:34] 10.69.0.2 network not ready after 3 ping to gateway 10.69.0.1
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:21.343015  201921 ovs.go:34] 10.69.0.2 network not ready after 6 ping to gateway 10.69.0.1
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:24.342352  201921 ovs.go:34] 10.69.0.2 network not ready after 9 ping to gateway 10.69.0.1
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:27.342242  201921 ovs.go:34] 10.69.0.2 network not ready after 12 ping to gateway 10.69.0.1

This happens in k3s with Cilium chaining AND in kind without Cilium chaining. The example subnet uses the same CIDR as the default VPC subnet, but the issue still happens with a different one. Kube-OVN version is 1.12.18 and K3S is v1.29.6+k3s1.

zhangzujian commented 4 months ago

kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:18.342676 201921 ovs.go:34] 10.69.0.2 network not ready after 3 ping to gateway 10.69.0.1
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:21.343015 201921 ovs.go:34] 10.69.0.2 network not ready after 6 ping to gateway 10.69.0.1
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:24.342352 201921 ovs.go:34] 10.69.0.2 network not ready after 9 ping to gateway 10.69.0.1
kube-system/kube-ovn-cni-8s6fq[cni-server]: W0705 17:23:27.342242 201921 ovs.go:34] 10.69.0.2 network not ready after 12 ping to gateway 10.69.0.1

Thanks for the report. We will fix it as soon as possible.

zhangzujian commented 4 months ago

It seems it's because OVN does not support multiple LSP/LRP ports sharing the same name:

2024-07-05T15:46:55.693Z|00014|northd|WARN|duplicate logical port alpha-alpha
2024-07-05T15:46:56.721Z|00015|northd|WARN|duplicate logical router port beta-beta

For every subnet we have an LSP named <subnet>-<vpc> and an LRP named <vpc>-<subnet>, so when the subnet and the VPC share the same name the two generated port names collide. You need to change either the vpc name or the subnet name.
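As an illustration (names are examples only), giving the subnet a name distinct from its VPC avoids the collision, because the generated LSP/LRP names are no longer identical:

---
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: alpha
spec:
  namespaces:
    - alpha
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: alpha-subnet        # no longer the same as the VPC name
spec:
  vpc: alpha
  cidrBlock: 10.69.0.0/16
  protocol: IPv4
  namespaces:
    - alpha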

CiraciNicolo commented 4 months ago

Thanks! Now everything works as expected. Just a heads up: with Cilium it is not possible to have the same subnet CIDR in multiple VPCs, since Cilium internally manages endpoints and refuses to track the same IP multiple times. I don't think this issue is fixable. BTW, I will try to resume my testing for the VPC-DNS issue and let you know.

CiraciNicolo commented 4 months ago

Hi @zhangzujian, can you provide your manifests? I'm unable to replicate this: if I create a VPC-DNS in a VPC subnet, the CoreDNS pod is unable to reach the kube-apiserver. Probably I'm doing something wrong.

This is the error I'm getting in the VPC-DNS pod:

[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.96.0.1:443/version": dial tcp 10.96.0.1:443: i/o timeout
zhangzujian commented 4 months ago
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:vpc-dns
rules:
  - apiGroups:
    - ""
    resources:
    - endpoints
    - services
    - pods
    - namespaces
    verbs:
    - list
    - watch
  - apiGroups:
    - discovery.k8s.io
    resources:
    - endpointslices
    verbs:
    - list
    - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: vpc-dns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:vpc-dns
subjects:
- kind: ServiceAccount
  name: vpc-dns
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vpc-dns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-corefile
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          prefer_udp
        }
        cache 30
        loop
        reload
        loadbalance
    }
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-nad
  namespace: default
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "kube-ovn",
      "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
      "provider": "ovn-nad.default.ovn"
    }'
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc-dns-subnet
spec:
  protocol: IPv4
  cidrBlock: 100.100.100.0/24
  provider: ovn-nad.default.ovn
---
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: vpc1
spec:
  namespaces:
  - default
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: s1
spec:
  protocol: IPv4
  cidrBlock: 99.99.99.0/24
  vpc: vpc1
  namespaces:
  - default
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.3
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

---
kind: VpcDns
apiVersion: kubeovn.io/v1
metadata:
  name: dns1
spec:
  vpc: vpc1
  subnet: s1
  replicas: 1
CiraciNicolo commented 4 months ago

This is the current deployment config of the VPC DNS pod:

k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "generic-veth",
                            "interface": "eth0",
                            "ips": [
                                "10.112.0.10"
                            ],
                            "mac": "1a:e9:99:19:2a:87",
                            "default": true,
                            "dns": {},
                            "gateway": [
                                "10.112.0.1"
                            ]
                        },{
                            "name": "default/ovn-nad",
                            "interface": "net1",
                            "ips": [
                                "10.69.0.53"
                            ],
                            "mac": "4e:dc:6c:ee:d9:5c",
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks: default/ovn-nad
                      ovn-nad.default.ovn.kubernetes.io/allocated: true
                      ovn-nad.default.ovn.kubernetes.io/cidr: 10.69.0.0/16
                      ovn-nad.default.ovn.kubernetes.io/gateway: 10.69.0.1
                      ovn-nad.default.ovn.kubernetes.io/ip_address: 10.69.0.53
                      ovn-nad.default.ovn.kubernetes.io/logical_router: ovn-cluster
                      ovn-nad.default.ovn.kubernetes.io/logical_switch: ovn-default
                      ovn-nad.default.ovn.kubernetes.io/mac_address: 4e:dc:6c:ee:d9:5c
                      ovn-nad.default.ovn.kubernetes.io/pod_nic_type: veth-pair
                      ovn-nad.default.ovn.kubernetes.io/routed: true
                      ovn.kubernetes.io/allocated: true
                      ovn.kubernetes.io/cidr: 10.112.0.0/16
                      ovn.kubernetes.io/gateway: 10.112.0.1
                      ovn.kubernetes.io/ip_address: 10.112.0.10
                      ovn.kubernetes.io/logical_router: alpha-vpc
                      ovn.kubernetes.io/logical_switch: alpha-system-subnet
                      ovn.kubernetes.io/mac_address: 1a:e9:99:19:2a:87
                      ovn.kubernetes.io/pod_nic_type: veth-pair
                      ovn.kubernetes.io/routed: true
Status:               Running
IP:                   10.112.0.10

Frankly, it seems kind of strange that the VPC DNS pod is attached to the ovn-default logical switch, since I deployed the following Subnet:

---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc-dns-subnet
spec:
  vpc: ovn-cluster
  cidrBlock: 10.70.0.0/16
  protocol: IPv4
  provider: ovn-nad.default.ovn

Also still unable to reach kube-apiserver.

zhangzujian commented 4 months ago

On the node where the vpc dns pod is running, exec the following commands:

# get pid of the coredns
pidof coredns
# enter the netns and get routes
nsenter -n -t <PID> ip route get 10.96.0.1

Check whether the nexthop/gateway is the gateway address of ovn-default/vpc-dns-subnet.

CiraciNicolo commented 4 months ago

Hi, it seems everything is configured correctly

root@c4i-bastion:~# kubectl -n alpha-system debug -it vpc-dns-867fccf66c-mb5br --image=docker.elmec.com/proxy-cache/curlimages/curl --target=coredns -- ip r get 10.96.0.1
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-xk9rg.
10.96.0.1 via 10.70.0.1 dev net1  src 10.70.0.12
root@c4i-bastion:~# kubectl -n alpha-system debug -it vpc-dns-867fccf66c-mb5br --image=docker.elmec.com/proxy-cache/curlimages/curl --target=coredns -- ip a
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-4p8r4.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
418: eth0@if419: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue state UP
    link/ether f2:f3:1a:e9:37:e4 brd ff:ff:ff:ff:ff:ff
    inet 10.112.0.15/16 brd 10.112.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f0f3:1aff:fee9:37e4/64 scope link
       valid_lft forever preferred_lft forever
420: net1@if421: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue state UP
    link/ether 7e:62:f1:66:8e:69 brd ff:ff:ff:ff:ff:ff
    inet 10.70.0.12/16 brd 10.70.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::7c62:f1ff:fe66:8e69/64 scope link
       valid_lft forever preferred_lft forever
root@c4i-bastion:~# kubectl get subnet vpc-dns-subnet -o jsonpath={.spec.gateway}
10.70.0.1
root@c4i-bastion:~# nsenter -n -t 596736 ip r get 10.96.0.1
10.96.0.1 via 10.70.0.1 dev net1 src 10.70.0.12 uid 0
    cache
zhangzujian commented 4 months ago

How did you install cilium? Please provide the helm values.

CiraciNicolo commented 4 months ago

I had an epiphany: the issue is about K3S. The kubernetes Service in K3S is "just an alias" for the underlying host, so when connecting to 10.96.0.1 I was indirectly trying to reach the host (at 172.16.150.111), but no gateway was allowing that. To fix it I just added an indirect route, 172.16.150.111 via 10.70.0.1 dev net1. Now pointing directly at the vpc-dns pods works; still no luck with the SwitchLBRule.
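For the record, the manual fix was roughly the following, run on the node hosting the pod (a sketch; if several coredns processes run on the node, pick the PID belonging to the vpc-dns pod):

# add a host route towards the k3s node IP through the NAD gateway,
# inside the CoreDNS pod's network namespace
PID=$(pidof coredns)
nsenter -n -t "$PID" ip route add 172.16.150.111 via 10.70.0.1 dev net1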

Anyway this is my Cilium configuration:

---
cluster:
  name: root
  id: 0
cni:
  chainingMode: generic-veth
  chainingTarget: kube-ovn
  customConf: true
  configMap: cni-configuration
devices: "eth+ ovn0" ## https://github.com/kubeovn/kube-ovn/issues/4089#issue-2317593927
enableIPv4Masquerade: false
enableIdentityMark: false
kubeProxyReplacement: true
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList: 100.70.0.0/16
ipv4:
  enabled: true
ipv6:
  enabled: false
k8sServiceHost: 172.16.150.111
k8sServicePort: 6443
operator:
  replicas: 1
routingMode: "native"
sessionAffinity: true
sctp:
  enabled: true
version: 1.15.6

I'll explore another workaround, but given the tests we did I assume there is no way to colocate Cilium's service LB and Kube-OVN's SwitchLBRule.

zhangzujian commented 4 months ago
socketLB:
  hostNamespaceOnly: true

Why did you remove this value? I think you can add it back and try again.
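i.e. keep this in the Cilium helm values, so the socket LB only applies to the host namespace and connections made inside pods can still be translated by the OVN load balancer:

socketLB:
  hostNamespaceOnly: true ## https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#socket-loadbalancer-bypass-in-pod-namespace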

zhangzujian commented 4 months ago

~You can also disable socketLB by setting socketLB.enabled=false and enable ovn lb.~

CiraciNicolo commented 4 months ago
socketLB:
  hostNamespaceOnly: true

Why did you remove this value? I think you can add it back and try again.

Enabling this Cilium flag is a no-go. I replicated the VPC DNS functionality with my own deployment, and now it is working! I enabled logging in my own CoreDNS and I can now see requests reaching it. In a nutshell, I deployed "my own vpc dns" by copying the deployment created by Kube-OVN, and got it working with Cilium. Thanks for the support 😊

EDIT: ok, something broke again :/

CiraciNicolo commented 4 months ago

Another update: I'm trying to use CiliumLocalRedirectPolicy and it seems to work. Will update ASAP, after I run some test cases.
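Roughly, the idea is a policy along these lines (a sketch only, based on Cilium's local-redirect-policy docs; the VIP and the pod label come from the outputs earlier in this thread and may differ in other setups). It redirects traffic destined for the VPC DNS VIP to the node-local VPC DNS pod:

apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: vpc-dns-redirect
  namespace: kube-system
spec:
  redirectFrontend:
    addressMatcher:
      ip: "10.96.0.10"            # the SwitchLBRule VIP
      toPorts:
        - port: "53"
          protocol: UDP
          name: dns
        - port: "53"
          protocol: TCP
          name: dns-tcp
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: vpc-dns-alpha    # label of the vpc-dns CoreDNS pod
    toPorts:
      - port: "53"
        protocol: UDP
        name: dns
      - port: "53"
        protocol: TCP
        name: dns-tcp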

CiraciNicolo commented 4 months ago

Hi, after some troubleshooting, the issue is that Cilium and Kube-OVN compete for internal service load balancing. The easiest solution is to deploy some kind of policy, e.g. with Kyverno, to rewrite the pods' DNS config. This workaround is quite stable and does not mess with the network. Something like this works:

---
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: rewrite-dns
  namespace: namespace
  annotations:
    policies.kyverno.io/title: Rewrite DNS
    policies.kyverno.io/category: Networking
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/minversion: 1.12.0
spec:
  rules:
  - name: rewrite-dns
    context:
    - name: dns
      apiCall:
        urlPath: /api/v1/namespaces/namespace/services/vpc-dns
    match:
      any:
      - resources:
          kinds:
          - Pod
    mutate:
      patchStrategicMerge:
        spec:
          dnsConfig:
            nameservers:
            - "{{ dns.spec.clusterIP }}"
            options:
            - name: ndots
              value: "5"
            searches:
            - "{{ request.namespace }}.svc.cluster.local"
            - svc.cluster.local
            - cluster.local
          dnsPolicy: None

Will leave this here for other people. Thanks for the support.

bobz965 commented 4 months ago

How about just using the Cilium LB to replace ipvs, and disabling Kube-OVN's enable-lb option?
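For reference, that would mean keeping the OVN LB switched off in the Kube-OVN helm values, which is the configuration the original report started from (a sketch, assuming the chart exposes the switch as func.ENABLE_LB alongside the other func flags used above):

func:
  ENABLE_NP: false
  ENABLE_TPROXY: true
  ENABLE_LB: false   # rely on Cilium for in-cluster service load balancing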

CiraciNicolo commented 4 months ago

I think that is out of scope for my use case. Furthermore, if I'm not wrong, disabling Kube-OVN's LB would break the VPC DNS functionality.