kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] Custom VPC-DNS not working at VM (kubevirt) #4250

Open reski-rukmantiyo opened 3 days ago

reski-rukmantiyo commented 3 days ago

Kube-OVN Version

v1.12.12

Kubernetes Version

Client Version: v1.30.2 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.30.0

Operation-system/Kernel Version

"Ubuntu 22.04.4 LTS" 5.15.0-113-generic

Description

Right now, I can create isolated workloads in Pods using a Subnet, a VPC, and a NAT Gateway, and through VPC-DNS my pods can resolve domain names.

But somehow, in a KubeVirt VM, there are two problems.
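For context, the setup described here can be sketched with Kube-OVN's Vpc and Subnet CRDs. This is a hypothetical minimal fragment, not the reporter's actual manifests; the names (vpc1, net1, ns1) and the 10.0.1.0/24 CIDR are assumptions inferred from the outputs later in this thread.

```yaml
# Hypothetical sketch of a custom VPC with one subnet bound to namespace ns1.
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: vpc1
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: net1
spec:
  vpc: vpc1
  cidrBlock: 10.0.1.0/24
  protocol: IPv4
  namespaces:
    - ns1
```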

Steps To Reproduce

  1. Create a KubeVirt VM using the script in the VPC config.
  2. Observe that DNS lookups fail inside the VM.

Current Behavior

The default DNS is not working inside the VM.

ubuntu@devspace-vm:~/dekagpu-installation/$ k get svc -o wide -A|grep dns
kube-system   kube-dns                      ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   43h   k8s-app=kube-dns
kube-system   slr-vpc-dns-dns-net1-ns1      ClusterIP   None             <none>        53/UDP,53/TCP,9153/TCP   42h   k8s-app=vpc-dns-dns-net1-ns1

When I try to troubleshoot from the existing VM:

ubuntu@ubuntu:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp1s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.96.0.10
       DNS Servers: 10.96.0.10
        DNS Domain: cluster.local devspace.svc.cluster.local
                    ns1.svc.cluster.local svc.cluster.local

The DNS configuration is already correct, but the VM somehow cannot reach the DNS server:

ubuntu@ubuntu:~$ ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms

ubuntu@ubuntu:~$ telnet 10.96.0.10 53
Trying 10.96.0.10...
telnet: Unable to connect to remote host: No route to host
ubuntu@ubuntu:~$ 

Expected Behavior

DNS should work; maybe I missed something.

zhangzujian commented 3 days ago

There are two mistakes:

  1. 10.96.0.10 is a service IP; you should never ping a service IP to check whether the service is reachable.
  2. You should use the VPC DNS VIP defined in the ConfigMap vpc-dns-config, not 10.96.0.10.
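A service IP answers only on its published protocol/ports, so a port-level probe is the right reachability check, not ICMP. A minimal sketch, assuming nc (netcat) and nslookup are available in the guest image; the helper is only defined here, and the VIP passed to it is a placeholder:

```shell
# Sketch: check a DNS VIP the right way (port 53, not ping).
# Assumes nc and nslookup are installed in the VM/pod image.
probe_dns() {
  vip="$1"
  # UDP probe of the DNS port; ping tests ICMP, which the
  # service load balancer does not translate.
  nc -u -z -v -w1 "$vip" 53 || return 1
  # End-to-end check: actually resolve a name against the VIP.
  nslookup kubernetes.default.svc.cluster.local. "$vip"
}
```

Usage inside the VM would be, e.g., `probe_dns 10.96.0.3` with the coredns-vip value from vpc-dns-config; a ping-based check can fail even when DNS is perfectly healthy.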
reski-rukmantiyo commented 3 days ago

> There are two mistakes:
>
>   1. 10.96.0.10 is a service ip, you should never ping a service ip to check if the service is reachable;
>   2. You should use the vpc-dns vip defined in configmap vpc-dns-config, not 10.96.0.10.

Hi @zhangzujian,

This is my vpc-dns-config configmap.

##
# coredns-vip is the IP address of the CoreDNS service.
# The IP can be changed.
##

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.10
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

Maybe I missed something here as well, but how do I create the VIP for CoreDNS? By using a LoadBalancer Service?

zhangzujian commented 2 days ago

You should use another ip address, e.g. 10.96.0.3.

reski-rukmantiyo commented 2 days ago

My DNS is at 10.96.0.10, which is the current service IP of CoreDNS:

ubuntu@ubuntu:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp1s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.96.0.10
       DNS Servers: 10.96.0.10
        DNS Domain: cluster.local devspace.svc.cluster.local
                    ns1.svc.cluster.local svc.cluster.local

If I change it to 10.96.0.5:

##
# coredns-vip is the IP address of the CoreDNS service.
# The IP can be changed.
##

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.5
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

No response...

ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- ping www.google.com
ping: bad address 'www.google.com'
command terminated with exit code 1
bobz965 commented 2 days ago

How about nc -u -zv 10.96.0.5 53?

reski-rukmantiyo commented 2 days ago

This is my result

ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- nc -u -zv 10.96.0.5 53
10.96.0.5 (10.96.0.5:53) open
ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- ping www.google.com
ping: bad address 'www.google.com'
command terminated with exit code 1

and my config

##
# coredns-vip is the IP address of the CoreDNS service.
# The IP can be changed.
##

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.5
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn
zhangzujian commented 8 hours ago

Please check whether the VPC DNS works:

k exec -it pod/vpc1-pod -n ns1 -- nslookup kubernetes.default.svc.cluster.local. 10.96.0.5

If it works for VPC pods but does not work for the VPC VM, the problem may be related to KubeVirt. Check the route in the VM:

ip route get 10.96.0.5
nc -v -z -w1 10.96.0.5 53
reski-rukmantiyo commented 8 hours ago

Hi @zhangzujian

This is my config

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.5
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

Results

ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- nslookup kubernetes.default.svc.cluster.local. 10.96.0.5
Server:         10.96.0.5
Address:        10.96.0.5:53

** server can't find kubernetes.default.svc.cluster.local.: REFUSED

** server can't find kubernetes.default.svc.cluster.local.: REFUSED

command terminated with exit code 1

In the VM:

ubuntu@ubuntu:~$ ip route get 10.96.0.5
RTNETLINK answers: Network is unreachable

ubuntu@ubuntu:~$ ip route get 10.96.0.10
10.96.0.10 dev enp1s0 src 10.0.1.19 uid 1000 
    cache 

and

ubuntu@ubuntu:~$ nc -v -z -w1 10.96.0.10 54
nc: connect to 10.96.0.10 port 54 (tcp) timed out: Operation now in progress

ubuntu@ubuntu:~$ nc -v -z -w1 10.96.0.5 54
nc: connect to 10.96.0.5 port 54 (tcp) failed: Network is unreachable
zhangzujian commented 8 hours ago

> ubuntu@ubuntu:~$ ip route get 10.96.0.5
> RTNETLINK answers: Network is unreachable

Check routes in the VM by:

ip addr show
ip route show
zhangzujian commented 8 hours ago

nc -v -z -w1 10.96.0.5 54

The port should be 53, not 54.

Did you change the DNS VIP in the vpc-dns-config ConfigMap? If so, please turn the VPC DNS off and then enable it again.
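The off-and-on-again step can be scripted against the vpc-dns-config ConfigMap shown earlier. A sketch, assuming kubectl access to the cluster; the helper only wraps kubectl patch and is not invoked here:

```shell
# Sketch: flip the enable-vpc-dns key in the vpc-dns-config ConfigMap.
# Assumes a working kubectl context; the key names come from the
# ConfigMap posted above.
toggle_vpc_dns() {
  enabled="$1"   # "true" or "false"
  kubectl -n kube-system patch configmap vpc-dns-config \
    --type merge -p "{\"data\":{\"enable-vpc-dns\":\"${enabled}\"}}"
}

# Usage: disable, wait for the vpc-dns workload to be cleaned up,
# then re-enable so the new VIP is applied:
#   toggle_vpc_dns false
#   toggle_vpc_dns true
```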

zhangzujian commented 8 hours ago

Run the following commands to check the OVN status:

kubectl ko nbctl lr-route-list vpc1
kubectl ko nbctl ls-lb-list <vpc-subnet-name>
kubectl ko trace ns1/vpc1-pod 10.96.0.5 udp 53