kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] Custom VPC-DNS not working at VM (kubevirt) #4250

Open reski-rukmantiyo opened 3 days ago

reski-rukmantiyo commented 3 days ago

Kube-OVN Version

v1.12.12

Kubernetes Version

Client Version: v1.30.2 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.30.0

Operation-system/Kernel Version

"Ubuntu 22.04.4 LTS" 5.15.0-113-generic

Description

Right now, I can create isolated workloads in Pods using a Subnet, a VPC, and a NAT Gateway, and through VPC-DNS my pods can resolve domain names.

But somehow, in a KubeVirt VM, there are two problems.
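For context, the setup described here can be sketched with Kube-OVN's Vpc and Subnet CRDs. This is a hypothetical minimal fragment, not the reporter's actual manifests; the names (vpc1, net1, ns1) and the 10.0.1.0/24 CIDR are assumptions inferred from the outputs later in this thread.

```yaml
# Hypothetical sketch of a custom VPC with one subnet bound to namespace ns1.
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: vpc1
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: net1
spec:
  vpc: vpc1
  cidrBlock: 10.0.1.0/24
  protocol: IPv4
  namespaces:
    - ns1
```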

Steps To Reproduce

  1. Create a KubeVirt VM using the script in the VPC config.
  2. Observe that DNS lookups fail inside the VM.

Current Behavior

The default DNS is not working inside the VM.

ubuntu@devspace-vm:~/dekagpu-installation/$ k get svc -o wide -A|grep dns
kube-system   kube-dns                      ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   43h   k8s-app=kube-dns
kube-system   slr-vpc-dns-dns-net1-ns1      ClusterIP   None             <none>        53/UDP,53/TCP,9153/TCP   42h   k8s-app=vpc-dns-dns-net1-ns1

When I try to troubleshoot from the existing VM:

ubuntu@ubuntu:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp1s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.96.0.10
       DNS Servers: 10.96.0.10
        DNS Domain: cluster.local devspace.svc.cluster.local
                    ns1.svc.cluster.local svc.cluster.local

The DNS configuration is already correct, but the VM somehow cannot reach the DNS server:

ubuntu@ubuntu:~$ ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms

ubuntu@ubuntu:~$ telnet 10.96.0.10 53
Trying 10.96.0.10...
telnet: Unable to connect to remote host: No route to host
ubuntu@ubuntu:~$ 

Expected Behavior

DNS should work; maybe I missed something.

zhangzujian commented 3 days ago

There are two mistakes:

  1. 10.96.0.10 is a service IP; you should never ping a service IP to check whether the service is reachable.
  2. You should use the VPC DNS VIP defined in the ConfigMap vpc-dns-config, not 10.96.0.10.
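A service IP answers only on its published protocol/ports, so a port-level probe is the right reachability check, not ICMP. A minimal sketch, assuming nc (netcat) and nslookup are available in the guest image; the helper is only defined here, and the VIP passed to it is a placeholder:

```shell
# Sketch: check a DNS VIP the right way (port 53, not ping).
# Assumes nc and nslookup are installed in the VM/pod image.
probe_dns() {
  vip="$1"
  # UDP probe of the DNS port; ping tests ICMP, which the
  # service load balancer does not translate.
  nc -u -z -v -w1 "$vip" 53 || return 1
  # End-to-end check: actually resolve a name against the VIP.
  nslookup kubernetes.default.svc.cluster.local. "$vip"
}
```

Usage inside the VM would be, e.g., `probe_dns 10.96.0.3` with the coredns-vip value from vpc-dns-config; a ping-based check can fail even when DNS is perfectly healthy.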
reski-rukmantiyo commented 3 days ago

> There are two mistakes:
>
>   1. 10.96.0.10 is a service ip, you should never ping a service ip to check if the service is reachable;
>   2. You should use the vpc-dns vip defined in configmap vpc-dns-config, not 10.96.0.10.

Hi @zhangzujian,

This is my vpc-dns-config configmap.

##
# coredns-vip is the IP address of the CoreDNS service.
# The IP can be changed.
##

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.10
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

Maybe I missed something here as well, but how do I create the VIP for CoreDNS? By using a LoadBalancer Service?

zhangzujian commented 2 days ago

You should use another ip address, e.g. 10.96.0.3.

reski-rukmantiyo commented 2 days ago

My DNS is at 10.96.0.10, which is the current service IP of CoreDNS:

ubuntu@ubuntu:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp1s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.96.0.10
       DNS Servers: 10.96.0.10
        DNS Domain: cluster.local devspace.svc.cluster.local
                    ns1.svc.cluster.local svc.cluster.local

If I change it to 10.96.0.5:

##
# coredns-vip is the IP address of the CoreDNS service.
# The IP can be changed.
##

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.5
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

No response...

ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- ping www.google.com
ping: bad address 'www.google.com'
command terminated with exit code 1
bobz965 commented 2 days ago

How about nc -u -zv 10.96.0.5 53?

reski-rukmantiyo commented 2 days ago

This is my result

ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- nc -u -zv 10.96.0.5 53
10.96.0.5 (10.96.0.5:53) open
ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- ping www.google.com
ping: bad address 'www.google.com'
command terminated with exit code 1

and my config

##
# coredns-vip is the IP address of the CoreDNS service.
# The IP can be changed.
##

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.5
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn
zhangzujian commented 8 hours ago

Please check whether the VPC DNS works:

k exec -it pod/vpc1-pod -n ns1 -- nslookup kubernetes.default.svc.cluster.local. 10.96.0.5

If it works for VPC pods but does not work for the VPC VM, the problem may be related to KubeVirt. Check the route in the VM:

ip route get 10.96.0.5
nc -v -z -w1 10.96.0.5 53
reski-rukmantiyo commented 8 hours ago

Hi @zhangzujian

This is my config

apiVersion: v1
kind: ConfigMap
metadata:
  name: vpc-dns-config
  namespace: kube-system
data:
  coredns-vip: 10.96.0.5
  enable-vpc-dns: "true"
  nad-name: ovn-nad
  nad-provider: ovn-nad.default.ovn

Results

ubuntu@devspace-vm:~/dekagpu-installation/$ k exec -it pod/vpc1-pod -n ns1 -- nslookup kubernetes.default.svc.cluster.local. 10.96.0.5
Server:         10.96.0.5
Address:        10.96.0.5:53

** server can't find kubernetes.default.svc.cluster.local.: REFUSED

** server can't find kubernetes.default.svc.cluster.local.: REFUSED

command terminated with exit code 1

In the VM:

ubuntu@ubuntu:~$ ip route get 10.96.0.5
RTNETLINK answers: Network is unreachable

ubuntu@ubuntu:~$ ip route get 10.96.0.10
10.96.0.10 dev enp1s0 src 10.0.1.19 uid 1000 
    cache 

and

ubuntu@ubuntu:~$ nc -v -z -w1 10.96.0.10 54
nc: connect to 10.96.0.10 port 54 (tcp) timed out: Operation now in progress

ubuntu@ubuntu:~$ nc -v -z -w1 10.96.0.5 54
nc: connect to 10.96.0.5 port 54 (tcp) failed: Network is unreachable
zhangzujian commented 8 hours ago

> ubuntu@ubuntu:~$ ip route get 10.96.0.5
> RTNETLINK answers: Network is unreachable

Check routes in the VM by:

ip addr show
ip route show
zhangzujian commented 8 hours ago

nc -v -z -w1 10.96.0.5 54

The port should be 53, not 54.

Did you change the DNS VIP in the vpc-dns-config ConfigMap? If so, please turn the VPC DNS off and then enable it again.
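The off-and-on-again step can be scripted against the vpc-dns-config ConfigMap shown earlier. A sketch, assuming kubectl access to the cluster; the helper only wraps kubectl patch and is not invoked here:

```shell
# Sketch: flip the enable-vpc-dns key in the vpc-dns-config ConfigMap.
# Assumes a working kubectl context; the key names come from the
# ConfigMap posted above.
toggle_vpc_dns() {
  enabled="$1"   # "true" or "false"
  kubectl -n kube-system patch configmap vpc-dns-config \
    --type merge -p "{\"data\":{\"enable-vpc-dns\":\"${enabled}\"}}"
}

# Usage: disable, wait for the vpc-dns workload to be cleaned up,
# then re-enable so the new VIP is applied:
#   toggle_vpc_dns false
#   toggle_vpc_dns true
```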

zhangzujian commented 8 hours ago

Run the following commands to check the OVN status:

kubectl ko nbctl lr-route-list vpc1
kubectl ko nbctl ls-lb-list <vpc-subnet-name>
kubectl ko trace ns1/vpc1-pod 10.96.0.5 udp 53