a180285 opened this issue 6 months ago
Thanks for the info! I will fix it.
hi @a180285, please attach the output from your client pod when it accesses the server's DNAT EIP via telnet or curl, and the YAML of the pods.
@bobz965 All the pod info is in the e2e test file mentioned above, and I call os.Exit(1) in afterEach, which keeps the debug environment in the kind k8s cluster.
My server pod image is nginx; the client pod image is a network-tools image. I have removed the unrelated test cases in that file, so my branch only contains this case. You should be able to reproduce it by running the e2e test file I provided.
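For reference, a rough sketch of the repro flow (the exact make targets and test flags are assumptions based on the usual kube-ovn dev workflow; adjust to the branch):

```sh
# Assumption: standard kube-ovn dev workflow against a kind cluster.
git clone -b ovn-dnat-bug-2 https://github.com/a180285/kube-ovn.git
cd kube-ovn
make kind-init && make kind-install     # bring up kind with kube-ovn installed
go test ./test/e2e/ovn-vpc-nat-gw/ -v   # run the remaining DNAT test case
# the os.Exit(1) in afterEach leaves the debug environment running in kind
```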
Here is the nginx server pod YAML:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fip-pod-195474116
  namespace: ovn-vpc-nat-gw-1365
  uid: ee298dde-079e-4064-b62c-5a78524243f9
  resourceVersion: '1290'
  creationTimestamp: '2024-05-16T14:48:47Z'
  annotations:
    ovn.kubernetes.io/allocated: 'true'
    ovn.kubernetes.io/cidr: 192.168.0.0/24
    ovn.kubernetes.io/gateway: 192.168.0.1
    ovn.kubernetes.io/ip_address: 192.168.0.4
    ovn.kubernetes.io/logical_router: no-bfd-vpc-194872958
    ovn.kubernetes.io/logical_switch: no-bfd-subnet-108149166
    ovn.kubernetes.io/mac_address: 00:00:00:B2:AE:08
    ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn.kubernetes.io/routed: 'true'
  managedFields:
    - manager: kube-ovn-controller
      operation: Update
      apiVersion: v1
      time: '2024-05-16T14:48:47Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:ovn.kubernetes.io/allocated: {}
            f:ovn.kubernetes.io/cidr: {}
            f:ovn.kubernetes.io/gateway: {}
            f:ovn.kubernetes.io/ip_address: {}
            f:ovn.kubernetes.io/logical_router: {}
            f:ovn.kubernetes.io/mac_address: {}
            f:ovn.kubernetes.io/pod_nic_type: {}
            f:ovn.kubernetes.io/routed: {}
    - manager: ovn-vpc-nat-gw.test
      operation: Update
      apiVersion: v1
      time: '2024-05-16T14:48:47Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:ovn.kubernetes.io/logical_switch: {}
        f:spec:
          f:containers:
            k:{"name":"container"}:
              .: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:resources: {}
              f:securityContext:
                .: {}
                f:privileged: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:terminationGracePeriodSeconds: {}
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2024-05-16T14:48:58Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            k:{"type":"ContainersReady"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Initialized"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"PodReadyToStartContainers"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Ready"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
          f:containerStatuses: {}
          f:hostIP: {}
          f:hostIPs: {}
          f:phase: {}
          f:podIP: {}
          f:podIPs:
            .: {}
            k:{"ip":"192.168.0.4"}:
              .: {}
              f:ip: {}
          f:startTime: {}
      subresource: status
  selfLink: /api/v1/namespaces/ovn-vpc-nat-gw-1365/pods/fip-pod-195474116
status:
  phase: Running
  conditions:
    - type: PodReadyToStartContainers
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:58Z'
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:47Z'
    - type: Ready
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:58Z'
    - type: ContainersReady
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:58Z'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:47Z'
  hostIP: 172.18.0.2
  hostIPs:
    - ip: 172.18.0.2
  podIP: 192.168.0.4
  podIPs:
    - ip: 192.168.0.4
  startTime: '2024-05-16T14:48:47Z'
  containerStatuses:
    - name: container
      state:
        running:
          startedAt: '2024-05-16T14:48:57Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: cr.sihe.cloud/docker.io/nginx:1.25.5
      imageID: >-
        cr.sihe.cloud/docker.io/nginx@sha256:a484819eb60211f5299034ac80f6a681b06f89e65866ce91f356ed7c72af059c
      containerID: >-
        containerd://ac5e7651c6cc5fca0e05087521e8a61281efe83e2ff8fd1289ee5a358854b538
      started: true
  qosClass: BestEffort
spec:
  volumes:
    - name: kube-api-access-bgq26
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: container
      image: cr.sihe.cloud/docker.io/nginx:1.25.5
      resources: {}
      volumeMounts:
        - name: kube-api-access-bgq26
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
      securityContext:
        privileged: false
  restartPolicy: Always
  terminationGracePeriodSeconds: 3
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: kube-ovn-worker
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
```
Here is one of the client pod YAMLs:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-bfd-kube-ovn-control-plane
  namespace: ovn-vpc-nat-gw-1365
  uid: a7a11927-5291-42d0-9eba-23a06059ccab
  resourceVersion: '1242'
  creationTimestamp: '2024-05-16T14:48:34Z'
  annotations:
    ovn.kubernetes.io/allocated: 'true'
    ovn.kubernetes.io/cidr: 192.168.0.0/24
    ovn.kubernetes.io/gateway: 192.168.0.1
    ovn.kubernetes.io/ip_address: 192.168.0.3
    ovn.kubernetes.io/logical_router: no-bfd-vpc-194872958
    ovn.kubernetes.io/logical_switch: no-bfd-subnet-108149166
    ovn.kubernetes.io/mac_address: 00:00:00:83:2E:1E
    ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn.kubernetes.io/routed: 'true'
  managedFields:
    - manager: ovn-vpc-nat-gw.test
      operation: Update
      apiVersion: v1
      time: '2024-05-16T14:48:34Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:ovn.kubernetes.io/logical_switch: {}
        f:spec:
          f:containers:
            k:{"name":"container"}:
              .: {}
              f:command: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:resources: {}
              f:securityContext:
                .: {}
                f:privileged: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:nodeName: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:terminationGracePeriodSeconds: {}
    - manager: kube-ovn-controller
      operation: Update
      apiVersion: v1
      time: '2024-05-16T14:48:35Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:ovn.kubernetes.io/allocated: {}
            f:ovn.kubernetes.io/cidr: {}
            f:ovn.kubernetes.io/gateway: {}
            f:ovn.kubernetes.io/ip_address: {}
            f:ovn.kubernetes.io/logical_router: {}
            f:ovn.kubernetes.io/mac_address: {}
            f:ovn.kubernetes.io/pod_nic_type: {}
            f:ovn.kubernetes.io/routed: {}
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2024-05-16T14:48:44Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"ContainersReady"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Initialized"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"PodReadyToStartContainers"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"PodScheduled"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Ready"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
          f:containerStatuses: {}
          f:hostIP: {}
          f:hostIPs: {}
          f:phase: {}
          f:podIP: {}
          f:podIPs:
            .: {}
            k:{"ip":"192.168.0.3"}:
              .: {}
              f:ip: {}
          f:startTime: {}
      subresource: status
  selfLink: /api/v1/namespaces/ovn-vpc-nat-gw-1365/pods/no-bfd-kube-ovn-control-plane
status:
  phase: Running
  conditions:
    - type: PodReadyToStartContainers
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:44Z'
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:34Z'
    - type: Ready
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:44Z'
    - type: ContainersReady
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:44Z'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-05-16T14:48:34Z'
  hostIP: 172.18.0.3
  hostIPs:
    - ip: 172.18.0.3
  podIP: 192.168.0.3
  podIPs:
    - ip: 192.168.0.3
  startTime: '2024-05-16T14:48:34Z'
  containerStatuses:
    - name: container
      state:
        running:
          startedAt: '2024-05-16T14:48:44Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: sha256:53cf521a90c95f85002469e597c59f26b295189390130b19896ee4976ad55010
      imageID: >-
        cr.sihe.cloud/docker.io/jonlabelle/network-tools@sha256:2f4cd61ca9ad57626b5576bf8398a09353e97d75e6c98c8b1d735301c600db8f
      containerID: >-
        containerd://988a20d8bc04c38731d9d4652d8ae7a5518038e1052f045d48e7cf7599d40c72
      started: true
  qosClass: BestEffort
spec:
  volumes:
    - name: kube-api-access-8zhrb
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: container
      image: >-
        cr.sihe.cloud/docker.io/jonlabelle/network-tools@sha256:2f4cd61ca9ad57626b5576bf8398a09353e97d75e6c98c8b1d735301c600db8f
      command:
        - sh
        - '-c'
        - sleep infinity
      resources: {}
      volumeMounts:
        - name: kube-api-access-8zhrb
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
      securityContext:
        privileged: false
  restartPolicy: Always
  terminationGracePeriodSeconds: 3
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: kube-ovn-control-plane
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
```
Here is the client pod output:

```
no-bfd-kube-ovn-control-plane:/# curl --max-time 2 -v 172.19.0.6:8080
*   Trying 172.19.0.6:8080...
* Connection timed out after 2002 milliseconds
* Closing connection
curl: (28) Connection timed out after 2002 milliseconds
no-bfd-kube-ovn-control-plane:/# traceroute -n 172.19.0.6
traceroute to 172.19.0.6 (172.19.0.6), 30 hops max, 46 byte packets
 1  192.168.0.1  0.785 ms  0.463 ms  0.349 ms
 2  *  *  *
 3  *^C
```
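The traceroute shows the packet reaching the VPC gateway (192.168.0.1) and then disappearing. One way to narrow down where the packets die might be to capture on both ends while repeating the curl, e.g. with the `kubectl ko` plugin (a sketch; assumes the plugin is installed, pod names taken from the dumps above):

```sh
# capture on the client pod while it curls the DNAT EIP
kubectl ko tcpdump ovn-vpc-nat-gw-1365/no-bfd-kube-ovn-control-plane tcp port 8080
# capture on the server pod to see whether the DNAT'ed packets ever arrive
kubectl ko tcpdump ovn-vpc-nat-gw-1365/fip-pod-195474116 tcp port 80
```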
The OvnDnatRule YAML:
```yaml
apiVersion: kubeovn.io/v1
kind: OvnDnatRule
metadata:
  annotations:
    ovn.kubernetes.io/vpc_eip: dnat-eip-140362296
  creationTimestamp: '2024-05-16T14:49:01Z'
  finalizers:
    - kubeovn.io/kube-ovn-controller
  generation: 1
  labels:
    ovn.kubernetes.io/eip_v4_ip: 172.19.0.6
    ovn.kubernetes.io/eip_v6_ip: ''
  managedFields:
    - apiVersion: kubeovn.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:ovn.kubernetes.io/vpc_eip: {}
          f:finalizers:
            .: {}
            v:"kubeovn.io/kube-ovn-controller": {}
          f:labels:
            .: {}
            f:ovn.kubernetes.io/eip_v4_ip: {}
            f:ovn.kubernetes.io/eip_v6_ip: {}
      manager: kube-ovn-controller
      operation: Update
      time: '2024-05-16T14:49:01Z'
    - apiVersion: kubeovn.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:externalPort: {}
          f:internalPort: {}
          f:ipName: {}
          f:protocol: {}
          f:ready: {}
          f:v4Eip: {}
          f:v4Ip: {}
          f:vpc: {}
      manager: kube-ovn-controller
      operation: Update
      subresource: status
      time: '2024-05-16T14:49:01Z'
    - apiVersion: kubeovn.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:externalPort: {}
          f:internalPort: {}
          f:ipName: {}
          f:ipType: {}
          f:ovnEip: {}
          f:protocol: {}
          f:v4Ip: {}
          f:v6Ip: {}
          f:vpc: {}
      manager: ovn-vpc-nat-gw.test
      operation: Update
      time: '2024-05-16T14:49:01Z'
  name: dnat-111456855
  resourceVersion: '1308'
  uid: ce2ff307-d783-4bb1-99db-3b5252396ce0
  selfLink: /apis/kubeovn.io/v1/ovn-dnat-rules/dnat-111456855
status:
  externalPort: '8080'
  internalPort: '80'
  ipName: fip-pod-195474116.ovn-vpc-nat-gw-1365
  protocol: tcp
  ready: true
  v4Eip: 172.19.0.6
  v4Ip: 192.168.0.4
  vpc: no-bfd-vpc-194872958
spec:
  externalPort: '8080'
  internalPort: '80'
  ipName: fip-pod-195474116.ovn-vpc-nat-gw-1365
  ipType: ip
  ovnEip: dnat-eip-140362296
  protocol: tcp
  v4Ip: ''
  v6Ip: ''
  vpc: ''
```
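The rule itself looks consistent (172.19.0.6:8080 → 192.168.0.4:80, ready: true). One way to confirm it was actually programmed into the OVN northbound DB is to inspect the VPC router (a sketch; depending on the version, a port-mapped DNAT may appear as a router NAT entry or as an OVN load-balancer VIP):

```sh
# NAT entries on the VPC's logical router (router name from the status above)
kubectl ko nbctl lr-nat-list no-bfd-vpc-194872958
# port-mapped DNAT rules may instead be realized as load-balancer VIPs
kubectl ko nbctl lb-list
```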
And here is the output of curling the pod directly instead of the DNAT EIP target:
```
no-bfd-kube-ovn-control-plane:/# curl -v 192.168.0.4
*   Trying 192.168.0.4:80...
* Connected to 192.168.0.4 (192.168.0.4) port 80
> GET / HTTP/1.1
> Host: 192.168.0.4
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.25.5
< Date: Fri, 17 May 2024 02:05:31 GMT
< Content-Type: text/html
< Content-Length: 615
< Last-Modified: Tue, 16 Apr 2024 14:29:59 GMT
< Connection: keep-alive
< ETag: "661e8b67-267"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host 192.168.0.4 left intact
```
It is a bug: you cannot curl the DNAT `external IP:external port` from inside a pod that is in the same VPC subnet. It may be related to multi-external subnets: the `external IP:external port` works from outside the VPC.
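If the multi-external-subnet theory matters here, it may help to check which external subnet the EIP was allocated from (a sketch; exact resource names and output columns vary by Kube-OVN version):

```sh
kubectl get subnet                  # list subnets, including the external one(s)
kubectl get ovn-eip                 # check which external subnet the EIP came from
kubectl get ovn-dnat-rule -o yaml   # inspect rule state (resource named in the selfLink above)
```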
```
(v) root@u24:~/feat/test/kovn/ovn-nat/03-reuse-e2e# kgp
NAMESPACE   NAME       READY   STATUS    RESTARTS   AGE   IP             NODE              NOMINATED NODE   READINESS GATES
default     netshoot   1/1     Running   0          55s   192.168.0.12   kube-ovn-worker   <none>           <none>
default     nginx      1/1     Running   0          55s   192.168.0.11   kube-ovn-worker   <none>           <none>
(v) root@u24:~/feat/test/kovn/ovn-nat/03-reuse-e2e# k exec -it -n default netshoot -- bash
netshoot:~# curl 192.168.0.11
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
netshoot:~# curl 172.19.0.9
^C
# it fails here, but if you use a fip (floating IP), it works
netshoot:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
netshoot:~# exit
# if you access the `external IP:external port` from outside the VPC, it works:
(v) root@u24:~/feat/test/kovn/ovn-nat/03-reuse-e2e# docker ps -a
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS             PORTS                       NAMES
97c28c20ac62   kindest/node:v1.30.0            "/usr/local/bin/entr…"   43 minutes ago   Up 43 minutes                                  kube-ovn-worker
80b3b7df201c   kindest/node:v1.30.0            "/usr/local/bin/entr…"   43 minutes ago   Up 43 minutes      127.0.0.1:40447->6443/tcp   kube-ovn-control-plane
b40ba84fb865   moby/buildkit:buildx-stable-1   "buildkitd --allow-i…"   2 hours ago      Up About an hour                               buildx_buildkit_lucid_allen0
(v) root@u24:~/feat/test/kovn/ovn-nat/03-reuse-e2e# docker exec -it kube-ovn-worker -- bash
OCI runtime exec failed: exec failed: unable to start container process: exec: "--": executable file not found in $PATH: unknown
(v) root@u24:~/feat/test/kovn/ovn-nat/03-reuse-e2e# docker exec -it kube-ovn-worker bash
root@kube-ovn-worker:/# curl 172.19.0.9
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
Please use the DNAT EIP from outside the VPC for now; I think that is a better way. We will also try to fix the bug later.
@zcq98 please keep a watchful eye on this issue.
@bobz965 Thanks for verifying this bug.
I will try the workaround, but it is sometimes hard to configure different DNS answers for external and internal clients.
Thanks for your effort; I hope you can fix this soon.
I have tried my best to find the cause of the bug, but so far I have no idea: every OVN/OVS config I can see looks correct. More info about this bug would also be helpful, e.g. the key commands to locate the misconfiguration. I have also seen cases where the FIP does not work for cluster-pod access either, but I have not reproduced that yet.
I think you can put the internal DNS IP before the external DNS IP in /etc/resolv.conf.
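That is, something like this in the client's /etc/resolv.conf (a sketch; the nameserver IPs are placeholders):

```
# internal DNS first: resolves the service name to the in-VPC pod IP
nameserver 10.96.0.10
# external DNS as fallback: resolves the public name to the EIP
nameserver 8.8.8.8
```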
You can also try tracing the OVS flows. Ref: https://github.com/kubeovn/kube-ovn/issues/3329
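For example, with the `kubectl ko` plugin (a sketch; the argument order follows the plugin's `trace <ns/pod> <target-ip> [proto] [port]` form and may differ across versions):

```sh
# trace a TCP packet from the client pod toward the DNAT EIP and port
kubectl ko trace ovn-vpc-nat-gw-1365/no-bfd-kube-ovn-control-plane 172.19.0.6 tcp 8080
```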
Thanks. On changing /etc/resolv.conf: we are a cloud provider (https://console.sihe.cloud/), so editing resolv.conf works for our internal services, but it is not easy to extend to our end users, because they also use EIPs for their business.
As for tracing OVS flows, that sounds like a good angle for looking into OVN/OVS.
In the end, I hope you can find the root cause and fix it.
You can try tracing the OVS flows with your team; if you need any help, you can post it here. You can also seek help from the OVN project once you have a deep understanding of the problem, and file the bug details upstream as an OVN GitHub issue.
Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
Kube-OVN Version
v1.12-mc master

Kubernetes Version
kind

Operating System / Kernel Version
Ubuntu, 5.15.0-88-generic

Description
When using an OVN EIP DNAT rule with a pod target, the EIP can be accessed from outside the cluster, but it cannot be accessed from a pod inside the k8s cluster.

Steps To Reproduce
Clone this file https://github.com/a180285/kube-ovn/blob/ovn-dnat-bug-2/test/e2e/ovn-vpc-nat-gw/e2e_test.go (or the whole branch), then run the e2e test. The main test logic is in the image below: the node access will succeed, but the cluster pod access will fail.

Current Behavior
The kind node can access the DNAT rule's EIP, but a cluster pod CANNOT access the same DNAT EIP.

Expected Behavior
A cluster pod SHOULD be able to access the same DNAT EIP.