weizhoublue opened 1 year ago
When the default CNI is Cilium, the macvlan IP of the pod cannot be accessed from the host.
/home/ty-test# kubectl exec -ti spiderdoctor-agent-gthhr -n kube-system -- sh
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 6e:21:3a:70:81:d1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.40.23/16 brd 172.18.255.255 scope global net1
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793:f::97/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::6c21:3aff:fe70:81d1/64 scope link
valid_lft forever preferred_lft forever
26: eth0@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ba:65:7c:7b:dd:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.64.78/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd00:10:244::21/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::b865:7cff:fe7b:ddf6/64 scope link
valid_lft forever preferred_lft forever
# exit
root@cyclinder3:/home/ty-test# docker exec -ti 3bec3d6c8287 bash
root@ty-spider-worker:/# ping 172.18.40.23
PING 172.18.40.23 (172.18.40.23) 56(84) bytes of data.
^C
--- 172.18.40.23 ping statistics ---
81 packets transmitted, 0 received, 100% packet loss, time 81898ms
root@ty-spider-worker:/# ping 172.18.40.23 -c 2
PING 172.18.40.23 (172.18.40.23) 56(84) bytes of data.
During actual debugging, accessing the pod's macvlan IP from the host always fails, regardless of whether the kdoctor check succeeds or fails.
node request -> pod response -> pod lxc ( ebpf drop )
I think this traffic is dropped by Cilium.
This is not high priority: in the overlay scenario, the host does not need to reach the underlay IP; the host only uses the Cilium pod IP for health checks.
Is this test case failing not because the node needs to reach the pod's underlay IP directly, but because the underlay IPs of two pods on different hosts cannot reach each other?
My guess is that macvlan cannot use the eth0 managed by Cilium and needs a dedicated NIC instead, so that Cilium's filtering rules no longer apply to its traffic (this may require setting Cilium's Helm devices option).
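If that guess holds, it could be sketched as a Cilium Helm values fragment that pins Cilium's datapath to the overlay NIC, leaving a second NIC unmanaged for macvlan. Note the NIC names eth0/eth1 here are assumptions for illustration, not taken from this environment:

```yaml
# Cilium Helm values sketch (assumption: eth0 carries the overlay,
# eth1 is a spare NIC reserved for macvlan).
# Restrict Cilium's datapath so its bpf programs only attach to eth0:
devices: "eth0"
```

The macvlan NetworkAttachmentDefinition would then use `master: eth1`, so macvlan traffic bypasses Cilium's eBPF filtering entirely.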
Can the underlay IPs of two pods on different hosts reach each other?
/home/ty-test/spiderpool# kubectl get po -n kube-system | grep spiderdoctor
spiderdoctor-agent-62jjz   1/1   Running   0   5m57s
spiderdoctor-agent-6xxdt   1/1   Running   0   5m57s
/home/ty-test/spiderpool# kubectl exec -ti spiderdoctor-agent-62jjz -n kube-system -- ip a show net1
2: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether c6:f9:2b:d9:30:3c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.40.121/16 brd 172.18.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793:f::a2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::c4f9:2bff:fed9:303c/64 scope link
       valid_lft forever preferred_lft forever
kubectl exec -ti spiderdoctor-agent-6xxdt -n kube-system -- ip a show net1
2: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 7a:f5:ed:5b:b1:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.40.146/16 brd 172.18.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793:f::57/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::78f5:edff:fe5b:b106/64 scope link
       valid_lft forever preferred_lft forever
/home/ty-test/spiderpool# kubectl exec -ti spiderdoctor-agent-62jjz -n kube-system -- ping 172.18.40.146 -c 2
PING 172.18.40.146 (172.18.40.146) 56(84) bytes of data.
64 bytes from 172.18.40.146: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 172.18.40.146: icmp_seq=2 ttl=64 time=0.103 ms

--- 172.18.40.146 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.083/0.093/0.103/0.010 ms
If we disable this target in the spiderdoctor test, it should pass, right?
success:
  meanAccessDelayInMs: 15000
  successRate: 1
target:
  targetAgent:
    testClusterIp: true
    testEndpoint: true
    testIPv4: true
    testIPv6: true
    testIngress: false
    testLoadBalancer: false
    testMultusInterface: false
    testNodePort: true
This may be caused by the testNodePort failure; accessing a NodePort in macvlan-underlay mode is a known issue: https://github.com/spidernet-io/cni-plugins/issues/142
This is an overlay-mode test case; why is it related to macvlan-underlay mode?
Has a root-cause analysis been done? We cannot simply disable whatever fails.
Looking into it; there is a problem with IPv6 ClusterIP access.
In this environment, a Cilium pod cannot reach a Service whose endpoint is the pod itself.
root@cyclinder3:~/cyclinder# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-6ccdcc86df-g78dn 1/1 Running 0 3m4s 10.244.65.2 ty-spider-control-plane <none> <none>
test-6ccdcc86df-vvgz6 1/1 Running 0 4m9s 10.244.64.25 ty-spider-worker <none> <none>
test-pod-85c445cb44-9nnv9 1/1 Running 0 4d 172.18.40.38 ty-spider-worker <none> <none>
root@cyclinder3:~/cyclinder# kubectl describe svc test-ipv6
Name: test-ipv6
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=test
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv6
IP: fd00:10:233::e5f
IPs: fd00:10:233::e5f
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 32686/TCP
Endpoints: [fd00:10:244::128]:80,[fd00:10:244::70]:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
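For reference, the Service described above corresponds roughly to this manifest, reconstructed from the describe output (the ipFamilyPolicy/ipFamilies field spellings are inferred from "IP Family Policy" and "IP Families"):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-ipv6
  namespace: default
spec:
  type: LoadBalancer
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv6
  selector:
    app: test
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
```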
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
^Ccommand terminated with exit code 130
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
^Ccommand terminated with exit code 130
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
{"clientIp":"[fd00:10:244::128]:42540","otherDetail":{"/spiderdoctoragent":"route to print request"},"requestHeader":{"Accept":" */* ","User-Agent":" curl/7.81.0 "},"requestUrl":"/","serverName":"test-6ccdcc86df-vvgz6"}
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- curl [fd00:10:233::e5f]:80
^Ccommand terminated with exit code 130
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:4a:3d:eb:1e:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.65.2/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd00:10:244::128/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::204a:3dff:feeb:1e7f/64 scope link
valid_lft forever preferred_lft forever
root@ty-spider-control-plane:/home/cilium# cilium monitor --from 541
Listening for events on 16 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> endpoint 541 flow 0xb522b447 , identity 13262->13262 state new ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0x8f414a02 , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0xcf3d762b , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0xcf988979 , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0x8aa61064 , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
-> endpoint 541 flow 0xa34f043f , identity 13262->13262 state established ifindex lxc3c38e607befa orig-ip fd00:10:244::128: [fd00:10:244::128]:51788 -> [fd00:10:244::128]:80 tcp SYN
root@cyclinder3:~/cyclinder# kubectl exec -it test-6ccdcc86df-g78dn bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@test-6ccdcc86df-g78dn:/# curl [fd00:10:233::e5f]:80
curl: (28) Failed to connect to fd00:10:233::e5f port 80 after 129673 ms: Connection timed out
root@test-6ccdcc86df-g78dn:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:4a:3d:eb:1e:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.65.2/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd00:10:244::128/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::204a:3dff:feeb:1e7f/64 scope link
valid_lft forever preferred_lft forever
This may be a Cilium bug; until we confirm this issue, I recommend setting testIPv6 to false.
From julianwiedmann: that sounds like a feature (a pod looping back to itself through a Service) that is currently only supported for IPv4.
success:
  meanAccessDelayInMs: 15000
  successRate: 1
target:
  targetAgent:
    testClusterIp: true
    testEndpoint: true
    testIPv4: true
    testIPv6: false
    testIngress: false
    testLoadBalancer: false
    testMultusInterface: true
    testNodePort: true
testMultusInterface works, so we should keep it set to true and only disable testIPv6.
action url: https://github.com/spidernet-io/spiderpool/actions/runs/5472017414