I've also checked whether it's possible to configure an advertised address for the API server, but there's no reference to it in the cluster spec.
@kasunt-nixdev Try adding this; it should help:
spec:
  kubeAPIServer:
    bindAddress: "::"
  subnets:
  - type: Public
    ipv6CIDR: /64#1
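Once that's applied, you can confirm on a control-plane node that the apiserver is listening on the IPv6 wildcard (a quick check, assuming the default port 443):

# expect a [::]:443 (or *:443) listener owned by kube-apiserver
ss -ltnp | grep kube-apiserver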
@hakman Thanks, but it still appears to be the same problem. AWS doesn't have any problem with the subnets AFAICT; it generates both IPv4 and IPv6 CIDRs.
However, kube-proxy is still failing due to:
I0612 21:10:32.121941 1 flags.go:59] FLAG: --add-dir-header="false"
I0612 21:10:32.122659 1 flags.go:59] FLAG: --alsologtostderr="true"
I0612 21:10:32.122674 1 flags.go:59] FLAG: --bind-address="::"
I0612 21:10:32.122683 1 flags.go:59] FLAG: --bind-address-hard-fail="false"
I0612 21:10:32.122727 1 flags.go:59] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
I0612 21:10:32.122733 1 flags.go:59] FLAG: --cleanup="false"
I0612 21:10:32.122744 1 flags.go:59] FLAG: --cluster-cidr="fd12:3456:789a:1::/64"
I0612 21:10:32.122752 1 flags.go:59] FLAG: --config=""
I0612 21:10:32.122757 1 flags.go:59] FLAG: --config-sync-period="15m0s"
I0612 21:10:32.122799 1 flags.go:59] FLAG: --conntrack-max-per-core="131072"
I0612 21:10:32.122806 1 flags.go:59] FLAG: --conntrack-min="131072"
I0612 21:10:32.122811 1 flags.go:59] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0612 21:10:32.122817 1 flags.go:59] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0612 21:10:32.122826 1 flags.go:59] FLAG: --detect-local-mode=""
I0612 21:10:32.122833 1 flags.go:59] FLAG: --feature-gates="IPv6DualStack=false"
I0612 21:10:32.122869 1 flags.go:59] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0612 21:10:32.122878 1 flags.go:59] FLAG: --healthz-port="10256"
I0612 21:10:32.122886 1 flags.go:59] FLAG: --help="false"
I0612 21:10:32.122892 1 flags.go:59] FLAG: --hostname-override="ip-172-20-55-23.ap-southeast-2.compute.internal"
I0612 21:10:32.122904 1 flags.go:59] FLAG: --iptables-masquerade-bit="14"
I0612 21:10:32.122909 1 flags.go:59] FLAG: --iptables-min-sync-period="1s"
I0612 21:10:32.122939 1 flags.go:59] FLAG: --iptables-sync-period="30s"
I0612 21:10:32.122947 1 flags.go:59] FLAG: --ipvs-exclude-cidrs="[]"
I0612 21:10:32.122956 1 flags.go:59] FLAG: --ipvs-min-sync-period="0s"
I0612 21:10:32.122961 1 flags.go:59] FLAG: --ipvs-scheduler=""
I0612 21:10:32.122966 1 flags.go:59] FLAG: --ipvs-strict-arp="false"
I0612 21:10:32.122975 1 flags.go:59] FLAG: --ipvs-sync-period="30s"
I0612 21:10:32.122980 1 flags.go:59] FLAG: --ipvs-tcp-timeout="0s"
I0612 21:10:32.122994 1 flags.go:59] FLAG: --ipvs-tcpfin-timeout="0s"
I0612 21:10:32.123013 1 flags.go:59] FLAG: --ipvs-udp-timeout="0s"
I0612 21:10:32.123020 1 flags.go:59] FLAG: --kube-api-burst="10"
I0612 21:10:32.123025 1 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0612 21:10:32.123034 1 flags.go:59] FLAG: --kube-api-qps="5"
I0612 21:10:32.123044 1 flags.go:59] FLAG: --kubeconfig="/var/lib/kube-proxy/kubeconfig"
I0612 21:10:32.123050 1 flags.go:59] FLAG: --log-backtrace-at=":0"
I0612 21:10:32.123081 1 flags.go:59] FLAG: --log-dir=""
I0612 21:10:32.123088 1 flags.go:59] FLAG: --log-file="/var/log/kube-proxy.log"
I0612 21:10:32.123093 1 flags.go:59] FLAG: --log-file-max-size="1800"
I0612 21:10:32.123098 1 flags.go:59] FLAG: --log-flush-frequency="5s"
I0612 21:10:32.123103 1 flags.go:59] FLAG: --logtostderr="false"
I0612 21:10:32.123112 1 flags.go:59] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
I0612 21:10:32.123119 1 flags.go:59] FLAG: --masquerade-all="false"
I0612 21:10:32.123123 1 flags.go:59] FLAG: --master="https://127.0.0.1"
I0612 21:10:32.123151 1 flags.go:59] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0612 21:10:32.123160 1 flags.go:59] FLAG: --metrics-port="10249"
I0612 21:10:32.123166 1 flags.go:59] FLAG: --nodeport-addresses="[]"
I0612 21:10:32.123173 1 flags.go:59] FLAG: --one-output="false"
I0612 21:10:32.123183 1 flags.go:59] FLAG: --oom-score-adj="-998"
I0612 21:10:32.123188 1 flags.go:59] FLAG: --profiling="false"
I0612 21:10:32.123193 1 flags.go:59] FLAG: --proxy-mode=""
I0612 21:10:32.123230 1 flags.go:59] FLAG: --proxy-port-range=""
I0612 21:10:32.123238 1 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0612 21:10:32.123242 1 flags.go:59] FLAG: --skip-headers="false"
I0612 21:10:32.123313 1 flags.go:59] FLAG: --skip-log-headers="false"
I0612 21:10:32.123324 1 flags.go:59] FLAG: --stderrthreshold="2"
I0612 21:10:32.123329 1 flags.go:59] FLAG: --udp-timeout="250ms"
I0612 21:10:32.123335 1 flags.go:59] FLAG: --v="2"
I0612 21:10:32.123365 1 flags.go:59] FLAG: --version="false"
I0612 21:10:32.123382 1 flags.go:59] FLAG: --vmodule=""
I0612 21:10:32.123390 1 flags.go:59] FLAG: --write-config-to=""
W0612 21:10:32.123397 1 server.go:220] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0612 21:10:32.123572 1 feature_gate.go:243] feature gates: &{map[IPv6DualStack:false]}
I0612 21:10:32.123838 1 feature_gate.go:243] feature gates: &{map[IPv6DualStack:false]}
I0612 21:10:32.158616 1 node.go:172] Successfully retrieved node IP: 172.20.55.23
I0612 21:10:32.158653 1 server_others.go:140] Detected node IP 172.20.55.23
W0612 21:10:32.158679 1 server_others.go:598] Unknown proxy mode "", assuming iptables proxy
I0612 21:10:32.158840 1 server_others.go:177] DetectLocalMode: 'ClusterCIDR'
I0612 21:10:32.177998 1 server_others.go:208] kube-proxy running in single-stack IPv4 mode
I0612 21:10:32.178089 1 server_others.go:212] Using iptables Proxier.
F0612 21:10:32.178129 1 server.go:489] unable to create proxier: CIDR fd12:3456:789a:1::/64 has incorrect IP version: expect isIPv6=false
Looks like it works without the IPv6DualStack: "false" feature gate setting:
I0612 23:02:27.983244 1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I0612 23:02:27.983275 1 server_others.go:212] Using iptables Proxier.
I0612 23:02:27.983308 1 server_others.go:219] creating dualStackProxier for iptables.
W0612 23:02:27.983321 1 server_others.go:503] detect-local-mode set to ClusterCIDR, but no IPv4 cluster CIDR defined, defaulting to no-op detect-local for IPv4
I0612 23:02:27.983464 1 utils.go:375] Changed sysctl "net/ipv4/conf/all/route_localnet": 0 -> 1
I0612 23:02:27.983561 1 proxier.go:282] "using iptables mark for masquerade" ipFamily=IPv4 mark="0x00004000"
I0612 23:02:27.983640 1 proxier.go:330] "iptables sync params" ipFamily=IPv4 minSyncPeriod="1s" syncPeriod="30s" burstSyncs=2
I0612 23:02:27.983702 1 proxier.go:340] "iptables supports --random-fully" ipFamily=IPv4
I0612 23:02:27.983809 1 proxier.go:282] "using iptables mark for masquerade" ipFamily=IPv6 mark="0x00004000"
I0612 23:02:27.984007 1 proxier.go:330] "iptables sync params" ipFamily=IPv6 minSyncPeriod="1s" syncPeriod="30s" burstSyncs=2
I0612 23:02:27.984169 1 proxier.go:340] "iptables supports --random-fully" ipFamily=IPv6
I0612 23:02:27.984483 1 server.go:643] Version: v1.21.1
That only means compatibility, for now, with a control plane that isn't IPv6-only. Am I right in assuming this? For it to be IPv6-only, I guess the API server will have to advertise an IPv6 address?
Dual-stack is enabled only when you use --cluster-cidr="<IPv4CIDR>,<IPv6CIDR>". So, your cluster should be IPv6-only.
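For example (the IPv4 value here is purely illustrative):

# single-stack IPv6: one CIDR, one IP family
--cluster-cidr="fd12:3456:789a:1::/64"
# dual-stack: both families, comma-separated
--cluster-cidr="100.64.0.0/10,fd12:3456:789a:1::/64"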
The feature gate is just there to enable some code paths. Most likely the feature gate will be removed in k8s 1.23.
Maybe I've misunderstood this statement in the 1.21 docs then: "Starting in 1.21, IPv4/IPv6 dual-stack defaults to enabled". Ref: https://kubernetes.io/docs/concepts/services-networking/dual-stack. So I thought it needed to be disabled explicitly.
There's also the fact that kube-proxy was failing with the error below — with the gate forced to false it runs single-stack, takes its IP family from the detected node IP (IPv4 here), and then rejects the IPv6 cluster CIDR:
I0612 21:10:32.158840 1 server_others.go:177] DetectLocalMode: 'ClusterCIDR'
I0612 21:10:32.177998 1 server_others.go:208] kube-proxy running in single-stack IPv4 mode
I0612 21:10:32.178089 1 server_others.go:212] Using iptables Proxier.
F0612 21:10:32.178129 1 server.go:489] unable to create proxier: CIDR fd12:3456:789a:1::/64 has incorrect IP version: expect isIPv6=false
Feature gates are a way of introducing new features such that they are easy to disable or remove later if they don't end up being finalized: https://kubernetes.io/blog/2020/08/21/moving-forward-from-beta/. The feature itself has its own enable/disable mechanism, described here: https://kubernetes.io/docs/concepts/services-networking/dual-stack/#configure-ipv4-ipv6-dual-stack
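If you do need to pin it explicitly at some point, the kOps cluster spec takes per-component feature gates; I believe for kube-proxy it looks roughly like this (sketch, field name from memory):

spec:
  kubeProxy:
    featureGates:
      IPv6DualStack: "true"  # or drop the entry to use the k8s default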
In this case specifically, I don't know why you see that error when you set the feature gate to "false". IPv6-only mode should work without it, but I never tested that.
I hope it works as expected now.
@hakman Appreciate your time with this. And yes it is working as expected now.
👍 If you don't mind, can you share a little about the use case? Thanks!
We are trying to build an IPv6 only cluster to test the compatibility with Istio (both as single cluster and multi cluster topologies).
Are you planning on adding any other components to the cluster for NAT64/DNS64, or is that not a concern at the moment?
No, this isn't a concern as of yet. I know I closed this issue, but there is another minor issue I'm running into with this setup, related to webhooks.
It looks like the API server isn't able to reach any webhooks due to:
W0613 23:30:11.389242 1 dispatcher.go:182] Failed calling webhook, failing closed webhook.cert-manager.io: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
If you have any ideas please let me know.
I wonder if this is because the control plane is both IPv4- and IPv6-enabled while the worker nodes are IPv6-only in the CNI?
Maybe try describing the cert-manager-webhook service and its endpoints? Any IPv4, or no endpoints at all?
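Something along these lines:

kubectl -n cert-manager describe svc cert-manager-webhook
kubectl -n cert-manager get endpoints cert-manager-webhook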
Here it is.
Name: cert-manager-webhook
Namespace: cert-manager
Labels: app=webhook
app.kubernetes.io/component=webhook
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/name=webhook
Annotations: <none>
Selector: app.kubernetes.io/component=webhook,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv6
IP: fd12:3456:789a::1838
IPs: fd12:3456:789a::1838
Port: https 443/TCP
TargetPort: 10888/TCP
Endpoints: [fd12:3456:789a:1:f451:ce23:2e7e:1397]:10888
Session Affinity: None
Events: <none>
Strangely, dig AAAA cert-manager-webhook.cert-manager.svc +tcp results in NXDOMAIN with CoreDNS:
[INFO] [fd12:3456:789a:1:f6b:daa7:766f:8583]:50637 - 56850 "AAAA IN cert-manager-webhook.cert-manager.svc. tcp 78 false 65535" NXDOMAIN qr,rd,ra,ad 130 0.004804612s
But dig AAAA cert-manager-webhook.cert-manager.svc.cluster.local +tcp returns:
[INFO] [fd12:3456:789a:1:f6b:daa7:766f:8583]:39779 - 5311 "AAAA IN cert-manager-webhook.cert-manager.svc.cluster.local. tcp 92 false 65535" NOERROR qr,aa,rd 148 0.000188346s
I can't see why the short alias would fail with CoreDNS even when I have autopath @kubernetes included.
You can also add log to the Corefile and see all requests: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#are-dns-queries-being-received-processed
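A Corefile sketch with log enabled (exact defaults vary by deployment; note that autopath @kubernetes only takes effect when the kubernetes plugin runs with pods verified):

.:53 {
    errors
    log                    # log every query CoreDNS receives
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified      # required for autopath to work
        fallthrough in-addr.arpa ip6.arpa
    }
    autopath @kubernetes   # server-side search-path expansion
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}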
I see that cert-manager-webhook.cert-manager.svc. contains a . at the end, which suggests it is being looked up as an absolute domain name.
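Inside a pod, the short name only resolves because the resolver's search list expands it; a pod in the cert-manager namespace would typically have an /etc/resolv.conf like this (the nameserver address is whatever your cluster DNS service IP is):

search cert-manager.svc.cluster.local svc.cluster.local cluster.local
nameserver fd12:3456:789a::a
options ndots:5

Also keep in mind that dig does not apply the search list by default, so test the short name with dig +search rather than the raw query.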
@hakman Thanks. Yup, been scratching my head on this for a while. I've found a more serious issue than the one above, and I think there might be a bigger network/CNI problem going on.
I've got a 1-master, 2-worker node topology as in the OP above, but only one of the CoreDNS instances in the worker InstanceGroup responds. Any idea why this might be?
dig AAAA cert-manager-webhook.cert-manager.svc.cluster.local +tcp @fd12:3456:789a:1:f451:ce23:2e7e:1398
;; Connection to fd12:3456:789a:1:f451:ce23:2e7e:1398#53(fd12:3456:789a:1:f451:ce23:2e7e:1398) for cert-manager-webhook.cert-manager.svc.cluster.local failed: timed out.
;; Connection to fd12:3456:789a:1:f451:ce23:2e7e:1398#53(fd12:3456:789a:1:f451:ce23:2e7e:1398) for cert-manager-webhook.cert-manager.svc.cluster.local failed: timed out.
; <<>> DiG 9.11.6-P1 <<>> AAAA cert-manager-webhook.cert-manager.svc.cluster.local +tcp @fd12:3456:789a:1:f451:ce23:2e7e:1398
;; global options: +cmd
;; connection timed out; no servers could be reached
;; Connection to fd12:3456:789a:1:f451:ce23:2e7e:1398#53(fd12:3456:789a:1:f451:ce23:2e7e:1398) for cert-manager-webhook.cert-manager.svc.cluster.local failed: timed out.
command terminated with exit code 9
dig AAAA cert-manager-webhook.cert-manager.svc.cluster.local +tcp @fd12:3456:789a:1:f6b:daa7:766f:8596
; <<>> DiG 9.11.6-P1 <<>> AAAA cert-manager-webhook.cert-manager.svc.cluster.local +tcp @fd12:3456:789a:1:f6b:daa7:766f:8596
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33016
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 39bf8f22503235fe (echoed)
;; QUESTION SECTION:
;cert-manager-webhook.cert-manager.svc.cluster.local. IN AAAA
;; ANSWER SECTION:
cert-manager-webhook.cert-manager.svc.cluster.local. 30 IN AAAA fd12:3456:789a::1838
;; Query time: 2 msec
;; SERVER: fd12:3456:789a:1:f6b:daa7:766f:8596#53(fd12:3456:789a:1:f6b:daa7:766f:8596)
;; WHEN: Mon Jun 14 07:48:47 UTC 2021
;; MSG SIZE rcvd: 171
But there are no issues as far as the nodes joining the cluster go:
ip-172-20-33-186.ap-southeast-2.compute.internal Ready node 21h v1.21.1 Ubuntu 20.04.2 LTS 5.4.0-1045-aws containerd://1.4.6
ip-172-20-35-233.ap-southeast-2.compute.internal Ready control-plane,master 21h v1.21.1 Ubuntu 20.04.2 LTS 5.4.0-1045-aws containerd://1.4.6
ip-172-20-46-213.ap-southeast-2.compute.internal Ready node 21h v1.21.1 Ubuntu 20.04.2 LTS 5.4.0-1045-aws containerd://1.4.6
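For reference, the two CoreDNS endpoints can be mapped back to pods and nodes via the k8s-app=kube-dns label that the CoreDNS pods carry:

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide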
All I can say is that I ran the k8s conformance tests with 5 nodes and it worked, apart from a few minor unrelated issues. I would suggest deleting the cluster and trying a fresh one, to see if you can reproduce. If you can, I can give it a try too in the next few days.
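Roughly (cluster name is a placeholder):

kops delete cluster --name ${CLUSTER_NAME} --yes
kops create -f kops.yaml
kops update cluster --name ${CLUSTER_NAME} --yes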
Cool thanks @hakman.
One thought, I believe this is also needed:
networking:
  calico:
    awsSrcDstCheck: Disable
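Applying that needs a cluster update, something like this (cluster name is a placeholder; existing nodes may also need a rolling update):

kops edit cluster ${CLUSTER_NAME}                  # add networking.calico.awsSrcDstCheck
kops update cluster ${CLUSTER_NAME} --yes
kops rolling-update cluster ${CLUSTER_NAME} --yes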
Ha, that's exactly what I ended up doing, and yes, it worked 😄
@hakman That only fixes one of the problems, though. There's still a pending issue with the API server denying requests to any webhook endpoints. My assumption here is that the control plane / master is on IPv4. Is there any way to force kubelet and the rest of the components to IPv6?
I've tried using hostNetwork: true, but then I run into this issue instead, which I guess has something to do with the CIDR pool:
❯ k get svc -n cert-manager
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cert-manager ClusterIP fd12:3456:789a::add <none> 9402/TCP 9h
cert-manager-webhook ClusterIP fd12:3456:789a::d95d <none> 443/TCP 9h
❯ k get ep -n cert-manager
NAME ENDPOINTS AGE
cert-manager [fd12:3456:789a:1:1e64:a71b:2a18:28ce]:9402 9h
cert-manager-webhook <none> 9h
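For context, the change was along these lines; with hostNetwork: true a pod also needs dnsPolicy: ClusterFirstWithHostNet to keep resolving cluster-internal names (a sketch of the webhook Deployment pod spec):

spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet  # otherwise the pod falls back to the node's resolv.conf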
@hakman I think we can close this, but I had to go down the route of using kubeadm to provision the cluster instead.
Sorry to hear that, but thanks for the feedback. Maybe you can try again when kOps 1.22 is "more ready".
/kind bug
1. What kops version are you running? The command kops version will display this information.
1.22.0-alpha.1 (compiled on the master branch)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.21.1
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create -f kops.yaml, where kops.yaml is given below (in 7).
5. What happened after the commands executed?
Lots of CrashLoopBackOffs.
6. What did you expect to happen?
For all the workloads to get IPv6 addresses without any issues.
7. Please provide your cluster manifest.
8. More information.
kube-proxy is failing due to:
Calico workloads are failing due to:
Nodes are all live: