k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

v1.26.3 server startup failure: "IPv6 was enabled but no IPv6 address was found on node" #7190

Closed aserraric closed 1 year ago

aserraric commented 1 year ago

Environmental Info: K3s Version: k3s version v1.26.3+k3s1 (01ea3ff2) go version go1.19.7

Node(s) CPU architecture, OS, and Version: Linux waveland 6.1.16-2-MANJARO-ARM-RPI #1 SMP PREEMPT Sat Mar 11 21:46:57 UTC 2023 aarch64 GNU/Linux

Cluster Configuration: single node cluster

Describe the bug: K3s fails to start in dual stack mode with message "IPv6 was enabled but no IPv6 address was found on node" even though the node has an IPv6 address.

Steps To Reproduce: The error first occurred after upgrading (in the stable channel) from 1.25 to 1.26. Downgrading back to 1.25 exhibits the same error, however. config.yaml:

write-kubeconfig-mode: 644
node-ip: 192.168.178.100,fd00::5038:7f13:6fea:cb62
cluster-cidr: 10.42.0.0/16,2001:cafe:42:0::/56 
service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
flannel-ipv6-masq: true

Expected behavior: K3s starts in dual stack mode

Actual behavior: k3s does not start

Additional context / logs:

INFO[0000] Starting k3s v1.26.3+k3s1 (01ea3ff2)         
INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s 
INFO[0000] Configuring database table schema and indexes, this may take a moment... 
INFO[0000] Database tables and indexes are up to date   
INFO[0000] Kine available at unix://kine.sock           
INFO[0000] Reconciling bootstrap data between datastore and disk 
INFO[0000] Tunnel server egress proxy mode: agent       
INFO[0000] Tunnel server egress proxy waiting for runtime core to become available 
INFO[0000] Running kube-apiserver --advertise-address=192.168.178.100 --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --egress-selector-config-file=/var/lib/rancher/k3s/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=true --enable-bootstrap-token-auth=true --etcd-servers=unix://kine.sock --feature-gates=JobTrackingWithFinalizers=true --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16,2001:cafe:42:1::/112 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key 
INFO[0000] Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --profiling=false --secure-port=10259 
INFO[0000] Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kubelet-client-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.nochain.crt --cluster-signing-kubelet-serving-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.nochain.crt --cluster-signing-legacy-unknown-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --configure-cloud-routes=false --controllers=*,tokencleaner,-service,-route,-cloud-node-lifecycle --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --leader-elect=false --profiling=false --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16,2001:cafe:42:1::/112 --use-service-account-credentials=true 
W0331 10:24:23.645951    5123 feature_gate.go:241] Setting GA feature gate JobTrackingWithFinalizers=true. It will be removed in a future release.
I0331 10:24:23.647176    5123 server.go:569] external host was not specified, using 192.168.178.100
I0331 10:24:23.650687    5123 server.go:171] Version: v1.26.3+k3s1
I0331 10:24:23.650802    5123 server.go:173] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
INFO[0000] Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-config=/var/lib/rancher/k3s/server/etc/cloud-config.yaml --cloud-provider=k3s --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 --configure-cloud-routes=false --controllers=*,-route --kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --leader-elect=false --leader-elect-resource-name=k3s-cloud-controller-manager --node-status-update-frequency=1m0s --profiling=false 
INFO[0000] Waiting for API server to become available   
INFO[0000] Server node token is available at /var/lib/rancher/k3s/server/token 
INFO[0000] To join server node to cluster: k3s server -s https://192.168.178.100:6443 -t ${SERVER_NODE_TOKEN} 
INFO[0000] Agent node token is available at /var/lib/rancher/k3s/server/agent-token 
INFO[0000] To join agent node to cluster: k3s agent -s https://192.168.178.100:6443 -t ${AGENT_NODE_TOKEN} 
INFO[0000] Wrote kubeconfig /etc/rancher/k3s/k3s.yaml   
INFO[0000] Run: k3s kubectl                             
I0331 10:24:23.756589    5123 shared_informer.go:273] Waiting for caches to sync for node_authorizer
I0331 10:24:23.760577    5123 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0331 10:24:23.761120    5123 plugins.go:161] Loaded 12 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota.
W0331 10:24:23.886811    5123 genericapiserver.go:660] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources.
I0331 10:24:23.890847    5123 instance.go:277] Using reconciler: lease
INFO[0000] certificate CN=waveland signed by CN=k3s-server-ca@1665236055: notBefore=2022-10-08 13:34:15 +0000 UTC notAfter=2024-03-30 08:24:23 +0000 UTC 
INFO[0000] certificate CN=system:node:waveland,O=system:nodes signed by CN=k3s-client-ca@1665236055: notBefore=2022-10-08 13:34:15 +0000 UTC notAfter=2024-03-30 08:24:24 +0000 UTC 
INFO[0000] Module overlay was already loaded            
INFO[0000] Module nf_conntrack was already loaded       
INFO[0000] Module br_netfilter was already loaded       
INFO[0000] Module iptable_nat was already loaded        
INFO[0000] Module iptable_filter was already loaded     
INFO[0000] Module ip6table_nat was already loaded       
INFO[0000] Module ip6table_filter was already loaded    
W0331 10:24:24.264613    5123 sysinfo.go:203] Nodes topology is not available, providing CPU topology
INFO[0000] Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log 
INFO[0000] Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd 
I0331 10:24:24.626680    5123 instance.go:621] API group "internal.apiserver.k8s.io" is not enabled, skipping.
I0331 10:24:25.164912    5123 instance.go:621] API group "resource.k8s.io" is not enabled, skipping.
INFO[0001] Waiting for containerd startup: rpc error: code = Unknown desc = server is not initialized yet 
I0331 10:24:25.373173    5123 trace.go:219] Trace[781936107]: "List(recursive=true) etcd3" audit-id:,key:/secrets,resourceVersion:,resourceVersionMatch:,limit:10000,continue: (31-Mar-2023 10:24:23.943) (total time: 1429ms):
Trace[781936107]: [1.429169265s] [1.429169265s] END
W0331 10:24:25.562861    5123 genericapiserver.go:660] Skipping API authentication.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.562937    5123 genericapiserver.go:660] Skipping API authentication.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.568470    5123 genericapiserver.go:660] Skipping API authorization.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.581919    5123 genericapiserver.go:660] Skipping API autoscaling/v2beta1 because it has no resources.
W0331 10:24:25.581993    5123 genericapiserver.go:660] Skipping API autoscaling/v2beta2 because it has no resources.
W0331 10:24:25.592019    5123 genericapiserver.go:660] Skipping API batch/v1beta1 because it has no resources.
W0331 10:24:25.599561    5123 genericapiserver.go:660] Skipping API certificates.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.605688    5123 genericapiserver.go:660] Skipping API coordination.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.605881    5123 genericapiserver.go:660] Skipping API discovery.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.621350    5123 genericapiserver.go:660] Skipping API networking.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.621421    5123 genericapiserver.go:660] Skipping API networking.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.626837    5123 genericapiserver.go:660] Skipping API node.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.626908    5123 genericapiserver.go:660] Skipping API node.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.627054    5123 genericapiserver.go:660] Skipping API policy/v1beta1 because it has no resources.
W0331 10:24:25.642676    5123 genericapiserver.go:660] Skipping API rbac.authorization.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.642751    5123 genericapiserver.go:660] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.648594    5123 genericapiserver.go:660] Skipping API scheduling.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.648658    5123 genericapiserver.go:660] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.664691    5123 genericapiserver.go:660] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.680277    5123 genericapiserver.go:660] Skipping API flowcontrol.apiserver.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.680348    5123 genericapiserver.go:660] Skipping API flowcontrol.apiserver.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.697750    5123 genericapiserver.go:660] Skipping API apps/v1beta2 because it has no resources.
W0331 10:24:25.697818    5123 genericapiserver.go:660] Skipping API apps/v1beta1 because it has no resources.
W0331 10:24:25.706938    5123 genericapiserver.go:660] Skipping API admissionregistration.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.707009    5123 genericapiserver.go:660] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.
W0331 10:24:25.716197    5123 genericapiserver.go:660] Skipping API events.k8s.io/v1beta1 because it has no resources.
W0331 10:24:25.775344    5123 genericapiserver.go:660] Skipping API apiregistration.k8s.io/v1beta1 because it has no resources.
INFO[0002] containerd is now running                    
INFO[0003] Connecting to proxy                           url="wss://127.0.0.1:6443/v1-k3s/connect"
INFO[0003] Running kubelet --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=waveland --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --kubelet-cgroups=/k3s --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key 
INFO[0003] Handling backend connection request [waveland] 
INFO[0003] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error 
I0331 10:24:28.099957    5123 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt"
I0331 10:24:28.099943    5123 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt"
I0331 10:24:28.100626    5123 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
I0331 10:24:28.101217    5123 secure_serving.go:210] Serving securely on 127.0.0.1:6444
I0331 10:24:28.101335    5123 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0331 10:24:28.101976    5123 gc_controller.go:78] Starting apiserver lease garbage collector
I0331 10:24:28.102405    5123 customresource_discovery_controller.go:288] Starting DiscoveryController
I0331 10:24:28.102515    5123 dynamic_serving_content.go:132] "Starting controller" name="aggregator-proxy-cert::/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt::/var/lib/rancher/k3s/server/tls/client-auth-proxy.key"
I0331 10:24:28.103307    5123 controller.go:121] Starting legacy_token_tracking_controller
I0331 10:24:28.103384    5123 shared_informer.go:273] Waiting for caches to sync for configmaps
I0331 10:24:28.103576    5123 autoregister_controller.go:141] Starting autoregister controller
I0331 10:24:28.103618    5123 cache.go:32] Waiting for caches to sync for autoregister controller
I0331 10:24:28.103686    5123 controller.go:80] Starting OpenAPI V3 AggregationController
I0331 10:24:28.104612    5123 cluster_authentication_trust_controller.go:440] Starting cluster_authentication_trust_controller controller
I0331 10:24:28.104674    5123 shared_informer.go:273] Waiting for caches to sync for cluster_authentication_trust_controller
I0331 10:24:28.104753    5123 available_controller.go:494] Starting AvailableConditionController
I0331 10:24:28.104786    5123 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I0331 10:24:28.104864    5123 apf_controller.go:361] Starting API Priority and Fairness config controller
I0331 10:24:28.105167    5123 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt"
I0331 10:24:28.105375    5123 apiservice_controller.go:97] Starting APIServiceRegistrationController
I0331 10:24:28.105412    5123 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I0331 10:24:28.105463    5123 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt"
I0331 10:24:28.105485    5123 controller.go:83] Starting OpenAPI AggregationController
I0331 10:24:28.104797    5123 crdregistration_controller.go:111] Starting crd-autoregister controller
I0331 10:24:28.106431    5123 shared_informer.go:273] Waiting for caches to sync for crd-autoregister
I0331 10:24:28.106484    5123 controller.go:85] Starting OpenAPI V3 controller
I0331 10:24:28.106570    5123 naming_controller.go:291] Starting NamingConditionController
I0331 10:24:28.106654    5123 establishing_controller.go:76] Starting EstablishingController
I0331 10:24:28.106722    5123 nonstructuralschema_controller.go:192] Starting NonStructuralSchemaConditionController
I0331 10:24:28.106781    5123 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController
I0331 10:24:28.106441    5123 controller.go:85] Starting OpenAPI controller
I0331 10:24:28.106835    5123 crd_finalizer.go:266] Starting CRDFinalizer
I0331 10:24:28.255213    5123 controller.go:132] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue).
I0331 10:24:28.303956    5123 cache.go:39] Caches are synced for autoregister controller
I0331 10:24:28.305291    5123 shared_informer.go:280] Caches are synced for configmaps
I0331 10:24:28.308003    5123 controller.go:615] quota admission added evaluator for: leases.coordination.k8s.io
I0331 10:24:28.308867    5123 shared_informer.go:280] Caches are synced for cluster_authentication_trust_controller
I0331 10:24:28.308994    5123 cache.go:39] Caches are synced for AvailableConditionController controller
I0331 10:24:28.311320    5123 apf_controller.go:366] Running API Priority and Fairness config worker
I0331 10:24:28.311446    5123 apf_controller.go:369] Running API Priority and Fairness periodic rebalancing process
I0331 10:24:28.312570    5123 shared_informer.go:280] Caches are synced for crd-autoregister
I0331 10:24:28.312659    5123 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I0331 10:24:28.358005    5123 shared_informer.go:280] Caches are synced for node_authorizer
E0331 10:24:28.437287    5123 available_controller.go:527] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.42.0.240:10250/apis/metrics.k8s.io/v1beta1: Get "https://10.42.0.240:10250/apis/metrics.k8s.io/v1beta1": proxy error from 127.0.0.1:6443 while dialing 10.42.0.240:10250, code 503: 503 Service Unavailable
E0331 10:24:28.641519    5123 available_controller.go:527] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.42.0.240:10250/apis/metrics.k8s.io/v1beta1: Get "https://10.42.0.240:10250/apis/metrics.k8s.io/v1beta1": proxy error from 127.0.0.1:6443 while dialing 10.42.0.240:10250, code 503: 503 Service Unavailable
INFO[0005] Tunnel server egress proxy waiting for runtime core to become available 
E0331 10:24:28.692411    5123 controller.go:163] Error removing old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
I0331 10:24:29.133206    5123 storage_scheduling.go:111] all system priority classes are created successfully or already exist.
W0331 10:24:29.439784    5123 handler_proxy.go:106] no RequestInfo found in the context
E0331 10:24:29.440352    5123 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
I0331 10:24:29.440656    5123 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W0331 10:24:29.458149    5123 handler_proxy.go:106] no RequestInfo found in the context
E0331 10:24:29.458282    5123 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0331 10:24:29.458328    5123 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
INFO[0007] Waiting for cloud-controller-manager privileges to become available 
W0331 10:24:31.264435    5123 feature_gate.go:241] Setting GA feature gate JobTrackingWithFinalizers=true. It will be removed in a future release.
INFO[0007] Kube API server is now running               
INFO[0007] ETCD server is now running                   
INFO[0007] k3s is up and running                        
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
I0331 10:24:31.444828    5123 server.go:197] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
I0331 10:24:31.450004    5123 server.go:407] "Kubelet version" kubeletVersion="v1.26.3+k3s1"
I0331 10:24:31.450108    5123 server.go:409] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
W0331 10:24:31.461393    5123 sysinfo.go:203] Nodes topology is not available, providing CPU topology
I0331 10:24:31.465477    5123 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt"
W0331 10:24:31.474798    5123 machine.go:65] Cannot read vendor id correctly, set empty.
I0331 10:24:31.479715    5123 server.go:654] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
I0331 10:24:31.480765    5123 container_manager_linux.go:267] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
I0331 10:24:31.481111    5123 container_manager_linux.go:272] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName:/k3s KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
I0331 10:24:31.481270    5123 topology_manager.go:134] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
I0331 10:24:31.481338    5123 container_manager_linux.go:308] "Creating device plugin manager"
I0331 10:24:31.481500    5123 state_mem.go:36] "Initialized new in-memory state store"
INFO[0008] Annotations and labels have already set on node: waveland 
INFO[0008] Starting flannel with backend vxlan          
I0331 10:24:31.540653    5123 kubelet.go:398] "Attempting to sync node with API server"
I0331 10:24:31.541075    5123 kubelet.go:286] "Adding static pod path" path="/var/lib/rancher/k3s/agent/pod-manifests"
I0331 10:24:31.541379    5123 kubelet.go:297] "Adding apiserver pod source"
I0331 10:24:31.541688    5123 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
INFO[0008] Stopped tunnel to 127.0.0.1:6443             
INFO[0008] Connecting to proxy                           url="wss://192.168.178.100:6443/v1-k3s/connect"
INFO[0008] Proxy done                                    err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
I0331 10:24:31.561134    5123 kuberuntime_manager.go:244] "Container runtime initialized" containerRuntime="containerd" version="v1.6.19-k3s1" apiVersion="v1"
I0331 10:24:31.562618    5123 server.go:1181] "Started kubelet"
I0331 10:24:31.568514    5123 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
I0331 10:24:31.578000    5123 server.go:161] "Starting to listen" address="0.0.0.0" port=10250
I0331 10:24:31.581274    5123 server.go:451] "Adding debug handlers to kubelet server"
INFO[0008] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF 
I0331 10:24:31.596363    5123 volume_manager.go:293] "Starting Kubelet Volume Manager"
I0331 10:24:31.597342    5123 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
E0331 10:24:31.599120    5123 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
E0331 10:24:31.599228    5123 kubelet.go:1386] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
INFO[0008] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error 
INFO[0008] Applying CRD addons.k3s.cattle.io            
INFO[0008] Flannel found PodCIDR assigned for node waveland 
INFO[0008] The interface end0 with ipv4 address 192.168.178.100 will be used by flannel 
INFO[0008] Using dual-stack mode. The ipv6 address fd00::5038:7f13:6fea:cb62 will be used by flannel 
I0331 10:24:31.642793    5123 kube.go:144] Waiting 10m0s for node controller to sync
I0331 10:24:31.645393    5123 kube.go:485] Starting kube subnet manager
INFO[0008] Handling backend connection request [waveland] 
I0331 10:24:31.678340    5123 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24 2001:cafe:42::/64]
I0331 10:24:31.678779    5123 kube.go:539] Creating the node lease for IPv6. This is the n.Spec.PodCIDRs: [10.42.0.0/24 2001:cafe:42::/64]
I0331 10:24:31.680263    5123 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv4
E0331 10:24:31.720604    5123 container_manager_linux.go:945] "Unable to get rootfs data from cAdvisor interface" err="unable to find data in memory cache"
I0331 10:24:31.726115    5123 kubelet_node_status.go:70] "Attempting to register node" node="waveland"
I0331 10:24:31.856755    5123 cpu_manager.go:214] "Starting CPU manager" policy="none"
I0331 10:24:31.856837    5123 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
I0331 10:24:31.857043    5123 state_mem.go:36] "Initialized new in-memory state store"
I0331 10:24:31.857920    5123 state_mem.go:88] "Updated default CPUSet" cpuSet=""
I0331 10:24:31.858425    5123 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
I0331 10:24:31.858775    5123 policy_none.go:49] "None policy: Start"
I0331 10:24:31.879862    5123 memory_manager.go:169] "Starting memorymanager" policy="None"
I0331 10:24:31.880842    5123 state_mem.go:35] "Initializing new in-memory state store"
I0331 10:24:31.881877    5123 state_mem.go:75] "Updated machine memory state"
I0331 10:24:31.884241    5123 manager.go:455] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
I0331 10:24:31.884955    5123 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
I0331 10:24:31.919655    5123 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv6
I0331 10:24:31.919723    5123 status_manager.go:176] "Starting to sync pod status with apiserver"
I0331 10:24:31.919779    5123 kubelet.go:2113] "Starting kubelet main sync loop"
E0331 10:24:31.919930    5123 kubelet.go:2137] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"
I0331 10:24:31.930585    5123 kubelet_node_status.go:108] "Node was previously registered" node="waveland"
I0331 10:24:31.930889    5123 kubelet_node_status.go:73] "Successfully registered node" node="waveland"
FATA[0008] IPv6 was enabled but no IPv6 address was found on node 
brandond commented 1 year ago

K3s fails to start in dual stack mode with message "IPv6 was enabled but no IPv6 address was found on node" even though the node has an IPv6 address.

Logs are great, but can you show the ifconfig or ip addr output with the actual IPv4 and IPv6 addresses on the node? Without knowing what addresses the node actually has, and on which interfaces, it's hard to disagree with what the application is reporting.

aserraric commented 1 year ago

Sorry, forgot about that. ip addr:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: end0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:29:43:bb brd ff:ff:ff:ff:ff:ff
    inet 192.168.178.100/24 brd 192.168.178.255 scope global dynamic noprefixroute end0
       valid_lft 848267sec preferred_lft 740267sec
    inet6 fd00::5038:7f13:6fea:cb62/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 6796sec preferred_lft 3196sec
    inet6 2003:f3:170d:8a00:c9ba:8cb4:eae1:4d91/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 6796sec preferred_lft 919sec
    inet6 fe80::fcdd:1f00:9f6f:40be/64 scope link 
       valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether e4:5f:01:29:43:bc brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:20:a3:f8:c1 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
brandond commented 1 year ago

How are you configuring IPv6 on your network? I suspect that all of the candidate IPv6 addresses are being eliminated because they are flagged as dynamic: dynamic mngtmpaddr noprefixroute

You can also see that they have a fairly short lifetime.
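
The flags and lifetimes can be inspected directly with ip (a minimal sketch; end0 is the interface flannel selected in the log above):

# Show IPv6 addresses on end0, including the dynamic/mngtmpaddr/noprefixroute
# flags and the valid_lft/preferred_lft lifetimes.
ip -6 addr show dev end0

# Watch addresses being added and removed over time; useful for spotting
# SLAAC addresses that expire or only arrive some time after boot.
ip monitor address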

aserraric commented 1 year ago

All the addresses are handled by my router. That is not something that has changed recently, though.

Can I (should I) just use the link local address (fe80...)?

brandond commented 1 year ago

Oh, I just noticed that the address you've specified in your node-ip field is a Unique Local Address. Everything under FD00::/8 is ULA and may not be valid for what you're trying to do with it.

node-ip: 192.168.178.100,fd00::5038:7f13:6fea:cb62

Try using the 2003:f3:170d:8a00:c9ba:8cb4:eae1:4d91 address instead. That's within the space that would be assigned by an ISP.

aserraric commented 1 year ago

Hmm, wouldn't that be subject to change at the ISP's discretion? K3s has worked with the fd00 address for more than a year previously.

I tried it anyway, but got the same error. Flannel still tries to use the ULA, even though I specified the 2003... address in config.yaml:

INFO[0008] Using dual-stack mode. The ipv6 address fd00::5038:7f13:6fea:cb62 will be used by flannel

brandond commented 1 year ago

It looks like the error in question is coming from the network policy controller, not flannel. For some reason it's blowing up because the IPv6 address hasn't been set on the Kubernetes node object. That may be because the address you were trying to use was invalid, or it may be that it just hasn't been set yet.

You might try starting k3s with disable-network-policy: true, and then run kubectl get node to see whether your node has an IPv6 address listed. Once it does, you should be able to re-enable the network policy controller.

I'm a little confused why your node wouldn't have an IPv6 address set if you'd been running with dual-stack enabled for a while, though...
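
A minimal sketch of the check suggested above, assuming the config.yaml shown earlier lives at /etc/rancher/k3s/config.yaml on a systemd-managed install:

# Temporarily disable the embedded network policy controller and restart k3s.
echo "disable-network-policy: true" | sudo tee -a /etc/rancher/k3s/config.yaml
sudo systemctl restart k3s

# Once the server is up, check whether an IPv6 InternalIP shows up on the node.
kubectl get node waveland -o jsonpath='{.status.addresses}'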

aserraric commented 1 year ago

Okay, disabling network policy brought the node up, with either the fd00 or the 2003 address assigned (depending on how I configured it).

brandond commented 1 year ago

What is the output of kubectl get node -o yaml? Is there an IPv6 address listed for the node? If there is, are you able to restart with the NPC enabled?

aserraric commented 1 year ago

According to the annotations it has an IPv6 address; according to the status, it does not:

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 192.168.178.100
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"16:0b:d6:fa:ee:7f"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/backend-v6-data: '{"VNI":1,"VtepMAC":"56:c9:de:97:04:fd"}'
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 192.168.178.100
    flannel.alpha.coreos.com/public-ipv6: fd00::5038:7f13:6fea:cb62
    k3s.io/hostname: waveland
    k3s.io/internal-ip: 192.168.178.100,fd00::5038:7f13:6fea:cb62
    k3s.io/node-args: '["server","--write-kubeconfig-mode","644","--node-ip","192.168.178.100,fd00::5038:7f13:6fea:cb62","--cluster-cidr","10.42.0.0/16,2001:cafe:42:0::/56","--service-cidr","10.43.0.0/16,2001:cafe:42:1::/112","--flannel-ipv6-masq","true","--disable-network-policy","false"]'
    k3s.io/node-config-hash: KPKNV6ISOCF5Y5URRUWO6C5IKSXNMN3O3IIJ2RGSKWBLTLFV5M5A====
    k3s.io/node-env: '{"K3S_CONFIG_FILE":"/etc/rancher/k3s/config.yaml","K3S_DATA_DIR":"/var/lib/rancher/k3s/data/5646fe4613ed1cc8277ceffa5aed8260c68ff8a219648503438eddc2017a1962"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-10-08T13:34:29Z"
  finalizers:
  - wrangler.cattle.io/node
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: waveland
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: waveland
  resourceVersion: "3046721"
  uid: 01c1b42a-9be3-4493-b35c-63ff3d2495d5
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  - 2001:cafe:42::/64
  providerID: k3s://waveland
status:
  addresses:
  - address: 192.168.178.100
    type: InternalIP
  - address: waveland
    type: Hostname

But now it starts up even with disable-network-policy: false

I am very confused.

aserraric commented 1 year ago

It definitely doesn't have an IPv6 address. The ingress is only reachable on IPv4, and I know for a fact that it used to be reachable on IPv6 as well (on the fd00... address).

brandond commented 1 year ago

Yeah, I'm confused as to why it's not adding the IPv6 address to the node status.

Do you know what version of k3s you were on previously, where it appeared to be working?

aserraric commented 1 year ago

It was the previous release in the stable channel. I want to say 1.25.6

aserraric commented 1 year ago

Today, I restarted the machine the node is on, and k3s again wouldn't start up. So I disabled network policy again in the config, and lo and behold, the node started with an IPv6 address.

status:
  addresses:
  - address: 192.168.178.100
    type: InternalIP
  - address: fd00::5038:7f13:6fea:cb62
    type: InternalIP
  - address: waveland
    type: Hostname

I'm not sure what that means, but I guess I will leave network policy disabled for the time being. However, I am not at all sure if it will work the next time the machine needs a reboot.

brandond commented 1 year ago

Ahh, I bet that the IPv6 address isn't assigned immediately at boot. You get an IPv4 address from the DHCP server right away and K3s starts, but the IPv6 address is assigned later, so the address you requested isn't present yet and the network policy controller gets confused.

@rbrtbnfgl @thomasferrandiz the code in question here seems to have been added or refactored during some of the kube-router 2.0.0 release prep; I don't see this fatal error happening on earlier releases. Is this something we could work with upstream to address?

rbrtbnfgl commented 1 year ago

I'll take a look at it. However, it seems right to give an error if the IP is not configured yet.

EDIT: In the previous version there wasn't any check on the assigned Internal/External IP of the node (https://github.com/k3s-io/kube-router/blob/v1.5.1%2Bk3s/pkg/controllers/netpol/network_policy_controller.go#L753); with the latest version kube-router checks it. The issue is that somehow the IPv6 InternalIP is not assigned on the node. I tested with different IPs configured as node-ip, both IPv4 and IPv6 (addresses that weren't configured on the node), and the InternalIPs were always assigned. The reason the IPv6 address is not configured is somehow related to this setup.

brandond commented 1 year ago

@aserraric I suspect that you're going to need to figure out how to solve this on your side. I'm not sure if you can ask systemd to wait on the network configuration step until both address families are assigned, instead of just proceeding on to start services as soon as you have an IPv4 address, or what. But the root cause seems to be that the network policy controller now correctly checks to see that your node has an IPv6 address assigned when using dual-stack, and your node frequently does not when starting K3s during system boot.
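
One possible approach, sketched under the assumption that systemd-networkd manages end0 (the k3s unit installed by the install script already orders itself after network-online.target; if Manjaro is using NetworkManager instead, this does not apply):

# Make network-online.target wait until end0 has a routable IPv6 address, so
# services ordered after it (such as k3s) only start once both address
# families are up. The --ipv6 flag requires a reasonably recent systemd.
sudo mkdir -p /etc/systemd/system/systemd-networkd-wait-online.service.d
sudo tee /etc/systemd/system/systemd-networkd-wait-online.service.d/ipv6.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --interface=end0 --ipv6
EOF
sudo systemctl daemon-reload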

aserraric commented 1 year ago

I'm not convinced that this is the reason. When I did my first round of testing, the system had been booted up for several hours and (as far as I can tell) had an IPv6 address assigned the entire time. Also, systemd is configured to keep trying to start k3s, so if your theory were true, it should eventually work.

I rather suspect that there may be a race condition between the kubelet registering the node and assigning the IPs, and the network policy controller checking for those IPs. Unfortunately, the network policy controller brings down the entire node when it doesn't find an IPv6 address, so I can't tell whether the node had one or not, since the apiserver is already gone by then. I'll try to do some more testing.

brandond commented 1 year ago

I could see a race condition on initial startup, but the kubelet should be atomic in updating the addresses when it comes up again after a reboot. It shouldn't remove only one of the two address families, especially if they've both been manually specified on the command line.

@rbrtbnfgl I do think we should raise this with upstream. This is the second bit of odd behavior I've seen on the v2.0 branch.

rbrtbnfgl commented 1 year ago

But the issue is not kube-router related. I think it's more related to the InternalIP not being configured on the node.

rbrtbnfgl commented 1 year ago

Can you try adding this additional config: kubelet-arg: "--node-ip=0.0.0.0"

aserraric commented 1 year ago

I set my config.yaml to this:

write-kubeconfig-mode: 644
node-ip: 192.168.178.100,fd00::5038:7f13:6fea:cb62
cluster-cidr: 10.42.0.0/16,2001:cafe:42:0::/56
service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
flannel-ipv6-masq: true
disable-network-policy: false
kubelet-arg: "--node-ip=0.0.0.0"

and rebooted the machine. This led to an immediate startup with both IPv4 and IPv6 addresses assigned:

apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"9a:ed:30:f1:3f:f4"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/backend-v6-data: '{"VNI":1,"VtepMAC":"46:fc:fd:b2:2d:47"}'
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 192.168.178.100
      flannel.alpha.coreos.com/public-ipv6: fd00::5038:7f13:6fea:cb62
      k3s.io/hostname: waveland
      k3s.io/internal-ip: 192.168.178.100,fd00::5038:7f13:6fea:cb62
      k3s.io/node-args: '["server","--write-kubeconfig-mode","644","--node-ip","192.168.178.100,fd00::5038:7f13:6fea:cb62","--cluster-cidr","10.42.0.0/16,2001:cafe:42:0::/56","--service-cidr","10.43.0.0/16,2001:cafe:42:1::/112","--flannel-ipv6-masq","true","--disable-network-policy","false","--kubelet-arg","--node-ip=0.0.0.0"]'
      k3s.io/node-config-hash: XNRDVN26TUE6TR6WQKXYSZTRTSKDEIXRGVIN2ESZ6TG547QVDYRA====
      k3s.io/node-env: '{"K3S_CONFIG_FILE":"/etc/rancher/k3s/config.yaml","K3S_DATA_DIR":"/var/lib/rancher/k3s/data/5646fe4613ed1cc8277ceffa5aed8260c68ff8a219648503438eddc2017a1962"}'
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
    creationTimestamp: "2022-10-08T13:34:29Z"
    finalizers:
    - wrangler.cattle.io/node
    labels:
      beta.kubernetes.io/arch: arm64
      beta.kubernetes.io/instance-type: k3s
      beta.kubernetes.io/os: linux
      kubernetes.io/arch: arm64
      kubernetes.io/hostname: waveland
      kubernetes.io/os: linux
      node-role.kubernetes.io/control-plane: "true"
      node-role.kubernetes.io/master: "true"
      node.kubernetes.io/instance-type: k3s
    name: waveland
    resourceVersion: "3170436"
    uid: 01c1b42a-9be3-4493-b35c-63ff3d2495d5
  spec:
    podCIDR: 10.42.0.0/24
    podCIDRs:
    - 10.42.0.0/24
    - 2001:cafe:42::/64
    providerID: k3s://waveland
  status:
    addresses:
    - address: 192.168.178.100
      type: InternalIP
    - address: fd00::5038:7f13:6fea:cb62
      type: InternalIP
    - address: waveland
      type: Hostname
brandond commented 1 year ago

We already cover this for 1.24 and 1.25 in the docs; maybe we should suggest it for all versions. https://docs.k3s.io/installation/network-options#dual-stack-ipv4--ipv6-networking

aserraric commented 1 year ago

I originally set up this cluster with 1.24, and never had this issue until I updated to 1.26. Does that mean 1.26 has changed the notion of what is considered the "primary network interface"? The device in question is a Raspberry Pi 4B, and end0 is the only Ethernet interface on it. Wi-Fi is disabled.

rbrtbnfgl commented 1 year ago

I read that something was changed in 1.26, and it will be fixed in 1.27 for sure.