Seems like after the 4th full delete and recreate it works again. I wonder if this is related to #15806.
Only one way to find out. Please check the kops-configuration logs on failed nodes.
Also, I did not test the --zones=hel1,fsn part, so not sure if it works with 2 regions.
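A minimal sketch of pulling those logs, assuming the node runs systemd and the bootstrap runs under the kops-configuration unit (the node IP is a placeholder):

```bash
# Connect as root (the default user on the Hetzner images, see the next comment)
# and dump the nodeup bootstrap log:
ssh root@<failed-node-ip> 'journalctl -u kops-configuration.service --no-pager | tail -n 200'

# The unit status shows whether nodeup is still retrying:
ssh root@<failed-node-ip> 'systemctl status kops-configuration.service'
```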
I will have a look when it fails again. Took me some time to realise that the user to connect with is not ubuntu but root on the Hetzner instances.
> Also, I did not test the --zones=hel1,fsn part so not sure if it works with 2 regions.
For the nodes it fails with a hard error, but for the control plane it seems to work just fine. No errors or issues as far as I was able to tell so far. All servers spawn and Kubernetes says everything is happy. I had no workload on the cluster yet, though, so it might have bugs I didn't see, but I am doubtful that there are any.
This time it's a control-plane node. It seems to fail on this:
Apr 27 21:29:16 control-plane-fsn1-5c11fa08140f0e98 nodeup[1091]: I0427 21:29:16.539628 1091 files.go:136] Hash did not match for "/var/cache/nodeup/sha256:525e2b62ba92a1b6f3dc9612449a84aa61652e680f7ebf4eff579795fe464b57_cni-plugins-linux-arm64-v1_2_0_tgz": actual=sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 vs expected=sha256:525e2b62ba92a1b6f3dc9612449a84aa61652e680f7ebf4eff579795fe464b57
Apr 27 21:29:16 control-plane-fsn1-5c11fa08140f0e98 nodeup[1091]: I0427 21:29:16.539684 1091 http.go:82] Downloading "https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-arm64-v1.2.0.tgz"
Apr 27 21:29:16 control-plane-fsn1-5c11fa08140f0e98 nodeup[1091]: W0427 21:29:16.747301 1091 assetstore.go:251] error downloading url "https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-arm64-v1.2.0.tgz": error response from "https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-arm64-v1.2.0.tgz": HTTP 403
Apr 27 21:29:16 control-plane-fsn1-5c11fa08140f0e98 nodeup[1091]: W0427 21:29:16.747362 1091 main.go:133] got error running nodeup (will retry in 30s): error adding asset "525e2b62ba92a1b6f3dc9612449a84aa61652e680f7ebf4eff579795fe464b57@https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-arm64-v1.2.0.tgz": error response from "https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-arm64-v1.2.0.tgz": HTTP 403
The server responds with <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>We're sorry, but this service is not available in your location</Details></Error>
which means this is the same bug as #15806: Hetzner has IPs which MaxMind sadly recognises as being in Iran despite not being there. (I dealt with this before with Docker and it was a huge hassle to get them to update the IP. Took me multiple explanations to get that fixed.)
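For what it's worth, the "actual" hash in the nodeup log is the well-known SHA-256 of empty input, i.e. nothing usable was cached, and the geo-block is easy to reproduce from the affected node. A small sketch (URL copied from the log above):

```bash
# e3b0c442... is the SHA-256 of an empty file, so the cached download was empty:
printf '' | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -

# Re-running the download from the node should show the HTTP 403 with the
# AccessDenied body quoted above when the node's IP is geo-blocked:
curl -sS -o /dev/null -w '%{http_code}\n' \
  https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-arm64-v1.2.0.tgz
```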
TLDR: As a workaround, deleting the server and updating to re-init the rest might be easiest here.
> TLDR: As a workaround, deleting the server and updating to re-init the rest might be easiest here.
That is pretty much the path of least resistance. You may also want to take a look at another issue for some suggestions: https://github.com/kubernetes/kops/issues/16466#issuecomment-2063553896.
Be sure to get in touch with Hetzner via a support ticket if you get bitten by a blocked IP. The best odds we have of those IPs no longer being blackholed by Google is if Hetzner reaches out to them to see what the deal is.
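A rough sketch of that workaround, assuming the hcloud CLI is set up; the server and cluster names are taken from the log and the create command in this issue and may differ in your setup:

```bash
# Delete the stuck server so its replacement gets a fresh (hopefully unblocked) IP:
hcloud server delete control-plane-fsn1-5c11fa08140f0e98

# Let kops recreate the missing instance and re-init the rest:
kops update cluster --name=cluster-example.k8s.local --yes

# Wait for all nodes to join:
kops validate cluster --name=cluster-example.k8s.local --wait 10m
```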
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.29.0-beta.1 (git-v1.29.0-beta.1-154-g87a0483ca3)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

3. What cloud provider are you using? Hetzner
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster --name=cluster-example.k8s.local --ssh-public-key=~/.ssh/id_ed25519.pub --cloud=hetzner --zones=hel1 --networking=cilium --network-cidr=10.10.0.0/16 --node-count=2 --control-plane-count=3 --control-plane-zones=hel1,fsn1 --node-size=cax21 --control-plane-size cax11
5. What happened after the commands executed? All nodes and resources are created, however validate fails. One node only joined after 3 recreations. The other one doesn't join at all:
6. What did you expect to happen?
All nodes join
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?
Additionally, the SSH key seems to not get applied. Trying to SSH in only yields a password prompt; the SSH key doesn't get accepted.
This was tried well beyond the 10m mark.
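On the SSH point, a quick check worth trying, assuming the key passed via --ssh-public-key was uploaded correctly: connect as root (not ubuntu, as noted earlier in the thread) with the matching private key, and compare fingerprints if it is still rejected.

```bash
# kops-provisioned Hetzner instances expect root, not ubuntu:
ssh -i ~/.ssh/id_ed25519 -o IdentitiesOnly=yes root@<node-ip>

# If the key is rejected, compare the local key's fingerprint with the one
# registered in the Hetzner project (requires the hcloud CLI):
ssh-keygen -lf ~/.ssh/id_ed25519.pub
hcloud ssh-key list
```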