Closed displague closed 3 months ago
More specifically, we believe the issue is with Docker Desktop on macOS.
On macOS, using Docker Desktop, the machine controller fails to bring up devices in Equinix Metal. The logs are filled with messages like this:
E0309 17:22:22.216338 1 controller.go:317] controller/packetmachine "msg"="Reconciler error" "error"="failed to create scope: failed to get workload cluster client: failed to create client for Cluster default/my-cluster: Get \"https://139.178.81.91:6443/api?timeout=10s\": dial tcp 139.178.81.91:6443: connect: connection refused" "name"="my-cluster-control-plane-m29n2" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="PacketMachine"
The error message above comes from here: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/main/controllers/packetmachine_controller.go#L130
Further investigation indicates that, for some reason, connections from a container running in Docker Desktop on macOS to a port that is not answering fail immediately with Connection refused instead of a connection timeout. Docker Desktop on macOS uses a VM to host the containers, and something about the configuration of that VM appears to turn what the host sees as a silent, non-responsive port into an active refusal inside the container:
# From a terminal on the macOS host
$ curl google.com:43421
curl: (28) Failed to connect to google.com port 43421 after 75006 ms: Couldn't connect to server
# From an instance of mikefarah/yq:4.31.1 running in docker on the same macOS host
$ wget google.com:43421
Connecting to google.com:43421 (172.217.1.110:43421)
wget: can't connect to remote host (172.217.1.110): Connection refused
This behavior does not happen with Colima on macOS (which also runs containers in a host VM), so this issue appears to be specific to Docker Desktop on macOS.
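To make the comparison above repeatable without relying on which of curl or wget happens to be in an image, a small Go probe can report whether a dial was refused or timed out. This is only a sketch: classifyDialError, probe, and the outcome strings are names made up for illustration, and the errno check assumes a Unix-like system.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// classifyDialError maps the error from a TCP dial to a coarse outcome.
// The outcome names are illustrative, not taken from any library.
func classifyDialError(err error) string {
	if err == nil {
		return "open"
	}
	// ECONNREFUSED: something answered the connection attempt with a reset.
	if errors.Is(err, syscall.ECONNREFUSED) {
		return "refused"
	}
	// A net.Error whose Timeout() is true: nothing answered in time.
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return "timeout"
	}
	return "other"
}

// probe dials addr once and returns the classified outcome.
func probe(addr string, timeout time.Duration) string {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err == nil {
		conn.Close()
	}
	return classifyDialError(err)
}

func main() {
	addr := "google.com:43421" // same target as the curl/wget comparison
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}
	fmt.Printf("%s: %s\n", addr, probe(addr, 10*time.Second))
}
```

Running the same binary on the macOS host, in a Docker Desktop container, and in a Colima container should show whether the three environments disagree about the same address.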
We could potentially fix this by treating a Connection refused error the same way we treat a timeout in this code: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/365fddba549cb5fa5b32ec469e5cfbb4d3481114/pkg/cloud/packet/scope/machine.go#L342-L355
However, that assumes that Connection refused is always a startup problem, and maybe we can't rely on that to be the case?
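One possible way to hedge against that concern would be to treat Connection refused as a startup condition only for a bounded time after the machine was created, and to surface it as a real error afterwards. The following is a rough sketch under that assumption, not the code at the link above: shouldRequeueQuietly, startupGracePeriod, the machineAge parameter, and the 15 minute value are all hypothetical, and the two sample errors are synthetic stand-ins for what a dial would return.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// startupGracePeriod is a hypothetical bound on how long a refused
// connection is still attributed to a control plane that is booting.
const startupGracePeriod = 15 * time.Minute

// Synthetic errors shaped like dial failures, used only for the demo.
var (
	errRefused  error = &net.OpError{Op: "dial", Net: "tcp", Err: os.NewSyscallError("connect", syscall.ECONNREFUSED)}
	errTimedOut error = &net.OpError{Op: "dial", Net: "tcp", Err: os.ErrDeadlineExceeded}
)

// isDialTimeout reports whether err is a network timeout.
func isDialTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// isConnRefused reports whether err wraps ECONNREFUSED.
func isConnRefused(err error) bool {
	return errors.Is(err, syscall.ECONNREFUSED)
}

// shouldRequeueQuietly decides whether a failed dial looks like an
// expected startup condition (retry later) or a real error (report it).
func shouldRequeueQuietly(err error, machineAge time.Duration) bool {
	switch {
	case err == nil:
		return false // nothing failed, nothing to requeue
	case isDialTimeout(err):
		return true // timeouts are treated as "not up yet"
	case isConnRefused(err):
		// Refusals only count as startup noise while the machine is young.
		return machineAge < startupGracePeriod
	default:
		return false
	}
}

func main() {
	fmt.Println("refused, 2m old:", shouldRequeueQuietly(errRefused, 2*time.Minute))
	fmt.Println("refused, 1h old:", shouldRequeueQuietly(errRefused, time.Hour))
	fmt.Println("timeout, 1h old:", shouldRequeueQuietly(errTimedOut, time.Hour))
}
```

The grace period keeps the Docker Desktop case working during bring-up while still letting a long-lived refusal show up in the logs, at the cost of one more tunable that someone has to pick a value for.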
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After a period of inactivity, lifecycle/stale is applied
After further inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After further inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
/remove-lifecycle rotten
@cprivitere: Reopened this issue.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".