Closed: gianarb closed this issue 3 years ago.
This is a good point. We need better logic for it. "403 not authorized" is not the same as "404 does not exist". On a delete, if we cannot find the device, then the state is already what we want, so it is a success. On a create, if we get a 404, we want to keep trying until we find it.
But a 403 needs to generate a different kind of error. What is the usual k8s controller behaviour here?
As I wrote, I think the status code returned by the API makes things harder. When I open the packet.net UI I do not see that server, so to me it should be a 404.
Do we know why an unprovisioned cluster returns 403 rather than just being treated as not found?
What query did we make that returns the 403?
GET device by ID
That is strange. Can you replicate it consistently? 403 is an auth issue, not a device-not-found issue. Unless the API returns 403 when a device isn't visible to you, i.e. it exists but you don't have the rights to see it?
Can you try with direct curl?
No, I do not know how to reproduce a provisioning failure in our cloud.
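For reference, the request the controller makes (the same URL that appears in the log further down) could be replayed by hand with something like the following; the device ID and token are placeholders, and X-Auth-Token is assumed to be the Metal API's token header:

curl -H "X-Auth-Token: $METAL_AUTH_TOKEN" "https://api.equinix.com/metal/v1/devices/<device-id>?include=facility"

If a device that failed provisioning also returns 403 here, that would confirm the API itself, not the controller, is reporting "not authorized" instead of "not found".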
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
This issue still persists. Metal produces a GRUB error while provisioning the server, and the server disappears from the Metal dashboard, but GET device by ID returns 403 for that machine. As a result, cluster-api-provider-packet-controller-manager gets stuck reconciling the PacketMachine and cannot make progress.
kubectl logs -f cluster-api-provider-packet-controller-manager
2021-05-01T15:04:16.705Z INFO controllers.PacketMachine.infrastructure.cluster.x-k8s.io/v1alpha3 Reconciling PacketMachine {"packetmachine": "default/capi-quickstart-control-plane-ppmbg", "machine": "capi-quickstart-control-plane-jlt5k", "cluster": "capi-quickstart", "packetcluster": "capi-quickstart"}
2021-05-01T15:04:16.942Z ERROR controller-runtime.controller Reconciler error {"controller": "packetmachine", "name": "capi-quickstart-control-plane-ppmbg", "namespace": "default", "error": "GET https://api.equinix.com/metal/v1/devices/XXXXXXXXX?include=facility: 403 You are not authorized to view this device"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:257
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:231
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.14/pkg/internal/controller/controller.go:210
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.17.12/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.17.12/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@v0.17.12/pkg/util/wait/wait.go:88
kubectl get machine
NAME                                        PROVIDERID   PHASE          VERSION
capi-quickstart-control-plane-jlt5k                      Provisioning   v1.18.16
capi-quickstart-worker-a-7766f9f9b9-d98xm                Pending        v1.18.16
capi-quickstart-worker-a-7766f9f9b9-gsp87                Pending        v1.18.16
capi-quickstart-worker-a-7766f9f9b9-l2z4m                Pending        v1.18.16
@tahaozket: You can't reopen an issue/PR unless you authored it or you are a collaborator.
The Packet API returns:
GET https://api.equinix.com/metal/v1/devices/XXXXXXXXX?include=facility: 403 You are not authorized to view this device

Right now the MachineController retries over and over without end, which does not sound great. Should we treat this error the same way we treat a 404? In that case we assume the server is not running anymore and mark the reconciliation as a success. It is not ideal, because a 403 may also be fixable by generating a new API key. I think the API is not returning the right status code here. @deitch