Closed domgreen closed 4 years ago
I am able to reproduce this. Not sure if this is related to the issue, but there are some warnings in events.
kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
4m28s Warning FailedToCreateEndpoint endpoints/agones-allocator Failed to create endpoint for service agones-system/agones-allocator: endpoints "agones-allocator" is forbidden: unable to create new content in namespace agones-system because it is being terminated
4m50s Warning FailedToCreateEndpoint endpoints/agones-controller-service Failed to create endpoint for service agones-system/agones-controller-service: endpoints "agones-controller-service" is forbidden: unable to create new content in namespace agones-system because it is being terminated
4m29s Warning FailedToCreateEndpoint endpoints/agones-ping-http-service Failed to create endpoint for service agones-system/agones-ping-http-service: endpoints "agones-ping-http-service" is forbidden: unable to create new content in namespace agones-system because it is being terminated
4m29s Warning FailedToCreateEndpoint endpoints/agones-ping-udp-service Failed to create endpoint for service agones-system/agones-ping-udp-service: endpoints "agones-ping-udp-service" is forbidden: unable to create new content in namespace agones-system because it is being terminated
This might help in understanding the situation better. I initially tested on a 1.15 GKE cluster; I expect Kubernetes 1.16 would give more details in the kubectl get ns agones-system output.
https://github.com/kubernetes/kubernetes/issues/70916
I installed Agones with the Terraform Helm module (latest master) on GKE 1.16.13-gke.1 and received a different kubectl get ns output:
k get ns agones-system -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2020-09-01T16:06:03Z"
  deletionTimestamp: "2020-09-01T16:11:29Z"
  labels:
    name: agones-system
  name: agones-system
  resourceVersion: "3057"
  selfLink: /api/v1/namespaces/agones-system
  uid: 4b3d77b9-8765-40f6-a472-2b74a46e84fe
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-09-01T16:11:41Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: allocation.agones.dev/v1: the server is currently
      unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-09-01T16:11:35Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-09-01T16:12:05Z"
    message: 'Failed to delete all resource types, 1 remaining: unexpected items still
      remain in namespace: agones-system for gvr: /v1, Resource=pods'
    reason: ContentDeletionFailed
    status: "True"
    type: NamespaceDeletionContentFailure
  phase: Terminating
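To make the output above easier to read at a glance, here is a minimal sketch that pulls out the conditions whose status is "True", which in this output are the ones reporting a deletion failure. The function names and the `sample` dict are my own for illustration; the sample just mirrors the conditions shown above, and `fetch_namespace` assumes kubectl is on the PATH and pointed at the right cluster.

```python
import json
import subprocess


def blocking_conditions(namespace_obj):
    """Return (type, reason, message) for every condition with status "True"."""
    conditions = namespace_obj.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason"), c.get("message"))
        for c in conditions
        if c.get("status") == "True"
    ]


def fetch_namespace(name):
    """Fetch a namespace as a dict via kubectl (assumes kubectl is configured)."""
    out = subprocess.run(
        ["kubectl", "get", "namespace", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Illustrative sample mirroring the conditions in the output above.
sample = {
    "status": {
        "phase": "Terminating",
        "conditions": [
            {"type": "NamespaceDeletionDiscoveryFailure", "status": "True",
             "reason": "DiscoveryFailed",
             "message": "Discovery failed for some groups, 1 failing"},
            {"type": "NamespaceDeletionGroupVersionParsingFailure", "status": "False",
             "reason": "ParsedGroupVersions",
             "message": "All legacy kube types successfully parsed"},
            {"type": "NamespaceDeletionContentFailure", "status": "True",
             "reason": "ContentDeletionFailed",
             "message": "Failed to delete all resource types, 1 remaining"},
        ],
    }
}

if __name__ == "__main__":
    for cond_type, reason, message in blocking_conditions(sample):
        print(f"{cond_type} ({reason}): {message}")
```

Against a live cluster, `blocking_conditions(fetch_namespace("agones-system"))` would do the same thing on real data.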
Couple of questions:
- Which namespaces are you creating Agones and the GameServer in?
agones-system
default
- Do you delete the GameServer before deleting Agones?
Nope, I was basically trashing the cluster, so I wasn't being very gentle :worried:
Hmnn. Interesting.
Usually when I've run into this, it's because of a Finaliser issue - but we only set a Finaliser on the GameServer - which is not in the agones-system namespace. :thinking:
Well, this bug is about deleting the Agones controller in an unusual way that is not documented on agones.dev: by simply removing the agones-system namespace. You could run kubectl delete -f install.yaml before removing the namespace and it would work.
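A rough sketch of that ordering, in case it helps anyone scripting a teardown. The helper names are hypothetical, and the `--ignore-not-found` on the second command is my own addition so the step does not fail if the manifest delete has already removed the namespace. It defaults to a dry run that only prints the commands.

```python
import subprocess


def teardown_commands(manifest="install.yaml", namespace="agones-system"):
    """Build the delete commands in order: manifest first, namespace second."""
    return [
        ["kubectl", "delete", "-f", manifest],
        ["kubectl", "delete", "namespace", namespace, "--ignore-not-found"],
    ]


def run_teardown(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence if a step fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_teardown(teardown_commands())
```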
I think the finalizer in the agones-system namespace is doing the right thing.
You need to uninstall agones before deleting the namespace, because there are CRDs installed with webhooks referencing the namespace where the agones controller is running.
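To illustrate the point about cluster-scoped objects pointing back into the namespace, here is a small sketch that lists which objects reference a service in a given namespace. It only looks at APIService (`spec.service.namespace`) and webhook configurations (`webhooks[].clientConfig.service.namespace`). The sample objects are hand-written for illustration, not dumped from a real cluster, and the helper names are made up.

```python
def service_namespaces(obj):
    """Return the namespaces of the services that a cluster-scoped object calls."""
    kind = obj.get("kind")
    found = []
    if kind == "APIService":
        # Built-in (local) API groups have no backing service.
        service = obj.get("spec", {}).get("service") or {}
        if service.get("namespace"):
            found.append(service["namespace"])
    elif kind in ("ValidatingWebhookConfiguration", "MutatingWebhookConfiguration"):
        for hook in obj.get("webhooks") or []:
            # Webhooks configured with a URL instead of a service are skipped.
            service = hook.get("clientConfig", {}).get("service") or {}
            if service.get("namespace"):
                found.append(service["namespace"])
    return found


def referencing(objects, namespace):
    """List kind/name for every object that points at a service in `namespace`."""
    return [
        f'{o["kind"]}/{o["metadata"]["name"]}'
        for o in objects
        if namespace in service_namespaces(o)
    ]


# Illustrative objects only; field values are assumptions based on this thread.
sample_objects = [
    {"kind": "APIService",
     "metadata": {"name": "v1.allocation.agones.dev"},
     "spec": {"service": {"name": "agones-controller-service",
                          "namespace": "agones-system"}}},
    {"kind": "ValidatingWebhookConfiguration",
     "metadata": {"name": "agones-validation-webhook"},
     "webhooks": [{"clientConfig": {"service": {
         "name": "agones-controller-service", "namespace": "agones-system"}}}]},
    {"kind": "APIService",
     "metadata": {"name": "v1.apps"},
     "spec": {"service": None}},
]

if __name__ == "__main__":
    for ref in referencing(sample_objects, "agones-system"):
        print(ref)
```

If objects like these are still registered after the namespace's pods are gone, discovery for their API group can fail, which would match the DiscoveryFailed condition seen earlier in the thread.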
Oooooh! That would make sense actually.
Yep, makes a lot of sense. Worth adding something to docs or FAQ?
Will see if I can find a way around it for my use case (terraform destroy).
We don't have a section about uninstalling Agones in the Install with YAML section, unlike Install using Helm.
https://agones.dev/site/docs/installation/install-agones/yaml/
^ That definitely seems like a good addition!
Well, I will create a PR soon. Simply changing agones-system to agones-system2 (and 1.9.0-dev to 1.8.0) in install.yaml was enough to create the Agones controller in a new namespace. (The only catch is that the certificate is valid for agones-controller-service.agones-system.svc, not agones-controller-service.agones-system2.svc.) After these changes, I ran kubectl apply -f ./install.yaml and then kubectl delete -f ./install.yaml, which got stuck on:
validatingwebhookconfiguration.admissionregistration.k8s.io "agones-validation-webhook" deleted
However, kubectl delete ns agones-system2 did not time out and was successful.
kubectl get ns agones-system2 -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2020-09-01T19:53:25Z"
  deletionTimestamp: "2020-09-01T19:56:25Z"
  name: agones-system2
  resourceVersion: "64933"
  selfLink: /api/v1/namespaces/agones-system2
  uid: ...
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-09-01T19:56:31Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-09-01T19:56:31Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-09-01T19:56:31Z"
    message: All content successfully deleted
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  phase: Terminating
kubectl get ns agones-system2 -o yaml
Error from server (NotFound): namespaces "agones-system2" not found
What happened:
When deleting the agones-system namespace, it got stuck in the Terminating state.
What you expected to happen:
It manages to successfully terminate the namespace without manual intervention.
How to reproduce it (as minimally and precisely as possible):
Not 100% sure what, if any, special things happened in the cluster to make it get stuck in terminating, but in general:
Anything else we need to know?:
Some commands I used to get it to delete:
Finally followed this guide to help remove the namespace: https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.1/troubleshoot/ns_terminating.html
Environment:
Agones version: 1.7.0
Kubernetes version (use kubectl version): 1.16.12-gke.3
Cloud provider or hardware configuration: gke
Install method (yaml/helm): yaml