integr8ly / application-monitoring-operator

Operator for installing the Application Monitoring Stack on OpenShift (Prometheus, AlertManager, Grafana)
Apache License 2.0

Deleting the application-monitoring project is stuck in terminating state #74

Closed mkralik3 closed 4 years ago

mkralik3 commented 5 years ago

Sometimes, when I delete the application-monitoring project with oc delete project application-monitoring, the project gets stuck in the Terminating state. For example, the project has now been in this state for three days. It happens to me on 3.11 (minishift) and 4.1 (AWS).

To reproduce: when I delete the project right after creating it, the deletion works. However, when I delete the project after some time, e.g. after one hour during which I use it for monitoring my application (a Syndesis instance), the delete operation gets stuck. After that, I cannot create a new application-monitoring project, so I have to reset the whole OCP.

When I look at what is still in the project with

  oc api-resources | tail -n +1 | grep true | awk '{print $1}' | xargs -L 1 -I % bash -c "echo %; oc get %"

I see that these two resources remain:

...
applicationmonitorings
NAME                            AGE
example-applicationmonitoring   5d20h
...
grafanadatasources
NAME         AGE
prometheus   5d20h
...
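
A somewhat tidier variant of the listing above, as a rough sketch only: it asks oc for namespaced, listable resource types directly instead of grepping the table output. The function name list_remaining_resources is hypothetical, and the sketch assumes the installed oc supports the --verbs, --namespaced and -o name options of api-resources.

```shell
# Hypothetical helper: print every namespaced object still present in a namespace.
# $1 = namespace to inspect (required).
list_remaining_resources() {
  ns="${1:-}"
  [ -n "$ns" ] || { echo "usage: list_remaining_resources <namespace>" >&2; return 1; }

  # Ask the API server for all namespaced resource types that support "list".
  oc api-resources --verbs=list --namespaced -o name |
    while read -r kind; do
      # -o name prints one identifier per object and nothing for empty types.
      oc get "$kind" -n "$ns" --ignore-not-found -o name
    done
}
```

Calling list_remaining_resources application-monitoring on a stuck project should then show only the objects that are still blocking the deletion.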

When I try to delete these resources manually:

  oc delete applicationmonitorings example-applicationmonitoring
  oc delete grafanadatasources prometheus

the output from oc is:

  applicationmonitoring.applicationmonitoring.integreatly.org "example-applicationmonitoring" deleted
  grafanadatasource.integreatly.org "prometheus" deleted

However, these commands hang too, and I have to terminate them (CTRL+C).

For the investigation, I can provide credentials for our OCP 3.11 instance, where the project is stuck right now.

abkieling commented 5 years ago

Additionally, the first time the application-monitoring-operator is installed on Minishift, only the application-monitoring-operator and the prometheus-operator pods are created. I need to delete the project and install again to successfully install all components.

mkralik3 commented 5 years ago

^Agree, this is happening to me on all 3.11 instances. (On 4.1 it works after the first try.)

//edit: The problem is gone after this commit https://github.com/integr8ly/application-monitoring-operator/commit/9aed4730e7c404c227eeba634be8016e32a0be52

mkralik3 commented 5 years ago

I also found a workaround for this issue. You have to delete the finalizers entry from the two remaining resources.

Go to Resources -> Other Resources. Choose applicationmonitorings from the drop-down menu and edit the example-applicationmonitoring resource. Delete the following rows:

  finalizers:
    - monitoring.cleanup

After saving, the resource is gone. Repeat the same process with grafanadatasources/prometheus, deleting these rows:

  finalizers:
    - grafana.cleanup
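
To check from the command line which finalizers are still set before editing a resource, something like the following sketch may help. The function name show_finalizers is hypothetical; the resource and namespace names are the ones from this issue.

```shell
# Hypothetical helper: print the finalizers currently set on one resource.
# $1 = resource type, $2 = resource name, $3 = namespace.
show_finalizers() {
  # jsonpath selects only the metadata.finalizers list from the object.
  oc get "$1" "$2" -n "$3" -o jsonpath='{.metadata.finalizers}'
}

# Example calls for the two resources mentioned above (commented out on purpose):
# show_finalizers applicationmonitorings example-applicationmonitoring application-monitoring
# show_finalizers grafanadatasources prometheus application-monitoring
```

An empty result would suggest that something other than a finalizer is holding the resource.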

david-martin commented 5 years ago

@mkralik3 Can this be closed?

Has the change in 9aed473 resolved the terminating issue?

david-martin commented 5 years ago

@pb82 re: @alexkieling's comment:

Additionally, the first time the application-monitoring-operator is installed on Minishift, only the application-monitoring-operator and the prometheus-operator pods are created. I need to delete the project and install again to successfully install all components.

Is this still an issue? Is it something we should try to fix for OpenShift 3?

mkralik3 commented 5 years ago

The change https://github.com/integr8ly/application-monitoring-operator/commit/9aed4730e7c404c227eeba634be8016e32a0be52 resolved the issue that @alexkieling mentioned, not the terminating issue. The terminating issue is still there.

marciopaiva commented 5 years ago

If you are still stuck in the terminating state (OCP 3.11), try this:

  oc patch applicationmonitoring.applicationmonitoring.integreatly.org/example-applicationmonitoring --type='json' -p='[{"op": "remove" , "path": "/metadata/finalizers" }]' -n application-monitoring

davidkirwan commented 4 years ago

We've recently removed the finalizers that were being added to the Grafana resources, so this should no longer be an issue. It would be good to retest with the latest codebase from master.

mkralik3 commented 4 years ago

@davidkirwan I tested it today with OCP 3.11 (minishift) and OCP 4.3, and the resource applicationmonitorings/example-applicationmonitoring still blocks the deletion of the project.

The workaround, which @marciopaiva mentioned, works.

davidkirwan commented 4 years ago

@mkralik3 if you run oc delete project application-monitoring, it also deletes the deployment of the Application Monitoring Operator, and with it the operator itself. The operator is then unable to clean up any of the resources it manages.

Can you instead try the following:

  oc delete applicationmonitorings example-applicationmonitoring --namespace application-monitoring

Once the operator has cleaned up its resources, it removes the finalizers, and applicationmonitorings/example-applicationmonitoring is then deleted. At that point you will be able to delete the application-monitoring namespace without trouble.
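
The ordering described above could be scripted roughly as follows. This is only a sketch: the function name teardown_monitoring, the retry count, and the two-second polling interval are assumptions, not part of the operator or its documentation.

```shell
# Hypothetical helper: delete the custom resource first, then the project.
# $1 = namespace (default: application-monitoring)
# $2 = ApplicationMonitoring resource name (default: example-applicationmonitoring)
# $3 = maximum number of polls before giving up (default: 30, an arbitrary choice)
teardown_monitoring() {
  ns="${1:-application-monitoring}"
  cr="${2:-example-applicationmonitoring}"
  tries="${3:-30}"

  # Step 1: delete the custom resource while the operator is still running,
  # so that the operator can clean up and remove its finalizer.
  oc delete applicationmonitorings "$cr" --namespace "$ns" || return 1

  # Step 2: poll until the custom resource is really gone.
  while [ -n "$(oc get applicationmonitorings "$cr" --namespace "$ns" --ignore-not-found -o name)" ]; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for $cr to be deleted" >&2
      return 1
    fi
    sleep 2
  done

  # Step 3: only now delete the project, which also removes the operator.
  oc delete project "$ns"
}
```

Run without arguments, teardown_monitoring uses the names from this issue. If it times out, the finalizer is probably still set and the operator may no longer be running.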

mkralik3 commented 4 years ago

Thanks, it works :) (only the operator is left in the namespace, but it is removed together with the project afterwards). I have therefore closed this issue.