[BUG] New pods shortly deleted, and the old pods remains. ArgoCD

qdrddr commented 8 months ago

Describe the bug

Tested with reloadStrategy set to default and to env-vars, using ArgoCD. The ArgoCD app ory\oathkeeper includes the ConfigMap accessrules, which Reloader should monitor. The Deployment has the annotation: configmap.reloader.stakater.com/reload: accessrules

When I modify the ConfigMap in git, the reloader notices the change, creates new pods, but they are deleted shortly after creation and the old version remains intact.

To Reproduce

Steps to reproduce the behavior

Expected behavior

The old pods should be deleted and the new pods should remain.

Logs from the Reloader

time="2024-03-12T20:49:44Z" level=error msg="Update for 'oathkeeper' of type 'Deployment' in namespace 'ory' failed with error Operation cannot be fulfilled on deployments.apps \"oathkeeper\": the object has been modified; please apply your changes to the latest version and try again"
time="2024-03-12T20:49:44Z" level=error msg="Rolling upgrade for 'accessrules' failed with error = Operation cannot be fulfilled on deployments.apps \"oathkeeper\": the object has been modified; please apply your changes to the latest version and try again"
time="2024-03-12T20:49:44Z" level=error msg="Error syncing events: Operation cannot be fulfilled on deployments.apps \"oathkeeper\": the object has been modified; please apply your changes to the latest version and try again"
time="2024-03-12T20:49:51Z" level=info msg="Changes detected in 'accessrules' of type 'CONFIGMAP' in namespace 'ory', Updated 'oathkeeper' of type 'Deployment' in namespace 'ory'"

Environment

Operator Version:
Kubernetes/OpenShift Version: RKE2 k8s v1.27.10+rke2r1
ArgoCD Version: v2.10.2
Reloader: Deployed via ArgoCD using Helmchart 1.0.69

Additional context

The Helm values file:

reloader:
  enableHA: true

deployment:
  # If you wish to run multiple replicas set reloader.enableHA = true
  replicas: 2
  # Set to default, env-vars or annotations
  reloadStrategy: env-vars

MuneebAijaz commented 8 months ago

@qdrddr Reloader works by updating an env var, which changes the Deployment and rolls that update out to the pods. Could ArgoCD be actively reverting the changes that Reloader makes to the Deployment? In that case the new pods would be deleted and the old state would persist.

qdrddr commented 8 months ago

@MuneebAijaz could you recommend a workaround that lets me keep using ArgoCD?

MuneebAijaz commented 8 months ago

@qdrddr I think this needs more investigation into whether ArgoCD really is the cause. If it is, the question is whether ArgoCD should be watching the env field under Deployments. If it should not, ArgoCD provides ways to ignore specific fields in specific resources.
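For reference, a minimal sketch of what ignoring such a field could look like on the ArgoCD Application. It assumes the env-vars strategy adds a variable whose name starts with STAKATER_ (check the live Deployment for the actual name). The Application name and namespace are hypothetical, only the relevant fields are shown, and nothing in this thread confirms that this resolves the issue.

```yaml
# Fragment of an ArgoCD Application; source, destination and project are omitted.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: oathkeeper        # hypothetical Application name
  namespace: argocd       # hypothetical ArgoCD namespace
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      name: oathkeeper
      namespace: ory
      jqPathExpressions:
        # Assumption: Reloader's injected env vars are prefixed with STAKATER_
        - .spec.template.spec.containers[].env[]? | select(.name | startswith("STAKATER_"))
  syncPolicy:
    syncOptions:
      # Without this option, ignored fields are excluded from the diff
      # but can still be overwritten when a sync runs.
      - RespectIgnoreDifferences=true
```

The jq expression is deliberately narrow so that ArgoCD keeps managing every other env var defined in git.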

qdrddr commented 7 months ago

Could you point to the documents for this?

Also, I have doubts about the proposed workaround to set ArgoCD to skip checking parts of a resource: even if ArgoCD ignores env vars, with auto-prune enabled it will still see and kill the extra containers, regardless of changes in env.

So basically, I cannot use Reloader with ArgoCD when auto-pruning is enabled. @MuneebAijaz

qdrddr commented 7 months ago

The problem is that Reloader creates additional containers before killing outdated containers to reduce impact, which is an excellent strategy. But ArgoCD, with pruning enabled, notices an extra container and kills it before Reloader gets a chance to kill the outdated container. As a result, the outdated containers remain unchanged.

Do you know if ArgoCD integration is needed here?

Ideas on how this can be fixed:

Ideally, instead of creating a new container and then killing the old one, Reloader would tell ArgoCD to increase replicas by one and then kill the old containers one by one. ArgoCD, with pruning enabled, would then create the extra container and re-create those deleted by Reloader.
Alternatively, you could temporarily turn off auto-pruning in ArgoCD for a given app and turn it back on once Reloader is done.
A less ideal option that might also work: without creating an extra container, check whether the replica set has more than one container, kill them one by one, and wait until ArgoCD and k8s re-create them.

MuneebAijaz commented 6 months ago

> I have doubts about the proposed workaround to set ArgoCD

Yes, there are implications to that approach, but not the ones stated above.

> The problem is that Reloader creates additional containers before killing outdated containers

Reloader doesn't do that. Reloader only updates the env field in the parent resource (Deployment, StatefulSet, DaemonSet). When an env var is updated, the parent resource is bound to propagate that change to its pods: a Deployment, for example, spins up another ReplicaSet with the new env, and that ReplicaSet creates new pods with the updated env. That is how Reloader performs an update.

Reloader itself doesn't manage the pod/container lifecycle. To avoid affecting the user's application, it relies on the update strategy already set in the Deployment to propagate the change.
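As an illustration of the mechanism described above, here is a sketch of the Deployment's pod template after Reloader has reacted to a change in the accessrules ConfigMap under the env-vars strategy. The variable name and value are assumptions for illustration, as is the container name; inspect the live object to see what Reloader actually wrote.

```yaml
# Fragment of the live Deployment; unrelated fields are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oathkeeper
  namespace: ory
  annotations:
    configmap.reloader.stakater.com/reload: accessrules
spec:
  template:
    spec:
      containers:
        - name: oathkeeper            # assumed container name
          env:
            # Assumed shape of the variable injected by Reloader.
            # It is not present in git, so ArgoCD sees it as drift.
            - name: STAKATER_ACCESSRULES_CONFIGMAP
              value: "<hash of the ConfigMap data>"
```

Because the pod template changed, the Deployment controller creates a new ReplicaSet. If something then restores the template from git, the Deployment rolls back to the old ReplicaSet and the new pods are scaled down, which matches the symptom reported in this issue.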

MuneebAijaz commented 6 months ago

I will try to replicate the issue on my end, and get back to you.

0123hoang commented 6 months ago

@qdrddr Did you follow #333? I changed reloadStrategy from annotations to env-vars and the problem was gone.

qdrddr commented 6 months ago

@0123hoang Nope, the problem persists with reloadStrategy: env-vars

[Screenshot: CleanShot 2024-05-21 at 20 38 15]

shameemshah commented 6 months ago

We are also facing the same issue.

Gatschknoedel commented 5 months ago

I would debug this by disabling self-heal for the responsible argo app, let reloader do its thing and afterwards check the argo app. My guess is that the application is out of sync and argo is immediately reverting because of that.
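A minimal sketch of the sync policy for that debugging step, assuming the app uses automated sync with pruning; only the relevant fields of the Application spec are shown.

```yaml
# Fragment of the ArgoCD Application spec for the affected app.
spec:
  syncPolicy:
    automated:
      prune: true
      # Self-heal disabled for debugging: ArgoCD stops reverting live changes
      # on its own, although a manual or git-triggered sync still applies
      # whatever is in git.
      selfHeal: false
```

With self-heal off, an OutOfSync status on the Deployment after Reloader runs would show which field ArgoCD considers drifted.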

BlackRoach commented 3 months ago

After the reloader propagated changes and while the new pods were reloading, I clicked the ArgoCD Sync button. As a result, the new pods were immediately deleted and replaced with the old pods.

I think ArgoCD auto-sync reverts the changes made by Reloader.

MuneebAijaz commented 2 months ago

Have you tried setting the reload strategy to annotations? Related issue: https://github.com/stakater/Reloader/issues/701

stakater / Reloader

[BUG] New pods shortly deleted, and the old pods remains. ArgoCD #627