redhat-developer / gitops-operator

An operator that gets you an ArgoCD for cluster configuration out-of-the-box on OpenShift along with the UI for visualizing environments.
Apache License 2.0

Unable to load data: error getting cached app managed resources: NOAUTH Authentication required. #750

Open garyd2 opened 4 months ago

garyd2 commented 4 months ago

The GitOps operator in one environment updated yesterday to 1.13.0. Since then I cannot get to the resources of any apps without hitting:

Unable to load data: error getting cached app managed resources: NOAUTH Authentication required.

I restarted all pods in the openshift-gitops namespace.

Anyone seen this before? Is it something with Redis?

svghadi commented 4 months ago

Which version did you upgrade from? Are you unable to view the resources at all, or can you view them while the NOAUTH error message keeps popping up? Also, can you share the application controller logs?

garyd2 commented 4 months ago

It was previously on 1.12.4, I believe. I can see the application tiles fine, and they show Synced and Healthy, but when I click into one and try to look at the pods it throws the error and I can't see any further pod details.

Logs of the application controller look like this (some details masked):

time="2024-07-22T07:49:00Z" level=info msg="Loading TLS configuration from secret xxxxx/argocd-server-tls"
time="2024-07-22T07:49:00Z" level=warning msg="Failed to save cluster info: NOAUTH Authentication required."
time="2024-07-22T07:49:04Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=xxx
time="2024-07-22T07:49:04Z" level=warning msg="Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation" application=xxxxxxx dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" fields.level=0
time="2024-07-22T07:49:04Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: development)" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="GetRepoObjs stats" application=xxxxxxx build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=54 unmarshal_ms=54 version_ms=0
time="2024-07-22T07:49:04Z" level=error msg="DiffFromCache error: error getting managed resources for app xxxx: NOAUTH Authentication required."
time="2024-07-22T07:49:04Z" level=error msg="Failed to cache app resources: error setting app resource tree: NOAUTH Authentication required." application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=13 fields.level=0 git_ms=54 health_ms=0 live_ms=0 settings_ms=0 sync_ms=0
time="2024-07-22T07:49:04Z" level=info msg="Skipping auto-sync: application status is Synced" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="Update successful" application=xxxxxxx
time="2024-07-22T07:49:04Z" level=info msg="Reconciliation completed" application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=13 fields.level=0 git_ms=54 health_ms=0 live_ms=0 patch_ms=33 setop_ms=0 settings_ms=0 sync_ms=0 time_ms=132
time="2024-07-22T07:49:12Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=xxxxxxx
time="2024-07-22T07:49:12Z" level=warning msg="Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation" application=xxxxxxx dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" fields.level=0
time="2024-07-22T07:49:12Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: development)" application=xxxxxxx
time="2024-07-22T07:49:12Z" level=info msg="GetRepoObjs stats" application=xxxxxxx build_options_ms=0 helm_ms=0 plugins_ms=0 repo_ms=0 time_ms=58 unmarshal_ms=58 version_ms=0
time="2024-07-22T07:49:12Z" level=error msg="DiffFromCache error: error getting managed resources for app xxx: NOAUTH Authentication required."
time="2024-07-22T07:49:12Z" level=error msg="Failed to cache app resources: error setting app resource tree: NOAUTH Authentication required." application=xxxxxxx dedup_ms=0 dest-name= dest-namespace=development dest-server="https://kubernetes.default.svc" diff_ms=15 fields.level=0 git_ms=59 health_ms=1 live_ms=0 settings_ms=0 sync_ms=0
time="2024-07-22T07:49:12Z" level=info msg="Skipping auto-sync: application status is Synced" application=xxxxxxx
time="2024-07-22T07:49:13Z" level=info msg="Update successful" application=xxxxxxx

svghadi commented 4 months ago

Strange. 1.12.4 already has the Redis authentication change, so it shouldn't cause upgrade issues. Is it happening on only one cluster, or are you seeing similar behavior on others as well? Can you also check that the redis-initial-password secret is present and not empty in the ArgoCD instance namespace, and that the REDIS_PASSWORD env is referencing this secret correctly in the repo-server, application-controller and argocd-server deployments?

garyd2 commented 4 months ago

I see an argocd-redis-initial-password secret, but its 2 data values are admin.password and immutable, and both are set. I don't have a REDIS_PASSWORD data value in it.

svghadi commented 4 months ago

Oh sorry, I should have framed it better 😅. Check the REDIS_PASSWORD env var in the repo-server, application-controller and argocd-server deployments, not in the redis secret. The secret contains only 2 values.
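
To make the check above concrete, here is a minimal sketch in Python. It assumes you have dumped each workload with something like `oc get deployment <name> -n <namespace> -o json` and loaded it as a dict. The function name `redis_password_ref` is made up for illustration and is not part of the operator or Argo CD.

```python
def redis_password_ref(workload: dict):
    """Return (secret_name, key) that REDIS_PASSWORD points at, or None.

    `workload` is a Deployment or StatefulSet manifest parsed from JSON.
    Hypothetical helper for illustration only.
    """
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        # "env" may be absent or null on a container, so fall back to [].
        for env in container.get("env") or []:
            if env.get("name") != "REDIS_PASSWORD":
                continue
            ref = (env.get("valueFrom") or {}).get("secretKeyRef") or {}
            if ref.get("name") and ref.get("key"):
                return ref["name"], ref["key"]
    return None
```

A result of `None` for any of the three deployments would match the symptom in this issue: the component talks to Redis without a password and gets NOAUTH back.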

garyd2 commented 4 months ago

Thanks

Would it be OK to just edit the statefulset and deployments to add this in if it is missing?

svghadi commented 4 months ago

Yes, we can try that. But the operator should have handled this automatically. Could be a bug...

garyd2 commented 4 months ago

Sorry for the delay in getting back. I updated the deployments and statefulset with the REDIS_PASSWORD environment variable and all is working again: I can get the resources of apps again. Thanks a lot for your help.

Feel free to close this, or if you want to keep it open to investigate a bug work away.
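
For anyone landing here with the same error, the manual workaround can be sketched roughly as below. This is only an illustration, not what the operator does internally: the function name `with_redis_password` is hypothetical, the secret name must be whatever your instance actually has, the `admin.password` key is taken from the secret contents mentioned earlier in this thread, and adding the variable to every container in the pod is a simplifying assumption.

```python
import copy


def with_redis_password(workload: dict, secret_name: str,
                        key: str = "admin.password") -> dict:
    """Return a copy of a Deployment/StatefulSet manifest in which each
    container has a REDIS_PASSWORD env var sourced from the given secret.

    Hypothetical helper for illustration only. Containers that already
    define REDIS_PASSWORD are left untouched.
    """
    patched = copy.deepcopy(workload)
    for container in patched["spec"]["template"]["spec"]["containers"]:
        env = container.get("env") or []
        if not any(e.get("name") == "REDIS_PASSWORD" for e in env):
            env.append({
                "name": "REDIS_PASSWORD",
                "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
            })
        container["env"] = env
    return patched
```

The input manifest is not modified, so the result can be compared against the original before anything is applied to the cluster.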

svghadi commented 4 months ago

Great. I will keep this issue open until the bug is triaged. Thanks for reporting it.

svghadi commented 4 months ago

@garyd2 - Just to rule out the possibility of broken operator reconciliation, can you confirm if there are any unusual error messages in the operator manager pod logs?

garyd2 commented 4 months ago

I have reviewed the openshift-gitops-operator-controller-manager logs and no errors are thrown.