grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Bug: OnCall periodically gets stuck synchronizing #1525

Open BojanOro opened 1 year ago

BojanOro commented 1 year ago

Hi folks,

We're having an issue where OnCall periodically refuses to connect in the UI. When clicking the OnCall plugin, we're prompted with the "initializing plugin" line while a dozen sync requests are made in the background. Each request results in a 200 with the following body:

{"token_ok":false,"license":"OpenSource","version":"v1.1.32","recaptcha_site_key":"xxxxx"}

Clicking the "Configure" button on this error page tries to re-sync the plugin, and eventually results in the error message

There was an issue while synchronizing data required for the plugin.
Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)

On the backend, in both Grafana and OnCall Engine + Celery I don't see any error messages. The issue is resolved by restarting the OnCall Engine service. When I do this, the sync request returns with token_ok: true.
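For anyone who wants to script this check rather than watch the browser's network tab, here is a minimal Python sketch that inspects a sync response body for the token_ok flag. The function name is hypothetical, and the "healthy" body is an assumption: only the token_ok: true part is confirmed above, and the other fields are copied from the failing response.

```python
import json


def token_ok(body: str) -> bool:
    """Return True only if the sync response body reports token_ok: true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A body that is not valid JSON is treated as "not ok".
        return False
    return isinstance(payload, dict) and payload.get("token_ok") is True


# Body observed while the plugin is stuck (copied from the report above).
stuck = '{"token_ok":false,"license":"OpenSource","version":"v1.1.32","recaptcha_site_key":"xxxxx"}'

# Assumed shape after restarting the engine: same fields, token_ok flipped.
healthy = '{"token_ok":true,"license":"OpenSource","version":"v1.1.32","recaptcha_site_key":"xxxxx"}'

print(token_ok(stuck))
print(token_ok(healthy))
```

Note that the HTTP status is 200 in both cases, so a check like this has to look at the body rather than the status code.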

Any details on where to look further for information as to why this happens would be appreciated!

Details:

License: Open Source
Plugin version: v1.1.32
Grafana version: v9.4.3
Hosted on Kubernetes (AKS), managed with ArgoCD

XLordalX commented 1 year ago

Having the same issue. After around 24 hours, it gets stuck. Restarting the OnCall Engine resolves the issue.

via-justa commented 9 months ago

On 1.3.77, it's still happening. Any pointers on how to solve it?

mderynck commented 1 week ago

Recently we made some changes to the way Grafana OnCall is initialized. Please use 1.9.22; there were quite a few changes along the way from 1.9.0 to 1.9.22 to get things working.