grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0
3.47k stars 285 forks

Grafana API Token is recreated each time OnCall page is being opened. #508

Closed lstama closed 3 weeks ago

lstama commented 2 years ago

Hello, I have a problem that prevents users without admin privileges from accessing Grafana OnCall.

When I (as an organization admin) open the OnCall page for the first time (or reload it), I'm always greeted by this error page: (screenshot: retry)

When I click the retry button, this happens: (screenshot: as-admin)

Then I'm redirected to the normal OnCall page: (screenshot: success)

So far this looks fine: I can still access OnCall in the end, and I can create alerts and integrations. Then my friend, who isn't an admin, wanted to view the page as an editor. I told him to do the same (click retry if the error page shows up). He did, but instead of seeing the same view as in the second picture, he got this: (screenshot: as-non-admin)

It turns out the Grafana API token is recreated every time someone reloads the page (I checked the value in the DB). The plugin page also shows this error: (screenshot: plugin-error)

What I've already tried:

  1. Restarted Grafana
  2. Recreated the Grafana OnCall one-time invite token
  3. Set up the plugin as both Server admin and Organization admin

Also, these are some relevant logs from Grafana:

2022-09-07T11:03:57.602351353Z stdout F logger=context traceID=00000000000000000000000000000000 t=2022-09-07T11:03:57.602136283Z level=error msg="invalid API key" error="invalid API key" traceID=00000000000000000000000000000000

OnCall Engine:

2022-09-07T11:03:57.56885098Z stdout F 2022-09-07 11:03:57 source=engine:app google_trace_id=none logger=root inbound latency=0.012544 status=202 method=POST path=/api/internal/v1/plugin/sync content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07T11:03:57.569311472Z stdout F 2022-09-07 11:03:57 source=engine:uwsgi status=202 method=POST path=/api/internal/v1/plugin/sync latency=0.013501 google_trace_id=- protocol=HTTP/1.1 resp_size=278 req_body_size=0
2022-09-07T11:03:58.267741236Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root outbound latency=0.09500776696950197 status=200 method=GET url=https://grafana.my.org/api/org/users slow=0 
2022-09-07T11:03:58.450652135Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root outbound latency=0.16960133600514382 status=200 method=GET url=https://grafana.my.org/api/teams/search?perpage=1000000 slow=0 
2022-09-07T11:03:58.458742653Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root inbound latency=0.29701 status=204 method=POST path=/api/internal/v1/plugin/install content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07T11:03:58.459229884Z stdout F 2022-09-07 11:03:58 source=engine:uwsgi status=204 method=POST path=/api/internal/v1/plugin/install latency=0.298131 google_trace_id=- protocol=HTTP/1.1 resp_size=168 req_body_size=0
2022-09-07T11:04:00.561275718Z stdout F 2022-09-07 11:04:00 source=engine:app google_trace_id=none logger=root inbound latency=0.007313 status=200 method=GET path=/api/internal/v1/plugin/sync content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07T11:04:00.561751676Z stdout F 2022-09-07 11:04:00 source=engine:uwsgi status=200 method=GET path=/api/internal/v1/plugin/sync latency=0.008264 google_trace_id=- protocol=HTTP/1.1 resp_size=264 req_body_size=0

Celery:

2022-09-07T11:03:21.330679494Z stderr F 2022-09-07 11:03:21,330 source=engine:celery task_id=aa674da7-355e-4763-a159-f63922251ada task_name=apps.slack.representatives.alert_group_representative.on_alert_group_update_log_report_async name=celery.app.trace level=INFO Task apps.slack.representatives.alert_group_representative.on_alert_group_update_log_report_async[aa674da7-355e-4763-a159-f63922251ada] succeeded in 0.01293463003821671s: None
2022-09-07T11:03:57.569954095Z stderr F 2022-09-07 11:03:57,569 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.grafana_plugin.tasks.sync.plugin_sync_organization_async[a29cac90-8e2b-4ef4-9443-6f22d2046646] received
2022-09-07T11:03:57.571366083Z stderr F 2022-09-07 11:03:57,571 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.tasks.sync level=INFO Start sync Organization 1
2022-09-07T11:03:57.604990332Z stderr F 2022-09-07 11:03:57,604 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.helpers.client level=WARNING Error connecting to api instance 401 Client Error: Unauthorized for url: https://grafana.my.org/api/org/users
2022-09-07T11:03:57.605336694Z stderr F 2022-09-07 11:03:57,604 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=root level=INFO outbound latency=0.02740099304355681 status=401 method=GET url=https://grafana.my.org/api/org/users slow=0 
2022-09-07T11:03:57.607496313Z stderr F 2022-09-07 11:03:57,607 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.tasks.sync level=INFO Finish sync Organization 1
2022-09-07T11:03:57.607639099Z stderr F 2022-09-07 11:03:57,607 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=celery.app.trace level=INFO Task apps.grafana_plugin.tasks.sync.plugin_sync_organization_async[a29cac90-8e2b-4ef4-9443-6f22d2046646] succeeded in 0.03653913899324834s: None
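
For anyone triaging similar output, the engine and Celery lines above are mostly `key=value` pairs, so the failing outbound calls to Grafana can be pulled out mechanically. The following is only a minimal sketch: the function names and the regular expression are illustrative assumptions, not part of OnCall, and the parser deliberately ignores the free-text parts of each line.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a run of non-space characters.
_PAIR = re.compile(r'(\w+)=("[^"]*"|\S*)')


def parse_fields(line):
    """Return the key=value pairs found in one log line as a dict."""
    return {key: value.strip('"') for key, value in _PAIR.findall(line)}


def failed_outbound(lines, statuses=("401", "403")):
    """Collect (task_id, status, url) for outbound requests rejected by Grafana."""
    hits = []
    for line in lines:
        fields = parse_fields(line)
        if " outbound " in line and fields.get("status") in statuses:
            hits.append((fields.get("task_id"), fields["status"], fields.get("url")))
    return hits
```

Run against the Celery log above, this would single out the `GET https://grafana.my.org/api/org/users` request that returned 401 during `plugin_sync_organization_async`, which lines up with the "invalid API key" error on the Grafana side.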

We're using an existing Grafana instance as the OnCall frontend and deployed Grafana OnCall using the Helm chart in this repository. Alerting and the OnCall system itself work normally.

th30nlyw4y commented 2 years ago

I've faced the same problem. It looks just like https://github.com/grafana/oncall/issues/316. I reinstalled the oncall deployment and now it's working fine.

lstama commented 2 years ago

> I've faced the same problem. Looks just like #316. I've reinstalled oncall deployment and now it's performing fine

What do you mean by reinstalling the oncall deployment? Just the Engine and Celery parts, or everything, including MariaDB, Redis, and RabbitMQ (i.e. starting from a fresh DB)?

th30nlyw4y commented 2 years ago

I meant re-deploying the oncall Helm chart (I have Redis, MariaDB, RabbitMQ, Celery, and the engine enabled in the deployment). I also think it's better to delete the PVCs (you have to do that manually, as stated in the docs), because plugin init sometimes fails.

lstama commented 2 years ago

> I meant re-deploying oncall helm chart (i have redis, mariadb, rabbitmq, celery and engine enabled for deployment). Also i think that it's better to delete PVC's (you should do it manually, as it's stated in docs), because sometimes plugin init fails

Thanks, reinstalling works.

But now all my integrations and settings are wiped out, because I didn't know which DB tables are safe to back up and restore.

th30nlyw4y commented 2 years ago

> Thanks, reinstalling works.
>
> But now all my integrations and settings are wiped out. As I don't know which DB table is safe to backup and restore.

Yep, that's quite inconvenient. I hope this behavior will be fixed soon.

juris commented 1 year ago

I got the same issue with Grafana 9.2.6, OnCall 1.1.5, and Helm chart 1.0.12. After a couple of days the OnCall setup becomes unusable. Removing the OnCall API key from https://grafana/org/apikeys helps until the next page reload.
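
The same workaround (removing the OnCall API key) could also be scripted against Grafana's legacy API keys HTTP endpoints instead of clicking through the API keys page. This is a rough sketch under several assumptions: that the key is literally named `OnCall` (check the name shown on your API keys page), that `admin_token` is a separate credential with organization admin rights, and that your Grafana version still serves `/api/auth/keys`. The function names are hypothetical.

```python
import json
import urllib.request


def key_ids_named(keys, name):
    """Return the ids of API keys whose name matches exactly."""
    return [key["id"] for key in keys if key.get("name") == name]


def delete_keys_named(base_url, admin_token, name="OnCall"):
    """Delete every API key called `name`; the default name is an assumption."""
    headers = {"Authorization": f"Bearer {admin_token}"}

    # List the organization's API keys.
    list_req = urllib.request.Request(f"{base_url}/api/auth/keys", headers=headers)
    with urllib.request.urlopen(list_req) as resp:
        keys = json.load(resp)

    # Delete each matching key by id.
    deleted = key_ids_named(keys, name)
    for key_id in deleted:
        del_req = urllib.request.Request(
            f"{base_url}/api/auth/keys/{key_id}", headers=headers, method="DELETE"
        )
        urllib.request.urlopen(del_req).close()
    return deleted
```

As noted above, this only helps until the next page reload, so it is a stopgap rather than a fix.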

duclm2609 commented 1 year ago

I have the same issue as @juris

PCbIX commented 1 year ago

I have the same issue. Helm deployment; only PostgreSQL is external. Grafana 9.3.2; OnCall 1.1.14; Ingress disabled (HAProxy).

After some time the plugin just loses its API key. In the Grafana logs I found:

logger=context t=2023-01-12T13:21:53.551966373Z level=error msg="invalid API key" error="invalid API key" traceID=
logger=data-proxy-log userId=2 orgId=1 uname=user path=/api/plugin-proxy/grafana-oncall-app/api/internal/v1/alertgroups/stats/ remote_addr=ip referer="https://fqdn/a/grafana-oncall-app/?page=incidents&status=0&status=1" t=2023-01-12T11:47:13.60359748Z level=error msg="Proxy request failed" err="dial tcp ip:8080: connect: connection refused"

The plugin configuration page says it cannot communicate with oncall-engine, but doesn't offer a button to reset the configuration. Sometimes just opening the main OnCall page triggers the API key exchange, as people wrote above; sometimes only a redeploy helps.
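
The `connection refused` error in the data-proxy log is a separate symptom from the invalid API key: it means nothing was accepting connections on the engine's port 8080 at that moment. A quick way to rule that out from the host or pod running Grafana is a plain TCP check. This is a minimal sketch; `can_connect` is an illustrative helper, and the host name in the usage note is a placeholder for whatever address your plugin is configured with.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("oncall-engine.example.internal", 8080)` (a hypothetical address) returning `False` would point at the engine or the network path rather than at the API token.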

ifeneg commented 1 year ago

I have the same issue as @juris

Milamary commented 1 year ago

Have the same issue as @PCbIX:

Each time I leave the Grafana OnCall page, the Grafana API token is supposed to be recreated, but it isn't, and I lose the connection to the Grafana OnCall plugin with the message: 'There was an issue while synchronizing data required for the plugin. Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)'

The first time, the workaround of reopening the main Grafana OnCall page to trigger the API key exchange worked: the notification about the API token creation popped up and I could access Grafana OnCall. The second time it didn't work, and I'm stuck on the 'Initializing plugin' step with the message: 'There was an issue while synchronizing data required for the plugin. Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)'.

mderynck commented 1 month ago

We recently made some changes to the way Grafana OnCall is initialized. Please use 1.9.22; there were quite a few changes along the way from 1.9.0 to 1.9.22 to get things working.