grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Slack integration issues #668

Closed: davidspek closed this issue 1 year ago

davidspek commented 1 year ago

I'm facing the same issue as described in https://github.com/grafana/oncall/issues/90: the client_id parameter is set to None when I try to integrate Slack. The domain for OnCall is https://oncall.example.com and for Grafana it is https://grafana.example.com. If I manually add the client_id, the OAuth flow continues but then fails. I can't find any errors in the engine logs.

davidspek commented 1 year ago

@joeyorlando sorry to bother you, but since you created the last release, I thought I'd mention that I'm still running into this issue with v1.0.45.

davidspek commented 1 year ago

@iskhakov Maybe you could share your thoughts on this.

davidspek commented 1 year ago

@vadimkerr I'm still having this issue with the latest release v1.0.49.

vadimkerr commented 1 year ago

Hi @DavidSpek 👋 Could you please confirm that you have set SLACK_CLIENT_OAUTH_ID, SLACK_CLIENT_OAUTH_SECRET and SLACK_INSTALL_RETURN_REDIRECT_HOST env variables?

davidspek commented 1 year ago

@vadimkerr SLACK_CLIENT_OAUTH_ID, SLACK_CLIENT_OAUTH_SECRET, SLACK_INSTALL_RETURN_REDIRECT_HOST, SLACK_SIGNING_SECRET and SLACK_SLASH_COMMAND_NAME are all set appropriately. After I manually add the client ID to the URL and allow the permissions in Slack, I get redirected back to https://<grafana-domain>/a/grafana-oncall-app/?page=chat-ops&slack_error=auth_failed. Again, there is nothing useful in the logs to go on.

OnCall and Grafana are both installed under the same domain, each on its own subdomain, and both are publicly exposed. I think this is a normal setup that should work.
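Before digging further, it can help to confirm that these variables are actually visible to the engine process. The following is a minimal sketch, assuming it is run with Python inside the engine container; the variable names come from this thread, while the script itself and the helper `missing_slack_vars` are hypothetical. It only inspects the process environment, so it cannot see values stored elsewhere, such as on OnCall's Env Variables page.

```python
import os

# Variable names as listed in this thread.
REQUIRED_SLACK_VARS = [
    "SLACK_CLIENT_OAUTH_ID",
    "SLACK_CLIENT_OAUTH_SECRET",
    "SLACK_INSTALL_RETURN_REDIRECT_HOST",
    "SLACK_SIGNING_SECRET",
    "SLACK_SLASH_COMMAND_NAME",
]


def missing_slack_vars(environ=None):
    """Return the required Slack variables that are unset or blank (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_SLACK_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_slack_vars()
    if missing:
        print("Missing Slack settings:", ", ".join(missing))
    else:
        print("All Slack settings are present in the process environment.")
```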

davidspek commented 1 year ago

I've found some more clues as to what might be going on. On a fresh installation on a different Kubernetes cluster with a different Grafana installation, the client ID was set when redirecting to Slack. Since that was the client ID for a different OnCall deployment, I created a new Slack app and changed the Slack client ID in the OnCall deployment.

However, after I changed the Slack client ID in the OnCall deployment, the old client ID was still passed to the Slack login URL. Thinking the old client ID might be saved somewhere in the database, I destroyed the OnCall database and redeployed OnCall with the new client ID. Even with a fresh OnCall deployment and a clean database, the old client ID is still used when I try to log in to Slack. The same happens in incognito mode, so I don't believe the client ID is being cached in the browser. I also deleted the Grafana pod and tried again in both a regular browser and incognito mode, and the old client ID is still used.

I think two things can be concluded from this:

  1. The Slack client ID (and likely other settings) is being persisted somewhere it shouldn't be
  2. It is likely being persisted in Grafana rather than in OnCall itself

davidspek commented 1 year ago

So now I've tried moving the Grafana installation from SQLite to PostgreSQL, thinking the Slack client ID might be persisted somewhere in the Grafana database, but it still doesn't pick up the new client ID. After destroying the PVCs backing Grafana, the Grafana PostgreSQL database and the OnCall PostgreSQL database, and redeploying all the apps, I'm still running into the issue where the old client ID is used. Strangely, after I deleted literally everything, the OnCall plugin was already connected when Grafana came back up. This was when opening the fresh Grafana in a different browser in incognito mode, to rule out some strange browser caching behavior.

So I'm not sure what and where is persisting this data, but something strange is going on for sure.

helmecke commented 1 year ago

I have a similar problem. I deployed Grafana OnCall via Helm without Slack enabled. I have now enabled Slack with all the Slack env vars set via Helm, and they all show correctly on the Env Variables tab in Grafana OnCall.

On the ChatOps Slack tab I only see "Setup ENV Variables":

[screenshot]

If I disable caching in the browser's developer console, the "Open Slack connection" link button appears:

[screenshots]

The link, however, does not contain the configured settings: client_id is None, and redirect_url is not set to the value of SLACK_INSTALL_RETURN_REDIRECT_HOST.
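A literal `client_id=None` in the link is a typical symptom of a Python backend formatting an unset value into a URL. The sketch below illustrates that failure mode; it is not OnCall's actual code, and the helpers `naive_install_url` and `checked_install_url` are hypothetical. It assumes Slack's OAuth v2 authorize endpoint and uses only `urllib.parse.urlencode`, which calls `str()` on non-string values, so `None` becomes the string "None".

```python
from urllib.parse import urlencode

SLACK_AUTHORIZE_URL = "https://slack.com/oauth/v2/authorize"


def naive_install_url(client_id, redirect_uri):
    # urlencode() stringifies non-str values, so None is sent as the literal "None".
    query = urlencode({"client_id": client_id, "redirect_uri": redirect_uri})
    return SLACK_AUTHORIZE_URL + "?" + query


def checked_install_url(client_id, redirect_uri):
    # Fail loudly instead of sending Slack a bogus client_id.
    if not client_id or not redirect_uri:
        raise ValueError("Slack client_id and redirect_uri must be configured")
    return naive_install_url(client_id, redirect_uri)


print(naive_install_url(None, None))
# https://slack.com/oauth/v2/authorize?client_id=None&redirect_uri=None
```

The checked variant shows one way to surface a missing setting as an explicit error rather than as a link that Slack later rejects.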

davidspek commented 1 year ago

@vadimkerr Maybe you can point me to someone who can be approached about this issue.

vadimkerr commented 1 year ago

Hey @DavidSpek 👋 It seems the issue is that values on the Env Variables page take priority over the "regular" env variables passed via Helm, right?

As a quick workaround, I would propose setting the FEATURE_LIVE_SETTINGS_ENABLED env variable to False, so the live settings feature is disabled and settings are taken only from the "regular" env variables. This is a viable solution if you only want to manage env variables through Helm and not through the web UI.

But as a proper solution, it would be great to change the live settings feature so that it refreshes its "cache" in cases like this.
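The precedence described above can be sketched as follows. This is a minimal, hypothetical model rather than OnCall's actual code: `get_setting`, `is_enabled` and the plain dict standing in for stored live settings are illustrative, and treating the flag as enabled when it is unset is an assumption. It shows how a value saved as a live setting can shadow a newer env variable until FEATURE_LIVE_SETTINGS_ENABLED is set to False.

```python
FALSY = {"false", "0", "no", "off"}


def is_enabled(environ, flag="FEATURE_LIVE_SETTINGS_ENABLED", default=True):
    """Interpret a boolean feature flag read from env vars (default=True is an assumption)."""
    raw = environ.get(flag)
    if raw is None:
        return default
    return raw.strip().lower() not in FALSY


def get_setting(name, environ, live_store):
    """A stored live value wins over the env var while live settings are enabled."""
    if is_enabled(environ) and name in live_store:
        return live_store[name]
    return environ.get(name)


env = {"SLACK_CLIENT_OAUTH_ID": "new-id"}
live = {"SLACK_CLIENT_OAUTH_ID": "old-id"}  # stale value saved earlier
print(get_setting("SLACK_CLIENT_OAUTH_ID", env, live))  # old-id

env["FEATURE_LIVE_SETTINGS_ENABLED"] = "False"
print(get_setting("SLACK_CLIENT_OAUTH_ID", env, live))  # new-id
```

Disabling the flag bypasses the stored values entirely, which matches the trade-off above: settings then come only from the environment, so the web UI can no longer override them.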

helmecke commented 1 year ago

This workaround works for me. Thank you.

env:
  - name: FEATURE_LIVE_SETTINGS_ENABLED
    value: "False"