grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0
3.51k stars 290 forks source link

Helm template - issues with how integration and oncall secrets are handled. #824

Open ksa-real opened 2 years ago

ksa-real commented 2 years ago

Helm chart: 1.0.10 App: 1.0.51

The ask is to provide a better way to supply Slack and Telegram-related env variables into Pod. This cause the following issues. At this moment FEATURE_SLACK_INTEGRATION_ENABLED is set regardless of oncall.slack.enabled value (i.e. either True or False, no option to skip that). Also, it is not possible to have a duplicate env variable name. Also, this variable cannot be set from Grafana UI. So if we want Slack integration in Grafana OnCall, we MUST set oncall.slack.enabled: true. Setting that also sets SLACK_SLASH_COMMAND_NAME, SLACK_CLIENT_OAUTH_ID, SLACK_CLIENT_OAUTH_SECRET, SLACK_SIGNING_SECRET, and SLACK_INSTALL_RETURN_REDIRECT_HOST. SLACK_CLIENT_OAUTH_SECRET and SLACK_SIGNING_SECRET are secrets and should not be openly present in custom values.yaml file. Also, it is not possible to override env values with .Values.env (to use valueFrom.secretKeyRef) as this causes env duplicate. This sets TELEGRAM_WEBHOOK_HOST to empty string and removes a reasonable default of {base_url}.

Possible options:

Another issue is that oncall secret with MIRAGE_CIPHER_IV, MIRAGE_SECRET_KEY, SECRET_KEY is re-created on every upgrade. This causes unnecessary rendered chart changes. Previously it invalidated grafana/oncall-engine connection and required new invite token. I guess, this was fixed (keeping fixed IV?). The solution is to add .Values.oncall.existingSecret and create oncall secret only if oncall.existingSecret is not set.

ksa-real commented 2 years ago

BTW for context. I was beating my head about why integration with slack stopped working after the upgrade. I dug into the oncall engine code and figured out there is a FEATURE_SLACK_INTEGRATION_ENABLED variable. Then from Helm chart I figured out it is false by default. Then I looked through the git history and checked it was added later and was backward incompatible change. Also, the integration setup docs say nothing about this variable, which added confusion. The change was introduced pretty long ago, so probably most people already figured this out, but still, there may be someone like me who won't dig into the code. It may make sense to document feature-enabling variables in the docs.

Matvey-Kuk commented 2 years ago

Thank you for this bug report! @alyssawada could you please collaborate with @vadimkerr about documenting ENV variables?

ksa-real commented 2 years ago

Can you please take a look at the PR?

atmaniak commented 1 year ago

I have the same problem here

alyssawada commented 1 year ago

@Matvey-Kuk - Catching up on Github notifications post-offsite. Prioritizing this and will sync with @vadimkerr!

ksa-real commented 1 year ago

@Matvey-Kuk @alyssawada The original PR has been closed in favor of https://github.com/grafana/oncall/pull/1016. The new PR contains some parts of the original one and more. It is open for almost a month, so I encourage you to review it. It takes some effort to keep it actual.

ksa-real commented 1 year ago

@Matvey-Kuk @alyssawada Ping again. Almost two months passed since the PR. Please, take a look. This is a useful change people may benefit from.

ksa-real commented 1 year ago

Also, label is incorrect, this is not just docs.

ksa-real commented 1 year ago

@Matvey-Kuk Can you please set the correct labels? It is helm-chart, not docs. Should I create another issue for https://github.com/grafana/oncall/pull/1016 and close the current one? I don't understand how to make code owners review the PR.