robusta-dev / robusta

Better Prometheus alerts for Kubernetes - smart grouping, AI enrichment, and automatic remediation
https://home.robusta.dev/
MIT License
2.61k stars 254 forks

Runner can't connect to external prometheus and alertmanager #1097

Open smoug25 opened 1 year ago

smoug25 commented 1 year ago

Describe the bug I have a multi-cluster setup with a separate monitoring cluster. For metrics querying I use Thanos Query, and it works fine in-cluster: the Robusta runner can connect through the Thanos Query service, and to Alertmanager through the Alertmanager service. I expose hosts for Thanos Query and Alertmanager with JWT auth through Ambassador Edge Stack. I am able to query Thanos and Alertmanager from my machine successfully, but the Robusta runner returns errors: a 401 code for Thanos Query and a 400 code for Alertmanager.

To Reproduce Steps to reproduce the behavior:

  1. Set up two clusters, cluster A and cluster B
  2. Expose Prometheus and Alertmanager on cluster A with JWT authorization
  3. Install Robusta on both clusters, and configure Robusta on cluster B with the URLs of Prometheus and Alertmanager in cluster A
  4. See the errors in Robusta on cluster B

Expected behavior No errors in the Robusta logs on the external cluster, and app metrics available in the Robusta UI

Robusta runner logs

2023-09-23 06:02:39.386 ERROR Failed to connect to prometheus. Couldn't connect to Prometheus found under https://thanos-query.areon.io Caused by HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prometrix/connect/custom_connect.py", line 101, in check_prometheus_connection
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/src/robusta/core/sinks/robusta/prometheus_health_checker.py", line 61, in prometheus_connection_checks
    prometheus_connection.check_prometheus_connection(params={})
  File "/usr/local/lib/python3.9/site-packages/prometrix/connect/custom_connect.py", line 103, in check_prometheus_connection
    raise PrometheusNotFound(
prometrix.exceptions.PrometheusNotFound: Couldn't connect to Prometheus found under https://my-prometheus.url Caused by HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev)

Caused by HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences)
Traceback (most recent call last):
  File "/app/src/robusta/utils/silence_utils.py", line 113, in get_alertmanager_silences_connection
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/src/robusta/core/sinks/robusta/prometheus_health_checker.py", line 97, in alertmanager_connection_checks
    get_alertmanager_silences_connection(params=base_silence_params)
  File "/app/src/robusta/utils/silence_utils.py", line 116, in get_alertmanager_silences_connection
    raise AlertsManagerNotFound(
robusta.core.exceptions.AlertsManagerNotFound: Could not connect to the alert manager [https://alertmanager.areon.io] Caused by HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences)

github-actions[bot] commented 1 year ago

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

Avi-Robusta commented 1 year ago

Hi @smoug25, I don't think we currently support JWT authorization in Prometheus, but we do support adding custom Prometheus authorization headers in Robusta.

https://docs.robusta.dev/master/configuration/alertmanager-integration/outofcluster-prometheus.html#authentication-headers

Does something like this help?
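
For reference, the shape of that config is roughly the following (a sketch only; the env var name JWT_TOKEN is an assumption, not part of the docs):

```yaml
globalConfig:
  # Any custom Authorization header value can be passed through these keys.
  prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
  alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"
```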

smoug25 commented 1 year ago

Hi @Avi-Robusta, thanks for the reply. Unfortunately, no, it doesn't help; I am already using the JWT as a bearer token. This is my Robusta Helm values file:

robusta:
  clusterName: dev
  enablePrometheusStack: false
  disableCloudRouting: false
  globalConfig:
    alertmanager_url: "https://(my-alertmanager.url)"
    grafana_url: ""
    prometheus_url: "https://(my-prometheus.url)"
    chat_gpt_token: "{{ env.CHAT_GPT_TOKEN }}"

    prometheus_additional_labels:
      cluster: dev

    signing_key: "{{ env.ROBUSTA_GLOBAL_SIGNING_KEY }}"
    account_id: "{{ env.ROBUSTA_GLOBAL_ACCOUNT_ID }}"

    prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
    alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"
    prometheus_url_query_string: "cluster=dev"
  sinksConfig:
    - discord_sink:
        name: areon_discord_sink
        url: "{{ env.DISCORD_WEBHOOK }}"
    - robusta_sink:
        name: robusta_ui_sink
        token: "{{ env.ROBUSTA_TOKEN }}"
  enablePlatformPlaybooks: true
  runner:
    additional_env_vars:
    - name: GRAFANA_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: grafana_key
    - name: DISCORD_WEBHOOK
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: discord_webhook
    - name: ROBUSTA_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_token
    - name: ROBUSTA_GLOBAL_SIGNING_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_global_signing_key
    - name: ROBUSTA_GLOBAL_ACCOUNT_ID
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_global_account_id
    - name: CHAT_GPT_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: chat_gpt_token
    - name: JWT_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: jwt_token
    - name: PROMETHEUS_SSL_ENABLED
      value: "true"                                                           
    sendAdditionalTelemetry: false
  rsa:
    private:  -- secret --
    public: -- secret --
  playbookRepos:
    chatgpt_robusta_actions:
      url: "https://github.com/robusta-dev/kubernetes-chatgpt-bot.git"

  customPlaybooks:
  - triggers:
    - on_prometheus_alert: {}
    actions:
    - chat_gpt_enricher: {}
Avi-Robusta commented 1 year ago

Hi @smoug25, can you try running this with your URL and token to see what Thanos responds?

curl --location 'https://MY-PROMETHEUS.URL/api/v1/query?query=up' \
--header 'Authorization: Bearer JWT_TOKEN'

Some users have had issues with Thanos because they needed to either specify a port or use http instead of https in the URL. If the curl doesn't work, try either or both of those.
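
The variants to try can be sketched as a small loop (MY-PROMETHEUS.URL, port 9090, and JWT_TOKEN are placeholders; the loop only prints the commands, uncomment the eval line to actually send the requests):

```shell
# Print (or run) the suggested curl variants: original https URL,
# plain http, and an explicit port.
BASE="MY-PROMETHEUS.URL"
for url in "https://${BASE}" "http://${BASE}" "https://${BASE}:9090"; do
  cmd="curl --location '${url}/api/v1/query?query=up' --header 'Authorization: Bearer JWT_TOKEN'"
  echo "${cmd}"
  # eval "${cmd}"   # uncomment to actually send the request
done
```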

smoug25 commented 1 year ago

@Avi-Robusta after sending your request to my Thanos, I got a valid response with metrics. Let me clarify one point: Thanos itself runs without auth, but it sits behind a proxy that authenticates via the 'Authorization: Bearer JWT_TOKEN' header. My Alertmanager sits behind the same proxy, but I get a 400 code in response.

smoug25 commented 1 year ago

Hello, @Avi-Robusta. Do you have any updates on this issue?

Avi-Robusta commented 1 year ago

Hi @smoug25, I wasn't able to replicate the issue. Would you like to jump on a call so I can debug this with you? You can pick a time from my Calendly.

smoug25 commented 1 year ago

Hi @Avi-Robusta, do you have any ideas about what we could do to understand this issue better?

Sheeproid commented 1 year ago

Hi @smoug25 . Avi is currently not available. It will be easier to discuss in the Slack community in the #support channel.

aantn commented 8 months ago

@smoug25 can you confirm if this is still happening or if it was fixed?

smoug25 commented 8 months ago

@aantn I've updated to 0.10.29 and the problem is still there.

aantn commented 8 months ago

Weird. If you run the curl command from the robusta-runner pod, does it work? I am trying to figure out what is different about how the runner connects.

smoug25 commented 8 months ago

If I make a curl request from the robusta-runner pod, it works fine and I receive a status code of 200 (OK).

smoug25 commented 1 month ago

I've found the cause of the problem. Something is wrong with the templating. I used this, and it did not work:

prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"

After I added the "Bearer " prefix to the Kubernetes secret and left only the env var in the template, I got:

prometheus_auth: "{{ env.JWT_TOKEN }}"
alertmanager_auth: "{{ env.JWT_TOKEN }}"

With this template, everything works like a charm.
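
In other words, the workaround is to store the complete Authorization header value, prefix included, in the secret (a sketch; the token value and secret/key names are placeholders taken from the values file above):

```shell
# Store the complete Authorization header value, "Bearer " prefix included,
# in the secret; the Helm template then only needs "{{ env.JWT_TOKEN }}".
TOKEN="eyJhbGciOi...example"          # placeholder JWT
HEADER_VALUE="Bearer ${TOKEN}"        # full header value to put in the secret

# kubectl create secret generic robusta-secrets \
#   --from-literal=jwt_token="${HEADER_VALUE}"

echo "${HEADER_VALUE}"
```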

aantn commented 1 month ago

Thanks. If I understand correctly, we need to update the auth section on this page regarding Thanos. Is that correct?

pavangudiwada commented 3 weeks ago

@smoug25

globalConfig:
  prometheus_auth: Bearer <YOUR TOKEN> # Replace <YOUR TOKEN> with your actual token or use any other auth header as needed
  alertmanager_auth: Basic <USER:PASSWORD base64-encoded> # Replace <USER:PASSWORD base64-encoded> with your actual credentials, base64-encoded, or use any other auth header as needed

This :arrow_up: is the current config, and below is your suggestion

prometheus_auth: "{{ env.JWT_TOKEN }}"
alertmanager_auth: "{{ env.JWT_TOKEN }}"

instead of

prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"

Should the alertmanager_auth secret contain "Basic "? Can you please clarify?

smoug25 commented 3 weeks ago

@pavangudiwada Hi!

In case the user stores the token in a Kubernetes secret, the secret must contain the entire auth header value, including the "Basic " prefix.

If you use the plain, non-secret way, then this:

globalConfig:
  prometheus_auth: Bearer <YOUR TOKEN> # Replace <YOUR TOKEN> with your actual token or use any other auth header as needed
  alertmanager_auth: Basic <USER:PASSWORD base64-encoded> # Replace <USER:PASSWORD base64-encoded> with your actual credentials, base64-encoded, or use any other auth header as needed

should work as expected.
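
For the secret-based Basic case, the same logic applies: the stored value holds the whole header, "Basic " prefix plus the base64-encoded credentials. A sketch (user, password, and secret/key names are placeholders):

```shell
# Build the full Basic auth header value to store in the Kubernetes secret.
USER="admin"     # placeholder user
PASS="s3cret"    # placeholder password
CREDS=$(printf '%s:%s' "${USER}" "${PASS}" | base64)
HEADER_VALUE="Basic ${CREDS}"

# kubectl create secret generic robusta-secrets \
#   --from-literal=alertmanager_auth_header="${HEADER_VALUE}"

echo "${HEADER_VALUE}"   # → Basic YWRtaW46czNjcmV0
```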