grafana / helm-charts


loki gateway in loki-simple-scalable not able to be added to grafana as loki datasource #921

Open craustin opened 2 years ago

craustin commented 2 years ago

I'm using loki-simple-scalable chart v0.1.0 to install loki on Rancher. When I create a Loki datasource in Grafana v7.5.11 pointing at the gateway's ingress hostname, Grafana reports:

Loki: Cannot connect to Loki. Unexpected error

The gateway's log (nginx log) reports an HTTP 405 response to a request for /loki/api/v1/label?start=1640728988233000000

If I instead add a Loki datasource for the simple-scalable-read service's internal DNS hostname (on port 3100), Grafana adds the datasource successfully.

I tried modifying the gateway's nginx conf to return a 200 for an OPTIONS request on this route, but that didn't seem to help - and confusingly, it seems the read service returns the same 405 for that OPTIONS request.

Is it expected that Grafana would fail to add a Loki datasource using the gateway hostname?

rufreakde commented 2 years ago

I would like to bump this issue. I tried to use http://loki-gateway.monitoring.svc.cluster.local:3100/, which follows the pattern given in the official guide (https://grafana.com/docs/loki/latest/installation/simple-scalable-helm/): http://<helm-installation-name>-gateway.<namespace>.svc.cluster.local:3100/

But it fails when I press "Save and test".

rufreakde commented 2 years ago

I changed to the loki-distributed Helm chart, and that worked: http://monitoring-loki-loki-distributed-gateway.monitoring.svc.cluster.local/

trevorwhitney commented 2 years ago

I believe the <helm-installation-name> has been removed from service name. Does it work if you try loki-gateway.<namespace>.svc.cluster.local (or enterprise-logs-gateway.<namespace>.svc.cluster.local when enterprise.enabled: true)?

rufreakde commented 2 years ago

Thanks for your reply trevorwhitney. I will test it at some point (since it was urgent we had to use loki-stack chart for now)

But the point is that maybe the documentation needs an update to reflect the current state! https://grafana.com/docs/loki/latest/installation/simple-scalable-helm/

The section

To access the Grafana UI, run the following command:

kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80
Navigate to http://localhost:3000 and login with admin and the password output above. Then follow the [instructions for adding the Loki Data Source](https://grafana.com/docs/loki/latest/getting-started/grafana/), using the URL http://<helm-installation-name>-gateway.<namespace>.svc.cluster.local:3100/ for Loki, with <helm-installation-name> and <namespaced> replaced with the correct values for your deployment.

is totally outdated.

rpf3 commented 2 years ago

I think I was running into the same issue as everyone here. I am running the standalone Grafana and Loki "simple scalable" charts but could not successfully add Loki as a data source. What I ended up tracking down was there is a chart value loki.auth_enabled that defaults to true. This default value configures Loki in multi-tenant mode which requires all requests to be sent with an auth header. I came to this conclusion by tailing the logs of the "loki-gateway" service and saw a bunch of 401 errors. Once I deployed with loki.auth_enabled set to false everything worked as expected.
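For anyone landing here, the change described above might look like the following values fragment. This is a minimal sketch that assumes the key sits at loki.auth_enabled, as named in the comment; check the values.yaml of your chart version before relying on it.

```yaml
# values.yaml for the loki-simple-scalable chart (sketch, not a complete file)
loki:
  # false runs Loki in single-tenant mode, so requests that pass through
  # the gateway no longer need to carry a tenant header
  auth_enabled: false
```

With auth left enabled, Loki expects every request to identify its tenant in the X-Scope-OrgID header, which would explain the 401 responses seen in the gateway log.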

refs:

rufreakde commented 2 years ago

loki-gateway.<namespace>.svc.cluster.local

This worked in the new chart as well.

@trevorwhitney can this be set as default value in the chart?

trevorwhitney commented 2 years ago

that seems reasonable, would you mind making a PR? would be better to discuss in context I think with the code to see how it affects different code paths.

rpf3 commented 1 year ago

@trevorwhitney do you mean a PR making the loki.auth_enabled default to false?

trevorwhitney commented 1 year ago

no, that was in response to the following (I think, as this was over a month ago)

loki-gateway.<namespace>.svc.cluster.local this worked in the new chart as well.

@trevorwhitney can this be set as default value in the chart?

rufreakde commented 1 year ago

@trevorwhitney sorry somehow I missed this thread. I will set myself a reminder to check that chart.

Modifying the values.yaml should not be that hard. Personally, I switched away from the "simple-scalable" Helm chart and now use just "loki", with my own S3 storage.

trevorwhitney commented 1 year ago

@rufreakde glad to hear that, we're hoping to get everyone on to the new loki chart in the grafana/loki repo so everyone's on the same chart and we can maximize benefit of community contributions (rather than having them split between multiple charts)

hpl002 commented 1 year ago

TLDR for the next fella;

  1. Disable auth, as discovered in https://github.com/grafana/helm-charts/issues/921#issuecomment-1183597102
  2. Properly format the URL -> http://loki-gateway.<NAMESPACE>.svc.cluster.local (see https://github.com/grafana/helm-charts/issues/921#issuecomment-1285575638)

# values.yaml

grafana:
  additionalDataSources:
    - name: Loki
      access: proxy
      editable: false
      orgId: 1
      type: loki
      url: http://loki-gateway.<NAMESPACE>.svc.cluster.local
      version: 1

spec -> https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml#L517
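If disabling auth (step 1) is not an option, an alternative is to have Grafana send the tenant header itself. The fragment below is a sketch in Grafana's data source provisioning format. It assumes the datasources key of the Grafana chart rather than additionalDataSources, and tenant1 is a hypothetical tenant ID, not something from this thread.

```yaml
# Grafana chart values (sketch); tenant1 is a hypothetical tenant ID
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway.<NAMESPACE>.svc.cluster.local
        jsonData:
          # name of the custom HTTP header Grafana adds to each request
          httpHeaderName1: X-Scope-OrgID
        secureJsonData:
          # value of that header: the Loki tenant to query
          httpHeaderValue1: tenant1
```

The header name and value are paired by their numeric suffix, so a second custom header would use httpHeaderName2 and httpHeaderValue2.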

dom93dd commented 1 year ago

@rufreakde

I am trying to use the "normal" loki chart and can't figure out why Grafana can't connect to this source.

I tried to add following url in Grafana: "http://loki-gateway.loki-dev.svc.cluster.local" also with auth_enabled false. Grafana has been deployed to namespace "grafana-dev".

Could you share your value files? Thanks!

rufreakde commented 1 year ago

Usually this would not be an issue, but unfortunately I am on vacation now. The earliest I could do it would be sometime around January.

dom93dd commented 1 year ago

No worries! Maybe I will have found a solution by then. What really grinds my gears is that the gateway's default deployment listens on port 80, while the read and write services use port 3100. Shouldn't the gateway use port 3100 too?

dom93dd commented 1 year ago

So I got it working. It turns out that only decreasing the read and write replicas isn't enough. For my dev instance I only needed one replica each of read and write, not the highly scalable option with 3 replicas, each on a different node. I also had to add "replication_factor" and set it to "1".

I am adding my values files in case others are experiencing the same issue. Grafana, Loki and Promtail are deployed into the same namespace, "monitoring-dev". This is all running on an Azure Kubernetes cluster with Kong Gateway deployed. Loki uses Azure Blob Storage.

Hope this helps somebody.

# loki values-dev.yaml

write:
  replicas: 1
read:
  replicas: 1

loki:
  auth_enabled: false
  server:
    http_listen_port: 3100
  commonConfig:
    replication_factor: 1

  schemaConfig:
    configs:
    - from: "2022-12-01"
      index:
        period: 24h
        prefix: loki_index_
      schema: v11
      store: boltdb-shipper
      object_store: azure

  storage_config:
    boltdb_shipper:
      active_index_directory: /var/loki/index
      cache_location: /var/loki/cache
      cache_ttl: 24h
      shared_store: azure
    azure:
      account_name: loggingstorage
      account_key: ***
      container_name: loki-dev
      request_timeout: 0

  compactor:
    working_directory: /var/loki/boltdb-shipper-compactor
    shared_store: azure

# Grafana values-dev.yaml

service:
  enabled: true
  type: ClusterIP

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: kong
  path: /monitoring/grafana
  hosts:
    - <hosturl>
  tls:
    - secretName: acme-http01-cert-issued-default
      hosts:
        - <hosturl>

grafana.ini:
  server:
    root_url: <url>
    serve_from_sub_path: true
  auth.azuread:
    name: Azure AD
    enabled: true
    allow_sign_up: true
    client_id: <clientid>
    client_secret: <clientsecret>
    scopes: openid email profile offline_access
    auth_url: https://login.microsoftonline.com/<tenantid>/oauth2/v2.0/authorize
    token_url: https://login.microsoftonline.com/<tenantid>/oauth2/v2.0/token
    allowed_groups: <groupid>
    role_attribute_strict: true
    allow_assign_grafana_admin: true

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      url: http://loki-gateway.monitoring-dev.svc.cluster.local
      access: proxy
      isDefault: true

# Promtail values-dev.yaml

config:
  clients:
    - url: http://loki-gateway/loki/api/v1/push

romosa commented 7 months ago

I am still not able to add Loki as a datasource.

I have already tried the configurations suggested here.

Loki URL: http://loki-backend.monitoring-system.svc.cluster.local:3100/

Here is my values.yaml:

write:
  replicas: 1
read:
  replicas: 1

loki:
  auth_enabled: false
  server:
    http_listen_port: 3100
  commonConfig:
    replication_factor: 1
  limits_config:
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_cache_freshness_per_query: 30m
    split_queries_by_interval: 0

I see these logs on the gateway:

level=error ts=2024-03-26T03:47:51.957257389Z caller=cached_client.go:189 msg="failed to build table names cache" err="RequestCanceled: request context canceled\ncaused by: context canceled"
level=error ts=2024-03-26T03:47:51.95729556Z caller=compactor.go:523 msg="failed to run compaction" err="failed to list tables: RequestCanceled: request context canceled\ncaused by: context canceled"
2024/03/26 03:48:13 WARN: failed to get session token, falling back to IMDSv1: 405 Method Not Allowed: Method Not Allowed
        status code: 405, request id:
caused by: EC2MetadataError: failed to make EC2Metadata request
<html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
</body>
</html>

        status code: 405, request id:
level=error ts=2024-03-26T03:48:13.672687062Z caller=reporter.go:205 msg="failed to delete corrupted cluster seed file, deleting it" err="NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"
level=error ts=2024-03-26T03:48:42.012564069Z caller=ruler.go:571 msg="unable to list rules" err="NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"