grafana / helm-charts

[loki-distributed] Can't specify annotations on all services, needed by GKE #1161

Open cydergoth opened 2 years ago

cydergoth commented 2 years ago

GKE uses Service annotations such as:

cloud.google.com/backend-config: '{"ports":{"80": "loki-longer-timeout"}}'

The loki-distributed Helm chart doesn't parameterize annotations on most of its services. For reference, this is the BackendConfig that the annotation above points to:

vagrant: (ops-sre-admin:monitoring):~/.../iac/sre_admin_cluster/loki $ kubectl get backendconfig -o yaml >loki-backend-config.yaml
vagrant: (ops-sre-admin:monitoring):~/.../iac/sre_admin_cluster/loki $ cat loki-backend-config.yaml
apiVersion: v1
items:
- apiVersion: cloud.google.com/v1
  kind: BackendConfig
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"cloud.google.com/v1","kind":"BackendConfig","metadata":{"annotations":{},"name":"loki-longer-timeout","namespace":"monitoring"},"spec":{"timeoutSec":600}}
    creationTimestamp: "2022-01-07T02:08:43Z"
    generation: 1
    name: loki-longer-timeout
    namespace: monitoring
    resourceVersion: "69137094"
    uid: 506b375b-438b-4e97-9339-924f0a50d4bf
  spec:
    timeoutSec: 600
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
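
Stripped of the server-generated fields, the BackendConfig in the dump above reduces to the manifest below. This is only a condensed sketch taken from the `last-applied-configuration` shown there, not something the chart provides.

```yaml
# Condensed from the kubectl output above; name, namespace and timeout
# are the values shown in the last-applied-configuration annotation.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: loki-longer-timeout
  namespace: monitoring
spec:
  # Backend service timeout in seconds
  timeoutSec: 600
```
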

cydergoth commented 2 years ago

It works for the gateway service:

# Source: sre-loki/charts/loki-distributed/templates/gateway/service-gateway.yaml
apiVersion: v1
kind: Service
metadata:
  name: sre-loki-distributed-gateway
  labels:
    helm.sh/chart: loki-distributed-0.47.0
    app.kubernetes.io/name: loki-distributed
    app.kubernetes.io/instance: sre
    app.kubernetes.io/version: "2.4.2"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: gateway
  annotations:
    cloud.google.com/backend-config: '{"ports":{"80": "loki-longer-timeout"}}'
    cloud.google.com/load-balancer-type: Internal
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
  selector:
    app.kubernetes.io/name: loki-distributed
    app.kubernetes.io/instance: sre
    app.kubernetes.io/component: gateway

But not for the other services.
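
To illustrate what is being asked for, a rendered Service for another component would need to carry the same kind of annotation as the gateway. The fragment below is a hypothetical sketch only: the Service name `sre-loki-distributed-query-frontend` and the port key `3100` are assumptions for illustration and are not taken from the chart output above.

```yaml
# Hypothetical desired output, not something the chart renders today.
# The Service name and the "3100" port key are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: sre-loki-distributed-query-frontend
  annotations:
    # Same BackendConfig as the gateway, keyed by this Service's port
    cloud.google.com/backend-config: '{"ports":{"3100": "loki-longer-timeout"}}'
```
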

cydergoth commented 2 years ago

Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#create_backendconfig

trevorwhitney commented 2 years ago

I think the intention here was that only the gateway should need to be exposed outside the VPC, since it serves as a reverse proxy to all the other services. What's your use case for needing the other services exposed as well?

cydergoth commented 2 years ago

We're getting timeouts on the frontend because of GKE's default 30s service timeout. The backend-config annotation seems to help, although it isn't the only issue we're having with large queries. GCS chunk retrieval also seems to be a concern (I think we may have too many small files): any query over a time window of more than a couple of hours, even when filtered by a couple of labels, seems to time out.

Hope this helps
