solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Enterprise install fails on fresh local k3d cluster #6705

Open jameshbarton opened 2 years ago

jameshbarton commented 2 years ago

Gloo Edge Version

1.12.x (beta)

Kubernetes Version

1.21.x

Describe the bug

I'm trying to test the latest 1.12 enterprise beta (beta11) in a local k3d cluster that was freshly created on macOS.

The glooe-grafana and glooe-prometheus-server pods fall into a CrashLoopBackOff state. Both are failing with file access errors.

glooe-prometheus-server logs:

glooe-prometheus-server-configmap-reload 2022/07/08 13:39:09 Watching directory: "/etc/config"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:364 msg="Starting Prometheus" version="(version=2.24.0, branch=HEAD, revision=02e92236a8bad3503ff5eec3e04ac205a3b8e4fe)"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:369 build_context="(go=go1.15.6, user=root@d9f90f0b1f76, date=20210106-13:48:37)"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:370 host_details="(Linux 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 glooe-prometheus-server-7d5b85764c-msf69 (none))"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:371 fd_limits="(soft=1048576, hard=1048576)"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:372 vm_limits="(soft=unlimited, hard=unlimited)"
glooe-prometheus-server level=error ts=2022-07-08T14:41:53.762Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
glooe-prometheus-server panic: Unable to create mmap-ed active query log
glooe-prometheus-server goroutine 1 [running]:
glooe-prometheus-server github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff3f4cfafd, 0x5, 0x14, 0x31b88e0, 0xc00034e900, 0x31b88e0)
glooe-prometheus-server     /app/promql/query_logger.go:117 +0x4cf
glooe-prometheus-server main.main()
glooe-prometheus-server     /app/cmd/prometheus/main.go:400 +0x53ec
glooe-prometheus-server stream closed

glooe-grafana logs:

GF_PATHS_DATA='/var/lib/grafana/' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migrate-to-v51-or-later
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
stream closed

Steps to reproduce the bug

  1. On macOS, create a k3d cluster:

    k3d cluster create --wait --config gloo.yaml

  gloo.yaml file contents:

    apiVersion: k3d.io/v1alpha4
    kind: Simple
    metadata:
      name: gloo
    image: rancher/k3s:v1.21.3-k3s1
    ports:

  2. Install Edge Enterprise:

    glooctl install gateway enterprise \
    --license-key $GLOO_KEY \
    --version v1.12.0-beta11

  3. Observe the CrashLoopBackOff condition in the glooe-grafana and glooe-prometheus-server pods.

Expected Behavior

Enterprise installation completes with no errors.

Additional Context

% glooctl version
Client: {"version":"1.12.0-beta11"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.12.0-beta22","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"6.2.4","Name":"redis","Registry":"docker.io"},{"Tag":"1.12.0-beta11","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"observability-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"discovery-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"extauth-ee","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}

jameshbarton commented 2 years ago

My workaround for testing is to disable persistent storage for Prometheus and Grafana with these Helm value overrides:

prometheus:
  server:
    persistentVolume:
      enabled: false

grafana:
  persistence:
    enabled: false
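
For anyone wanting to try the same workaround, here is a minimal sketch of one way to apply these overrides at install time. It assumes the overrides are saved to a file named gloo-values.yaml (a hypothetical name) and that the installed glooctl accepts a --values flag for Helm override files. The license key and version are the ones from the reproduction steps above.

```shell
#!/bin/sh
set -eu

# Save the persistence overrides to a values file.
# The filename gloo-values.yaml is an assumption, not from the original report.
cat > gloo-values.yaml <<'EOF'
prometheus:
  server:
    persistentVolume:
      enabled: false

grafana:
  persistence:
    enabled: false
EOF

# Run the install only when glooctl and a license key are available,
# so the script can also be used just to generate the values file.
if command -v glooctl >/dev/null 2>&1 && [ -n "${GLOO_KEY:-}" ]; then
  # --values is assumed to be supported by this glooctl version.
  glooctl install gateway enterprise \
    --license-key "$GLOO_KEY" \
    --version v1.12.0-beta11 \
    --values gloo-values.yaml
else
  echo "glooctl or GLOO_KEY not available; wrote gloo-values.yaml only"
fi
```

With persistence disabled, any Prometheus or Grafana data is lost when the pods restart, so this is suited to throwaway test clusters only.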

github-actions[bot] commented 3 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.