Enterprise install fails on fresh local k3d cluster

jameshbarton commented 2 years ago

Gloo Edge Version

1.12.x (beta)

Kubernetes Version

1.21.x

Describe the bug

I'm trying to test the latest 1.12 enterprise beta (beta11) in a local k3d cluster. The cluster was freshly created on macOS.

The glooe-grafana and glooe-prometheus-server pods fall into a CrashLoopBackOff state. Both are failing with file access errors.

glooe-prometheus-server logs:

glooe-prometheus-server-configmap-reload 2022/07/08 13:39:09 Watching directory: "/etc/config"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:364 msg="Starting Prometheus" version="(version=2.24.0, branch=HEAD, revision=02e92236a8bad3503ff5eec3e04ac205a3b8e4fe)"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:369 build_context="(go=go1.15.6, user=root@d9f90f0b1f76, date=20210106-13:48:37)"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:370 host_details="(Linux 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 glooe-prometheus-server-7d5b85764c-msf69 (none))"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:371 fd_limits="(soft=1048576, hard=1048576)"
glooe-prometheus-server level=info ts=2022-07-08T14:41:53.762Z caller=main.go:372 vm_limits="(soft=unlimited, hard=unlimited)"
glooe-prometheus-server level=error ts=2022-07-08T14:41:53.762Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
glooe-prometheus-server panic: Unable to create mmap-ed active query log
glooe-prometheus-server goroutine 1 [running]:
glooe-prometheus-server github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff3f4cfafd, 0x5, 0x14, 0x31b88e0, 0xc00034e900, 0x31b88e0)
glooe-prometheus-server     /app/promql/query_logger.go:117 +0x4cf
glooe-prometheus-server main.main()
glooe-prometheus-server     /app/cmd/prometheus/main.go:400 +0x53ec
glooe-prometheus-server stream closed

glooe-grafana logs:

GF_PATHS_DATA='/var/lib/grafana/' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migrate-to-v51-or-later
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
stream closed

Steps to reproduce the bug

On macOS, create a k3d cluster:


k3d cluster create --wait --config gloo.yaml

gloo.yaml file contents:

apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: gloo
image: rancher/k3s:v1.21.3-k3s1
ports:
  - port: 8080:80 # same as --port '8080:80@loadbalancer'
    nodeFilters:
      - loadbalancer
  - port: 8443:443 # same as --port '8443:443@loadbalancer'
    nodeFilters:
      - loadbalancer
options:
  k3d: # k3d runtime settings
    wait: true # wait for cluster to be usable before returning; same as --wait (default: true)
    timeout: "60s" # wait timeout before aborting; same as --timeout 60s
    disableLoadbalancer: false # same as --no-lb
  k3s: # options passed on to K3s itself
    extraArgs: # additional arguments passed to the k3s server command; same as --k3s-server-arg
      - arg: --disable=traefik
        nodeFilters:
          - server:*
    nodeLabels:
      - label: topology.kubernetes.io/region=us-east-1 # same as --k3s-node-label 'foo=bar@agent:1' -> this results in a Kubernetes node label
        nodeFilters:
          - agent:*
      - label: topology.kubernetes.io/zone=us-east-1a # same as --k3s-node-label 'foo=bar@agent:1' -> this results in a Kubernetes node label
        nodeFilters:
          - agent:*
  kubeconfig:
    updateDefaultKubeconfig: true # add new cluster to your default Kubeconfig; same as --kubeconfig-update-default (default: true)
    switchCurrentContext: false # also set current-context to the new cluster's context; same as --kubeconfig-switch-context (default: true)

Install Edge Enterprise

glooctl install gateway enterprise \
--license-key $GLOO_KEY \
--version v1.12.0-beta11

Observe the CrashLoopBackOff condition in the glooe-grafana and glooe-prometheus-server pods
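One way to spot the failing pods from the command line is to filter the default `kubectl get pods` listing on its STATUS column. This is only a rough sketch: the helper name `crashlooping` is made up for illustration, and it assumes the default column layout (NAME, READY, STATUS, ...) and the `gloo-system` namespace reported by `glooctl version`.

```shell
# Hypothetical helper: print the names of pods whose STATUS column
# (third column of the default `kubectl get pods` output) is CrashLoopBackOff.
crashlooping() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl -n gloo-system get pods | crashlooping
```

The logs of a pod listed this way can then be pulled with `kubectl -n gloo-system logs <pod-name> --all-containers`.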

Expected Behavior

Enterprise installation completes with no errors.

Additional Context

% glooctl version
Client: {"version":"1.12.0-beta11"}
Server: {"type":"Gateway","enterprise":true,"kubernetes":{"containers":[{"Tag":"1.12.0-beta22","Name":"gateway","Registry":"quay.io/solo-io"},{"Tag":"6.2.4","Name":"redis","Registry":"docker.io"},{"Tag":"1.12.0-beta11","Name":"gloo-ee-envoy-wrapper","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"observability-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"discovery-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"gloo-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"rate-limit-ee","Registry":"quay.io/solo-io"},{"Tag":"1.12.0-beta11","Name":"extauth-ee","Registry":"quay.io/solo-io"}],"namespace":"gloo-system"}}

jameshbarton commented 2 years ago

Workaround for my testing is to disable persistent storage for Prometheus and Grafana:

prometheus:
  server:
    persistentVolume:
      enabled: false

grafana:
  persistence:
    enabled: false
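For anyone else hitting this, a sketch of how these overrides could be applied at install time. It assumes the values are saved to a file (the name `gloo-no-persistence.yaml` is arbitrary) and passed through glooctl's `--values` flag; the install step is skipped if glooctl is not on the PATH.

```shell
# Save the override values to a file (the file name is an arbitrary choice).
cat > gloo-no-persistence.yaml <<'EOF'
prometheus:
  server:
    persistentVolume:
      enabled: false

grafana:
  persistence:
    enabled: false
EOF

# Re-run the install from the reproduction steps with the overrides,
# but only when glooctl is actually installed on this machine.
if command -v glooctl >/dev/null 2>&1; then
  glooctl install gateway enterprise \
    --license-key "$GLOO_KEY" \
    --version v1.12.0-beta11 \
    --values gloo-no-persistence.yaml
fi
```

With persistence off, Prometheus and Grafana data does not survive pod restarts, which is acceptable for a throwaway local test cluster but probably not elsewhere.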

github-actions[bot] commented 3 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.
