jaegertracing / helm-charts

Helm Charts for Jaeger backend
Apache License 2.0

Basic helm chart fails to install #319

Open lkoniecz opened 3 years ago

lkoniecz commented 3 years ago

Describe the bug

NAME                                READY   STATUS             RESTARTS   AGE
jaeger-agent-6tfd2                  1/1     Running            0          50s
jaeger-agent-fw2k4                  1/1     Running            0          50s
jaeger-agent-q87n6                  1/1     Running            0          50s
jaeger-cassandra-schema-7b29c       1/1     Running            0          50s
jaeger-collector-79b7c4bdc6-qxx56   0/1     CrashLoopBackOff   2          50s
jaeger-query-556c758fdf-5s67x       1/2     Error              3          50s
2021/12/02 06:26:58 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1638426418.2399487,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1638426418.2399886,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1638426418.2402306,"caller":"flags/admin.go:104","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1638426418.2402766,"caller":"flags/admin.go:115","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1638426418.240292,"caller":"flags/admin.go:96","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
2021/12/02 06:26:58 gocql: dns error: lookup cassandra on 172.20.0.10:53: no such host
{"level":"fatal","ts":1638426418.257373,"caller":"./main.go:101","msg":"Failed to init storage factory","error":"gocql: unable to create session: failed to resolve any of the provided hostnames","stacktrace":"main.main.func1\n\t./main.go:101\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.2.1/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.2.1/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.2.1/command.go:902\nmain.main\n\t./main.go:161\nruntime.main\n\truntime/proc.go:255"}

To Reproduce

Steps to reproduce the behavior:

  1. follow the installation steps: https://artifacthub.io/packages/helm/jaegertracing/jaeger
  2. helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
  3. helm install jaeger jaegertracing/jaeger

Expected behavior

It works.

mehta-ankit commented 3 years ago

IIRC, by default the chart installs a Cassandra StatefulSet using the upstream Cassandra chart. Can you describe the StatefulSet and see what it is complaining about? The error message on the collector says that the storage backend, which is Cassandra in this case, could not be initialized.

I would suggest debugging Cassandra first and making sure it's running.
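
A minimal sketch of what "debugging Cassandra first" could look like with `kubectl`. It assumes the release is named `jaeger` (so the StatefulSet is `jaeger-cassandra`) and that the pods carry the `app=cassandra,release=jaeger` labels shown in the StatefulSet manifest later in this thread; the function name `inspect_cassandra` is made up for illustration.

```shell
# Sketch only: assumes release name "jaeger" and the chart's default labels.
# The namespace is the first argument and defaults to "default".
inspect_cassandra() {
  local ns
  ns="${1:-default}"
  # StatefulSet status and events (scheduling, volume, or image problems)
  kubectl -n "$ns" describe statefulset jaeger-cassandra
  # Pods selected by the labels the Cassandra subchart puts on its pods
  kubectl -n "$ns" get pods -l app=cassandra,release=jaeger
  # Events and container state of the first replica
  kubectl -n "$ns" describe pod jaeger-cassandra-0
}
```

For example, `inspect_cassandra tracing` runs the three commands against the `tracing` namespace. The Events section at the end of each `describe` output is usually the quickest place to see why a pod is not becoming Ready.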

greenbourne277 commented 2 years ago

I think I'm facing the same issue. I just followed the installation steps and Cassandra is not coming up.

Here is some output from Kubernetes:

$> kubectl -n tracing describe pods jaeger-cassandra-0

Events:
  Type     Reason                 Age                From               Message
  ----     ------                 ----               ----               -------
  Normal   Scheduled              48s                default-scheduler  Successfully assigned default/jaeger-cassandra-0 to 192.168.221.55
  Normal   SuccessfulMountVolume  48s                kubelet            Successfully mounted volumes for pod "jaeger-cassandra-0_default(993ea5cc-6acb-4ccd-973c-9e2c1b349910)"
  Normal   Pulling                24s (x2 over 47s)  kubelet            Pulling image "cassandra:3.11.6"
  Warning  FailedPullImage        23s (x2 over 36s)  kubelet            Failed to pull image "cassandra:3.11.6": rpc error: code = Unknown desc = Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
  Warning  FailedCreate           23s (x2 over 36s)  kubelet            Error: ErrImagePull
  Warning  BackOffPullImage       10s (x2 over 35s)  kubelet            Back-off pulling image "cassandra:3.11.6"
  Warning  FailedCreate           10s (x2 over 35s)  kubelet            Error: ImagePullBackOff

cassandra:3.11.6 is the appVersion of the cassandra subchart. The subchart is part of the release but not the sources. Not sure if that is correct.

Here is the relevant part of my StatefulSet:

# Source: jaeger/charts/cassandra/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jaeger-cassandra
  labels:
    app: cassandra
    chart: cassandra-0.15.3
    release: jaeger
    heritage: Helm
spec:
  selector:
    matchLabels:
      app: cassandra
      release: jaeger
  serviceName: jaeger-cassandra
  replicas: 3
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        app: cassandra
        release: jaeger
    spec:
      hostNetwork: false
      containers:
      - name: jaeger-cassandra
        image: "cassandra:3.11.6"
        imagePullPolicy: "IfNotPresent"
        resources:
          {}
        env:
        - name: CASSANDRA_SEEDS
          value: "jaeger-cassandra-0.jaeger-cassandra.tracing.svc.cluster.local"
        - name: MAX_HEAP_SIZE
          value: "2048M"
        - name: HEAP_NEWSIZE
          value: "512M"
        - name: CASSANDRA_ENDPOINT_SNITCH
          value: "GossipingPropertyFileSnitch"

Is the line image: "cassandra:3.11.6" correct? Shouldn't it use the 0.15.3 version instead? Any suggestions?

mehta-ankit commented 2 years ago

That's the image version; 0.15.3 is the chart version. https://github.com/helm/charts/blob/master/incubator/cassandra/values.yaml#L5
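
To see the two version numbers side by side, something like the following may help. It is a sketch that assumes Helm 3 and that the `jaegertracing` repo has already been added; the function name `show_chart_and_images` is hypothetical.

```shell
# Sketch only: assumes Helm 3 and a chart reference such as jaegertracing/jaeger.
show_chart_and_images() {
  local chart
  chart="${1:-jaegertracing/jaeger}"
  # Chart metadata, including subchart dependencies and their chart versions
  helm show chart "$chart"
  # Render the templates locally and list the distinct container images
  helm template jaeger "$chart" | grep 'image:' | sort -u
}
```

The first command reports chart versions (where a number like 0.15.3 comes from), while the second shows the image tags that end up in the rendered manifests (where a tag like `cassandra:3.11.6` comes from).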

greenbourne277 commented 2 years ago

Sorry, my problem is completely unrelated to this. I have an issue with Docker Hub's rate limiting.
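
For anyone else hitting the `toomanyrequests` pull error, one possible workaround is to authenticate the pulls. The sketch below assumes a Docker Hub account, uses a made-up secret name (`regcred`) and function name, and assumes the Cassandra pods run under the service account passed as the fourth argument (defaulting to `default`). Which service account the pods actually use can be checked with `kubectl get pod jaeger-cassandra-0 -o jsonpath='{.spec.serviceAccountName}'`.

```shell
# Sketch only: "regcred" and the function name are hypothetical.
# Arguments: namespace, Docker Hub user, access token, [service account].
add_dockerhub_pull_secret() {
  local ns
  local user
  local token
  local sa
  ns="$1"
  user="$2"
  token="$3"
  sa="${4:-default}"
  # Store the registry credentials as an image pull secret
  kubectl -n "$ns" create secret docker-registry regcred \
    --docker-server=https://index.docker.io/v1/ \
    --docker-username="$user" \
    --docker-password="$token"
  # Attach the secret to the service account so new pods pull with it
  kubectl -n "$ns" patch serviceaccount "$sa" \
    -p '{"imagePullSecrets":[{"name":"regcred"}]}'
}
```

The secret only applies to pods created after the patch, so a pod already stuck in ImagePullBackOff would need to be deleted and recreated.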

mehta-ankit commented 2 years ago

@lkoniecz were you able to investigate your Cassandra pods to see what the issue is? If it's not an issue anymore, can you please close this issue? Thanks.

shawnli789 commented 2 years ago

Same issue here. I followed the installation steps in minikube and here is what I got: [image]

Thoughts?

ChrisJBurns commented 2 years ago

I'm also getting the same. I installed the jaegertracing/jaeger chart with no values passed in, and the collector and query pods fail to start because they cannot initialise the storage backend.

mcooknu commented 2 years ago

I see something similar, but with Elasticsearch in an AWS EKS cluster: the collector pod fails to resolve DNS queries (the Elasticsearch backend for Jaeger is an AWS URL). Sometimes the query pod has the same problem.

The underlying node has working DNS, and other pods on the same node can resolve names.

{"level":"fatal","ts":1648827705.4414911,"caller":"./main.go:80","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: Head \"https://vpc-.us-east-1.es.amazonaws.com:443\": context deadline exceeded","stacktrace":"main.main.func1\n\t./main.go:80\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.2.1/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.2.1/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.2.1/command.go:902\nmain.main\n\t./main.go:147\nruntime.main\n\truntime/proc.go:255"}

/ $ cat /etc/resolv.conf
nameserver 172.20.0.10
search .svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
/ $ nslookup https://vpc-.us-east-1.es.amazonaws.com
;; connection timed out; no servers could be reached

I have tried using the ES IP address, and that connects, although the certificate is rejected because an IP address is used instead of the hostname. No other deployment exhibits this problem, and sometimes (I don't know why) the collector comes up fine and can reach the ES backend URL.
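
One side note on the lookup above: `nslookup` expects a bare hostname rather than a URL, so it may be worth repeating the test without the `https://` prefix to rule that out. A small sketch of stripping a URL down to its host follows; the function name `url_host` and the example URL are made up, and IPv6 literals are not handled.

```shell
# Sketch only: reduce a URL to the bare hostname that DNS tools expect.
url_host() {
  local host
  host="${1#*://}"     # drop the scheme, e.g. "https://"
  host="${host%%/*}"   # drop any path
  host="${host%%:*}"   # drop any port
  printf '%s\n' "$host"
}
# Usage (hypothetical URL): nslookup "$(url_host "https://example.com:443/path")"
```

If the lookup still times out with the bare hostname, that points at the pod being unable to reach the nameserver listed in `/etc/resolv.conf` rather than at the name itself.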