jaegertracing / jaeger-operator

Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
https://www.jaegertracing.io/docs/latest/operator/
Apache License 2.0

[Bug]: Adding query.http.tls.enabled=true breaks jaeger-query #2104

Open grkooij opened 1 year ago

grkooij commented 1 year ago

What happened?

I want to set up Jaeger with TLS enabled for a secure connection between Grafana and Jaeger as a data source. When I add the option query.http.tls.enabled=true to my deployment YAML, the jaeger-query pod does not start. The collector and agent do start with the same option set to true.

Setting the option query.http.tls.enabled=false works as expected. The option should be supported (https://www.jaegertracing.io/docs/1.36/cli/). I'm guessing it's a bug in Jaeger. However, some help or examples would be appreciated if I did something wrong (I haven't found any documentation or examples on how to work with Jaeger + TLS).

Changing the strategy to "all-in-one" does allow the pods to start up, but there is still no TLS.

Steps to reproduce

Deploy Jaeger operator on K8s cluster. Deploy Jaeger (see Deployment configs).

Expected behavior

The jaeger-query pod starts and has TLS enabled on the server.

Relevant log output

{"level":"info","ts":1666868672.982716,"caller":"app/server.go:273","msg":"Starting HTTP server","port":16686,"addr":":16686"}
{"level":"error","ts":1666868672.9827507,"caller":"app/server.go:284","msg":"Could not start HTTP server","error":"open : no such file or directory","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*Server).Start.func1\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/server.go:284"}

Screenshot

No response

Additional context

No response

Jaeger backend version

v1.36.0

SDK

No response

Pipeline

No response

Storage backend

ElasticSearch 7

Operating system

No response

Deployment model

Kubernetes

Deployment configs

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata: 
  name: jaeger
  namespace: observability
spec:
  agent:
    strategy: Sidecar
    options:
      admin.http.tls.enabled: true
  query:
    options:
      query.http.tls.enabled: true
  collector:
    options:
      collector.http.tls.enabled: true
    maxReplicas: 5
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
  storage: 
    options: 
      es: 
        server-urls: https://elastic-es-http.elastic:9200
        tls:
          ca: /es/certificates/ca.crt
          enabled: true
          skip-host-verify: false
    secretName: jaeger-secret
    type: elasticsearch
  volumeMounts:
    - mountPath: "/es/certificates/"
      name: certificates
      readOnly: true
  volumes:
    - name: certificates
      secret:
        secretName: elastic-es-http-certs-public
        items:
          - key: ca.crt
            path: ca.crt
  strategy: production

EKrol2 commented 1 year ago

Same issue here, very strange.

frzifus commented 1 year ago

Is the deployment created? If yes, could you share the event log?

grkooij commented 1 year ago

The deployment is created, yes. kubectl events gives:

LAST SEEN   TYPE      REASON              OBJECT                                   MESSAGE
52s         Normal    Scheduled           pod/jaeger-collector-7b65ff566f-n5mn8    Successfully assigned observability/jaeger-collector-7b65ff566f-n5mn8 to ip-xxxxxxxxxxx.eu-central-1.compute.internal
50s         Normal    Pulling             pod/jaeger-collector-7b65ff566f-n5mn8    Pulling image "jaegertracing/jaeger-collector:1.35.2"
45s         Normal    Pulled              pod/jaeger-collector-7b65ff566f-n5mn8    Successfully pulled image "jaegertracing/jaeger-collector:1.35.2" in 4.268128423s
23s         Normal    Created             pod/jaeger-collector-7b65ff566f-n5mn8    Created container jaeger-collector
23s         Normal    Started             pod/jaeger-collector-7b65ff566f-n5mn8    Started container jaeger-collector
23s         Normal    Pulled              pod/jaeger-collector-7b65ff566f-n5mn8    Container image "jaegertracing/jaeger-collector:1.35.2" already present on machine
7s          Warning   BackOff             pod/jaeger-collector-7b65ff566f-n5mn8    Back-off restarting failed container
22s         Warning   Unhealthy           pod/jaeger-collector-7b65ff566f-n5mn8    Readiness probe failed: Get "http://[xxxxxxx]:14269/": dial tcp [xxxxxxx]:14269: connect: connection refused
52s         Normal    SuccessfulCreate    replicaset/jaeger-collector-7b65ff566f   Created pod: jaeger-collector-7b65ff566f-n5mn8
53s         Normal    ScalingReplicaSet   deployment/jaeger-collector              Scaled up replica set jaeger-collector-7b65ff566f to 1
52s         Normal    Scheduled           pod/jaeger-query-57f9948bcc-gzsc4        Successfully assigned observability/jaeger-query-57f9948bcc-gzsc4 to ip-xxxxxxxx.eu-central-1.compute.internal
50s         Normal    Pulling             pod/jaeger-query-57f9948bcc-gzsc4        Pulling image "jaegertracing/jaeger-query:1.35.2"
46s         Normal    Pulled              pod/jaeger-query-57f9948bcc-gzsc4        Successfully pulled image "jaegertracing/jaeger-query:1.35.2" in 4.45897922s
17s         Normal    Created             pod/jaeger-query-57f9948bcc-gzsc4        Created container jaeger-query
17s         Normal    Started             pod/jaeger-query-57f9948bcc-gzsc4        Started container jaeger-query
45s         Normal    Pulling             pod/jaeger-query-57f9948bcc-gzsc4        Pulling image "jaegertracing/jaeger-agent:1.35.2"
42s         Normal    Pulled              pod/jaeger-query-57f9948bcc-gzsc4        Successfully pulled image "jaegertracing/jaeger-agent:1.35.2" in 3.114319514s
42s         Normal    Created             pod/jaeger-query-57f9948bcc-gzsc4        Created container jaeger-agent
42s         Normal    Started             pod/jaeger-query-57f9948bcc-gzsc4        Started container jaeger-agent
17s         Normal    Pulled              pod/jaeger-query-57f9948bcc-gzsc4        Container image "jaegertracing/jaeger-query:1.35.2" already present on machine
7s          Warning   BackOff             pod/jaeger-query-57f9948bcc-gzsc4        Back-off restarting failed container
52s         Normal    SuccessfulCreate    replicaset/jaeger-query-57f9948bcc       Created pod: jaeger-query-57f9948bcc-gzsc4
53s         Normal    ScalingReplicaSet   deployment/jaeger-query                  Scaled up replica set jaeger-query-57f9948bcc to 1

frzifus commented 1 year ago

Oh, it seems like the pod is crashing. Could you also provide the logs from your pod?

grkooij commented 1 year ago

I've shared the relevant log output in the "Relevant log output" section of the original post. The pod shuts down right after that part. Let me know if you need more logs than that and I'll dig them up.

frzifus commented 1 year ago

Ah, I see. It says "Could not start HTTP server","error":"open : no such file or directory". Looks like you need to define the paths to the certs too, right?

https://github.com/jaegertracing/jaeger/blob/v1.36.0/pkg/config/tlscfg/flags.go#L60-L69

grkooij commented 1 year ago

I did include a clientCA, and the error persisted. Are all the options listed in the url required to be set (e.g. tlsCert, tlsKey, tlsClientCA)?

I haven't had time yet to test with all the options set, but I will report back here when I do.

What confuses me is that when I create the deployment with the all-in-one strategy, it does not show an error with query.http.tls.enabled: true, and the pod and HTTP server start without issue.

grkooij commented 1 year ago

Adding tls.cert and tls.key worked, thanks. Though I am now running into https://github.com/jaegertracing/jaeger/issues/2976, even with a clean Jaeger installation and even when rolling back to plain HTTP.

As for the TLS part, it would be useful to have a more descriptive error message than "no such file or directory".
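
For anyone landing here with the same error, the fix described above can be sketched roughly as follows. This is a minimal fragment, not the exact config used in this thread: the Secret name jaeger-query-tls and the mount path /tls are hypothetical, and the storage, agent, and collector sections from the original post are omitted for brevity. The option names query.http.tls.cert and query.http.tls.key are the jaeger-query TLS flags referenced in the flags.go link above.

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  query:
    options:
      query.http.tls.enabled: true
      # Without cert and key paths the server tries to open an empty path,
      # which is what produces "open : no such file or directory".
      # The paths below are assumptions and must match the volume mount.
      query.http.tls.cert: /tls/tls.crt
      query.http.tls.key: /tls/tls.key
  volumeMounts:
    - name: query-tls
      mountPath: /tls
      readOnly: true
  volumes:
    - name: query-tls
      secret:
        # Hypothetical Secret of type kubernetes.io/tls,
        # which stores its data under the keys tls.crt and tls.key.
        secretName: jaeger-query-tls
```

A Secret of that type could be created with, for example, kubectl create secret tls jaeger-query-tls --cert=<cert file> --key=<key file> -n observability. Setting query.http.tls.client-ca as well is presumably only needed if the query service should verify client certificates.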