jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

all-in-one jaeger crashes with SIGSEGV #2359

Closed miyakz1192 closed 4 years ago

miyakz1192 commented 4 years ago

Describe the bug
all-in-one Jaeger crashes with SIGSEGV.

To Reproduce
Steps to reproduce the behavior:

1. Set up Ruby's gRPC environment.
   1-1) Install Ruby (my environment is "ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux-gnu]").
   1-2) Install the gRPC-related gems:

       # gem install grpc
       # gem install grpc-tools

2. Watch Jaeger's log file; run this on the controller node:

  #kubectl logs istio-tracing-954c9b595-bpd9p -n istio-system -f

3. Run the following client code:

  $ git clone https://github.com/miyakz1192/jaeger-idl.git
  $ cd jaeger-idl
  $ git checkout support_ruby
  $ cd proto-gen-ruby/
  $ ruby -I . sample.rb

4. Jaeger crashes with a panic log like this:

  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x114add8]

  goroutine 269 [running]:
  github.com/jaegertracing/jaeger/cmd/query/app.(*GRPCHandler).GetTrace(0xc005a6f860, 0xc006008600, 0x1d1e740, 0xc005fa76f0, 0xc005a6f860, 0x20)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/query/app/grpc_handler.go:60 +0x368
  github.com/jaegertracing/jaeger/proto-gen/api_v2._QueryService_GetTrace_Handler(0x13c3d80, 0xc005a6f860, 0x1d1bec0, 0xc005aaa240, 0x2848bc0, 0xc000380700)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/proto-gen/api_v2/query.pb.go:955 +0x109
  github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc.(*Server).processStreamingRPC(0xc005a6af00, 0x1d22a00, 0xc005c30480, 0xc000380700, 0xc005a96300, 0x2811360, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc/server.go:1211 +0xb22
  github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc.(*Server).handleStream(0xc005a6af00, 0x1d22a00, 0xc005c30480, 0xc000380700, 0x0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc/server.go:1291 +0xd30
  github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc00032e384, 0xc005a6af00, 0x1d22a00, 0xc005c30480, 0xc000380700)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc/server.go:722 +0xbb
  created by github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/google.golang.org/grpc/server.go:720 +0xa1
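For context, this class of SIGSEGV in GetTrace typically means the handler dereferenced a trace result without guarding against a nil value from the storage backend. Below is a minimal, self-contained Go sketch of the general defensive pattern; all names are illustrative stand-ins, not Jaeger's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// Trace is a stand-in for the trace model returned by a span store.
type Trace struct{ Spans []string }

var errTraceNotFound = errors.New("trace not found")

// getTrace simulates a buggy storage backend that signals a missing
// trace by returning (nil, nil) instead of a proper error.
func getTrace(id string) (*Trace, error) {
	return nil, nil
}

// handleGetTrace checks both the error and the nil result before
// touching the trace; dereferencing unconditionally would panic.
func handleGetTrace(id string) error {
	trace, err := getTrace(id)
	if err != nil {
		return err
	}
	if trace == nil { // without this guard, trace.Spans segfaults
		return errTraceNotFound
	}
	fmt.Println("spans:", len(trace.Spans))
	return nil
}

func main() {
	if err := handleGetTrace("abc"); err != nil {
		fmt.Println("error:", err) // prints "error: trace not found"
	}
}
```

With the guard in place, a missing trace surfaces as a clean error (which a gRPC handler can map to a NotFound status) rather than a nil pointer dereference.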

Expected behavior
Jaeger returns trace data to the Ruby client instead of crashing.

Screenshots
None; the reproduction above is entirely command-line based.

Version (please complete the following information):

root@kubecon1:~/documents/jaeger-idl/proto-gen-ruby# kubectl get deployments.apps istio-tracing -o yaml -n istio-system 
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"jaeger","operator.istio.io/component":"AddonComponents","operator.istio.io/managed":"Reconcile","operator.istio.io/version":"1.5.2","release":"istio"},"name":"istio-tracing","namespace":"istio-system"},"spec":{"selector":{"matchLabels":{"app":"jaeger"}},"template":{"metadata":{"annotations":{"prometheus.io/port":"14269","prometheus.io/scrape":"true","sidecar.istio.io/inject":"false"},"labels":{"app":"jaeger","release":"istio"}},"spec":{"affinity":{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"beta.kubernetes.io/arch","operator":"In","values":["amd64"]}]},"weight":2},{"preference":{"matchExpressions":[{"key":"beta.kubernetes.io/arch","operator":"In","values":["ppc64le"]}]},"weight":2},{"preference":{"matchExpressions":[{"key":"beta.kubernetes.io/arch","operator":"In","values":["s390x"]}]},"weight":2}],"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"beta.kubernetes.io/arch","operator":"In","values":["amd64","ppc64le","s390x"]}]}]}}},"containers":[{"env":[{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"BADGER_EPHEMERAL","value":"false"},{"name":"SPAN_STORAGE_TYPE","value":"badger"},{"name":"BADGER_DIRECTORY_VALUE","value":"/badger/data"},{"name":"BADGER_DIRECTORY_KEY","value":"/badger/key"},{"name":"COLLECTOR_ZIPKIN_HTTP_PORT","value":"9411"},{"name":"MEMORY_MAX_TRACES","value":"50000"},{"name":"QUERY_BASE_PATH","value":"/jaeger"}],"image":"docker.io/jaegertracing/all-in-one:1.16","imagePullPolicy":"IfNotPresent","livenessProbe":{"httpGet":{"path":"/","port":14269}},"name":"jaeger","ports":[{"containerPort":9411},{"containerPort":16686},{"containerPort":14250},{"containerPort":14267},{"containerPort":14268},{"containerPort":14269},{"containerPort":5775,"protocol":"UDP"},{"containerPort":6831,"proto
col":"UDP"},{"containerPort":6832,"protocol":"UDP"}],"readinessProbe":{"httpGet":{"path":"/","port":14269}},"resources":{"requests":{"cpu":"10m"}},"volumeMounts":[{"mountPath":"/badger","name":"data"}]}],"volumes":[{"emptyDir":{},"name":"data"}]}}}}
  creationTimestamp: "2020-05-10T08:05:27Z"
  generation: 3
  labels:
    app: jaeger
    operator.istio.io/component: AddonComponents
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.5.2
    release: istio
  name: istio-tracing
  namespace: istio-system
  resourceVersion: "12489320"
  selfLink: /apis/apps/v1/namespaces/istio-system/deployments/istio-tracing
  uid: ec652920-f710-44fa-9e0a-2d551fc46824
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jaeger
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "14269"
        prometheus.io/scrape: "true"
        sidecar.istio.io/inject: "false"
      creationTimestamp: null
      labels:
        app: jaeger
        release: istio
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
            weight: 2
          - preference:
              matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - ppc64le
            weight: 2
          - preference:
              matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - s390x
            weight: 2
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
                - ppc64le
                - s390x
      containers:
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: BADGER_EPHEMERAL
          value: "false"
        - name: SPAN_STORAGE_TYPE
          value: badger
        - name: BADGER_DIRECTORY_VALUE
          value: /badger/data
        - name: BADGER_DIRECTORY_KEY
          value: /badger/key
        - name: COLLECTOR_ZIPKIN_HTTP_PORT
          value: "9411"
        - name: MEMORY_MAX_TRACES
          value: "50000"
        - name: QUERY_BASE_PATH
          value: /jaeger
        - name: LOG_LEVEL
          value: debug
        image: docker.io/jaegertracing/all-in-one:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 14269
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: jaeger
        ports:
        - containerPort: 9411
          protocol: TCP
        - containerPort: 16686
          protocol: TCP
        - containerPort: 14250
          protocol: TCP
        - containerPort: 14267
          protocol: TCP
        - containerPort: 14268
          protocol: TCP
        - containerPort: 14269
          protocol: TCP
        - containerPort: 5775
          protocol: UDP
        - containerPort: 6831
          protocol: UDP
        - containerPort: 6832
          protocol: UDP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 14269
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 10m
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /badger
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: data
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-05-10T08:05:28Z"
    lastUpdateTime: "2020-07-25T11:53:54Z"
    message: ReplicaSet "istio-tracing-954c9b595" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-07-26T01:53:51Z"
    lastUpdateTime: "2020-07-26T01:53:51Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
root@kubecon1:~/documents/jaeger-idl/proto-gen-ruby# 
root@kubecon1:~/documents/jaeger-idl/proto-gen-ruby# kubectl get pods istio-tracing-954c9b595-bpd9p -n istio-system  -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/port: "14269"
    prometheus.io/scrape: "true"
    sidecar.istio.io/inject: "false"
  creationTimestamp: "2020-07-25T11:53:17Z"
  generateName: istio-tracing-954c9b595-
  labels:
    app: jaeger
    pod-template-hash: 954c9b595
    release: istio
  name: istio-tracing-954c9b595-bpd9p
  namespace: istio-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: istio-tracing-954c9b595
    uid: c082684a-f77a-4a64-9f98-3d262f25c172
  resourceVersion: "12489310"
  selfLink: /api/v1/namespaces/istio-system/pods/istio-tracing-954c9b595-bpd9p
  uid: 6770e7ff-2869-4c3e-87e0-25eebd7baa13
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: beta.kubernetes.io/arch
            operator: In
            values:
            - amd64
        weight: 2
      - preference:
          matchExpressions:
          - key: beta.kubernetes.io/arch
            operator: In
            values:
            - ppc64le
        weight: 2
      - preference:
          matchExpressions:
          - key: beta.kubernetes.io/arch
            operator: In
            values:
            - s390x
        weight: 2
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/arch
            operator: In
            values:
            - amd64
            - ppc64le
            - s390x
  containers:
  - env:
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: BADGER_EPHEMERAL
      value: "false"
    - name: SPAN_STORAGE_TYPE
      value: badger
    - name: BADGER_DIRECTORY_VALUE
      value: /badger/data
    - name: BADGER_DIRECTORY_KEY
      value: /badger/key
    - name: COLLECTOR_ZIPKIN_HTTP_PORT
      value: "9411"
    - name: MEMORY_MAX_TRACES
      value: "50000"
    - name: QUERY_BASE_PATH
      value: /jaeger
    - name: LOG_LEVEL
      value: debug
    image: docker.io/jaegertracing/all-in-one:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 14269
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: jaeger
    ports:
    - containerPort: 9411
      protocol: TCP
    - containerPort: 16686
      protocol: TCP
    - containerPort: 14250
      protocol: TCP
    - containerPort: 14267
      protocol: TCP
    - containerPort: 14268
      protocol: TCP
    - containerPort: 14269
      protocol: TCP
    - containerPort: 5775
      protocol: UDP
    - containerPort: 6831
      protocol: UDP
    - containerPort: 6832
      protocol: UDP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 14269
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /badger
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-87tws
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kubenode11
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: data
  - name: default-token-87tws
    secret:
      defaultMode: 420
      secretName: default-token-87tws
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-25T11:53:17Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-07-26T01:53:51Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-07-26T01:53:51Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-07-25T11:53:17Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://efe61bba25986afaa0160ef5288880d5b701f20fcc6ba80395d4723fc06f5446
    image: jaegertracing/all-in-one:latest
    imageID: docker-pullable://jaegertracing/all-in-one@sha256:13646d75e8c33ab1c9e12a5c7269e3fe4f7b122a9f79d82d2b6f9cf25aad2d98
    lastState:
      terminated:
        containerID: docker://abf1ddacff4e7f22c90d6dba61bd3998baa9aa0a053cae8c97bf5162007fd1ec
        exitCode: 2
        finishedAt: "2020-07-26T01:53:36Z"
        reason: Error
        startedAt: "2020-07-25T11:59:10Z"
    name: jaeger
    ready: true
    restartCount: 2
    started: true
    state:
      running:
        startedAt: "2020-07-26T01:53:44Z"
  hostIP: 192.168.122.3
  phase: Running
  podIP: 10.244.1.77
  podIPs:
  - ip: 10.244.1.77
  qosClass: Burstable
  startTime: "2020-07-25T11:53:17Z"
root@kubecon1:~/documents/jaeger-idl/proto-gen-ruby# 

What troubleshooting steps did you try?
I followed https://www.jaegertracing.io/docs/latest/troubleshooting/, but I could not find a solution there.

jpkrohling commented 4 years ago

I believe this might have been fixed already. Here's what I see when running Jaeger from the current tip of the master branch:

Your reproducer:

$ ruby -I . sample.rb
Traceback (most recent call last):
    7: from sample.rb:13:in `<main>'
    6: from sample.rb:10:in `test'
    5: from sample.rb:10:in `map'
    4: from sample.rb:10:in `each'
    3: from /usr/local/share/gems/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:337:in `each_remote_read_then_finish'
    2: from /usr/local/share/gems/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:169:in `receive_and_check_status'
    1: from /usr/local/share/gems/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:180:in `attach_status_results_and_complete_call'
/usr/local/share/gems/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:29:in `check_status': 5:trace not found: trace not found. debug_error_string:{"created":"@1595834670.748965007","description":"Error received from peer ipv6:[::1]:16686","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"trace not found: trace not found","grpc_status":5} (GRPC::NotFound)

Jaeger Query:

{"level":"error","ts":1595834670.7487974,"caller":"app/grpc_handler.go:59","msg":"trace not found","error":"trace not found","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*GRPCHandler).GetTrace\n\t/home/jpkroehling/Projects/src/github.com/jaegertracing/jaeger/cmd/query/app/grpc_handler.go:59\ngithub.com/jaegertracing/jaeger/proto-gen/api_v2._QueryService_GetTrace_Handler\n\t/home/jpkroehling/Projects/src/github.com/jaegertracing/jaeger/proto-gen/api_v2/query.pb.go:955\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\t/home/jpkroehling/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1237\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/home/jpkroehling/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1317\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/home/jpkroehling/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:722"}

My Ruby versions, installed fresh on an up to date Fedora 32:

$ ruby -v
 ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]

In fact, I wasn't able to reproduce it even on an ancient Jaeger version (1.12), which makes me wonder which version you are actually running. Would you be able to get the SHA of the container you are running? Or could you try setting a fixed, recent version (1.18.1, for instance)?
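Pinning the image to a fixed tag would be a change to the Deployment's container spec; a fragment based on the istio-tracing YAML above (version 1.18.1 as suggested, chosen here only as an example):

```yaml
# In the istio-tracing Deployment, replace the floating "latest"
# tag with a fixed release so the running version is unambiguous:
containers:
- name: jaeger
  image: docker.io/jaegertracing/all-in-one:1.18.1
  imagePullPolicy: IfNotPresent
```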

jpkrohling commented 4 years ago

@jkandasa just pointed out that there's a SHA in your original report, so I went ahead and tried to reproduce it without success:

$ podman run -p 16686 jaegertracing/all-in-one@sha256:13646d75e8c33ab1c9e12a5c7269e3fe4f7b122a9f79d82d2b6f9cf25aad2d98
{"level":"info","ts":1595835573.0093844,"caller":"flags/service.go:116","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1595835573.0098934,"caller":"flags/admin.go:120","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1595835573.0099263,"caller":"flags/admin.go:126","msg":"Starting admin HTTP server","http-addr":":14269"}
{"level":"info","ts":1595835573.0099432,"caller":"flags/admin.go:112","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
{"level":"info","ts":1595835573.0429375,"caller":"memory/factory.go:61","msg":"Memory storage initialized","configuration":{"MaxTraces":0}}
{"level":"info","ts":1595835573.05047,"caller":"server/grpc.go:76","msg":"Starting jaeger-collector gRPC server","grpc.host-port":":14250"}
{"level":"info","ts":1595835573.0504975,"caller":"server/http.go:44","msg":"Starting jaeger-collector HTTP server","http host-port":":14268"}
{"level":"info","ts":1595835573.0506194,"caller":"server/zipkin.go:48","msg":"Listening for Zipkin HTTP traffic","zipkin host-port":":0"}
{"level":"info","ts":1595835573.0677428,"caller":"grpc/builder.go:66","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1595835573.067788,"caller":"grpc@v1.27.1/clientconn.go:106","msg":"parsed scheme: \"\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1595835573.068015,"caller":"grpc@v1.27.1/clientconn.go:106","msg":"scheme \"\" not registered, fallback to default scheme","system":"grpc","grpc_log":true}
{"level":"info","ts":1595835573.0680413,"caller":"passthrough/passthrough.go:48","msg":"ccResolverWrapper: sending update to cc: {[{:14250  <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1595835573.06813,"caller":"grpc@v1.27.1/clientconn.go:948","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1595835573.0688906,"caller":"command-line-arguments/main.go:206","msg":"Starting agent"}
{"level":"info","ts":1595835573.0688972,"caller":"base/balancer.go:196","msg":"roundrobinPicker: newPicker called with info: {map[0xc00020bc30:{{:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1595835573.0689435,"caller":"querysvc/query_service.go:137","msg":"Archive storage not created","reason":"archive storage not supported"}
{"level":"info","ts":1595835573.0689585,"caller":"app/flags.go:113","msg":"Archive storage not initialized"}
{"level":"info","ts":1595835573.0689614,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1595835573.0693355,"caller":"app/server.go:133","msg":"Query server started","port":16686,"addr":":16686"}
{"level":"info","ts":1595835573.06936,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1595835573.0693824,"caller":"app/server.go:171","msg":"Starting CMUX server","port":16686,"addr":":16686"}
{"level":"info","ts":1595835573.069387,"caller":"app/server.go:161","msg":"Starting GRPC server","port":16686,"addr":":16686"}
{"level":"info","ts":1595835573.0693805,"caller":"app/server.go:148","msg":"Starting HTTP server","port":16686,"addr":":16686"}
{"level":"error","ts":1595835575.6054332,"caller":"app/grpc_handler.go:59","msg":"trace not found","error":"trace not found","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*GRPCHandler).GetTrace\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/grpc_handler.go:59\ngithub.com/jaegertracing/jaeger/proto-gen/api_v2._QueryService_GetTrace_Handler\n\tgithub.com/jaegertracing/jaeger/proto-gen/api_v2/query.pb.go:955\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/grpc@v1.27.1/server.go:1237\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.27.1/server.go:1317\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\tgoogle.golang.org/grpc@v1.27.1/server.go:722"}
miyakz1192 commented 4 years ago

Thank you for your reply, and sorry for my late response. I tried the image (sha256:13646d75e8c33ab1c9e12a5c7269e3fe4f7b122a9f79d82d2b6f9cf25aad2d98), but the situation did not change.

[jaeger side logs]

  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x10e7586]

  goroutine 288 [running]:
  github.com/jaegertracing/jaeger/cmd/query/app.(*GRPCHandler).GetTrace(0xc005fb87e0, 0xc005abab10, 0x1c94a80, 0xc000118b60, 0xc005fb87e0, 0x20)
        github.com/jaegertracing/jaeger/cmd/query/app/grpc_handler.go:66 +0x3d6
  github.com/jaegertracing/jaeger/proto-gen/api_v2._QueryService_GetTrace_Handler(0x1336600, 0xc005fb87e0, 0x1c91c60, 0xc000370240, 0x26995e8, 0xc00025cf00)
        github.com/jaegertracing/jaeger/proto-gen/api_v2/query.pb.go:955 +0x109
  google.golang.org/grpc.(*Server).processStreamingRPC(0xc005fb2300, 0x1c988c0, 0xc0006dd080, 0xc00025cf00, 0xc005fbacc0, 0x2653f20, 0x0, 0x0, 0x0)
        google.golang.org/grpc@v1.27.1/server.go:1237 +0xcd1
  google.golang.org/grpc.(*Server).handleStream(0xc005fb2300, 0x1c988c0, 0xc0006dd080, 0xc00025cf00, 0x0)
        google.golang.org/grpc@v1.27.1/server.go:1317 +0xcd6
  google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0002880c0, 0xc005fb2300, 0x1c988c0, 0xc0006dd080, 0xc00025cf00)
        google.golang.org/grpc@v1.27.1/server.go:722 +0xa1
  created by google.golang.org/grpc.(*Server).serveStreams.func1
        google.golang.org/grpc@v1.27.1/server.go:720 +0xa1
  root@kubecon1:/home/miyakz# 

[client side output]

  root@eng-app-app-57cf6f7bb9-2tq66:/jaeger-idl/proto-gen-ruby# ruby -I . sample.rb
  Traceback (most recent call last):
        7: from sample.rb:13:in `<main>'
        6: from sample.rb:10:in `test'
        5: from sample.rb:10:in `map'
        4: from sample.rb:10:in `each'
        3: from /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:337:in `each_remote_read_then_finish'
        2: from /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:169:in `receive_and_check_status'
        1: from /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:180:in `attach_status_results_and_complete_call'
  /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:29:in `check_status': 14:Socket closed. debug_error_string:{"created":"@1597501484.472667844","description":"Error received from peer ipv4:192.168.100.2:30858","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14} (GRPC::Unavailable)
  root@eng-app-app-57cf6f7bb9-2tq66:/jaeger-idl/proto-gen-ruby# 

I have two nodes:

  root@kubecon1:/home/miyakz# kubectl get nodes
  NAME         STATUS   ROLES    AGE    VERSION
  kubecon1     Ready    master   308d   v1.16.1
  kubenode11   Ready    <none>   308d   v1.16.1
  root@kubecon1:/home/miyakz# 

The Jaeger images on each node are as follows.

  root@kubecon1:/home/miyakz# docker image list | grep jaeger
  jaegertracing/protobuf                      0.2.0                          8817b24a4c9e        7 weeks ago         153MB
  jaegertracing/thrift                        0.13                           46013df3cf70        3 months ago        126MB
  root@kubecon1:/home/miyakz# 

  root@kubenode11:/home/miyakz# docker image list | grep jaeger
  jaegertracing/all-in-one                    <none>                         5ef5c2d8e8d1        5 weeks ago         47.4MB
  root@kubenode11:/home/miyakz# 

  root@kubenode11:/home/miyakz# docker inspect --format='{{.RepoDigests}}'  5ef5c2d8e8d1
  [jaegertracing/all-in-one@sha256:13646d75e8c33ab1c9e12a5c7269e3fe4f7b122a9f79d82d2b6f9cf25aad2d98]
  root@kubenode11:/home/miyakz# 

Needless to say, the Jaeger tracing container runs on kubenode11.

istio-system   istio-tracing-6dd48c455-2snw5          1/1     Running   1          19m     10.244.1.119    kubenode11   <none>           <none>

My intuition is that this is a CPU misconfiguration or a CPU-related bug. My CPU environment is as follows:

root@kubenode11:/home/miyakz# lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       40 bits physical, 48 bits virtual
CPU(s):              7
On-line CPU(s) list: 0-6
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           7
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          6
Model:               6
Model name:          QEMU Virtual CPU version 2.5+
Stepping:            3
CPU MHz:             3593.254
BogoMIPS:            7186.50
Virtualization:      AMD-V
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-6
Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm nopl cpuid pni cx16 hypervisor lahf_lm svm 3dnowprefetch vmmcall
root@kubenode11:/home/miyakz# 

Do you have any comments?

miyakz1192 commented 4 years ago

Additional information.

I found that jaeger all-in-one does not crash when it runs directly on a VM.

I pulled the Jaeger source from GitHub:

root@dockerbuild:/home/miyakz/jaeger/cmd/all-in-one# git remote -v
origin  https://github.com/jaegertracing/jaeger.git (fetch)
origin  https://github.com/jaegertracing/jaeger.git (push)
root@dockerbuild:/home/miyakz/jaeger/cmd/all-in-one# git log | head
commit 5f4fc8da4d4e5d80cff7f8e06bad9c8e271abcf1
Author: Sreevani K V S <sreevani@freshdesk.com>
Date:   Fri Aug 14 22:12:18 2020 +0530

    Fix panic in collector when Zipkin server is shutdown (#2392)

    Signed-off-by: Sreevani871 <sreevani@freshdesk.com>

commit 5daae75911e36b842b45279d8a599bcc5d56eabf
Author: rjs211 <srivatsa211@gmail.com>
root@dockerbuild:/home/miyakz/jaeger/cmd/all-in-one# 

I built the all-in-one binary and tried to reproduce the problem, but it did not appear.

[server side log]

{"level":"error","ts":1597551056.6664784,"caller":"app/grpc_handler.go:59","msg":"trace not found","error":"trace not found","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*GRPCHandler).GetTrace\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/grpc_handler.go:59\ngithub.com/jaegertracing/jaeger/proto-gen/api_v2._QueryService_GetTrace_Handler\n\tgithub.com/jaegertracing/jaeger/proto-gen/api_v2/query.pb.go:955\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/grpc@v1.31.0/server.go:1427\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.31.0/server.go:1507\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\tgoogle.golang.org/grpc@v1.31.0/server.go:843"}

[client side log]

root@dockerbuild:/home/miyakz/jaeger-idl/proto-gen-ruby# ruby -I . sample.rb
Traceback (most recent call last):
        7: from sample.rb:15:in `<main>'
        6: from sample.rb:12:in `test'
        5: from sample.rb:12:in `map'
        4: from sample.rb:12:in `each'
        3: from /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:337:in `each_remote_read_then_finish'
        2: from /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:169:in `receive_and_check_status'
        1: from /var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:180:in `attach_status_results_and_complete_call'
/var/lib/gems/2.5.0/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:29:in `check_status': 5:trace not found: trace not found. debug_error_string:{"created":"@1597551056.685024015","description":"Error received from peer ipv4:127.0.0.1:16686","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"trace not found: trace not found","grpc_status":5} (GRPC::NotFound)
root@dockerbuild:/home/miyakz/jaeger-idl/proto-gen-ruby# 

I think there is something wrong with the container environment. Does anyone know of a similar problem?

miyakz1192 commented 4 years ago

Additional information.

Just in case, I built a jaeger all-in-one image with scripts/travis/build-all-in-one-image.sh and pushed it to my Docker Hub repo (https://hub.docker.com/repository/docker/miyakz1192/jaeger-all-in-one).

The container created from that image also crashed with SIGSEGV.

[note1] When I ran build-all-in-one-image.sh, I set the BRANCH environment variable to "master".

[note2] When I ran cmd/all-in-one/all-in-one-linux-amd64 directly in the VM, the process did not crash.

jpkrohling commented 4 years ago

Would you have detailed instructions on how to reproduce this? Based on your last messages, I assumed it would happen with Jaeger deployed on Kubernetes, so I started minikube, deployed the Jaeger Operator, deployed a Jaeger instance, and port-forwarded port 16686, but I still couldn't hit the problem:

$ minikube start --vm-driver kvm2 --cpus 6 --memory 20480 --container-runtime=crio

$ kubectl create namespace observability
$ kubectl apply -f deploy/crds/jaegertracing.io_jaegers_crd.yaml
$ kubectl apply -f deploy/service_account.yaml
$ kubectl apply -f deploy/role.yaml
$ kubectl apply -f deploy/role_binding.yaml
$ kubectl apply -f deploy/operator.yaml

$ kubectl apply -f deploy/examples/simplest.yaml 
$ kubectl port-forward service/simplest-query 16686

Your reproducer:

$ ruby -I . sample.rb
Traceback (most recent call last):
    7: from sample.rb:13:in `<main>'
    6: from sample.rb:10:in `test'
    5: from sample.rb:10:in `map'
    4: from sample.rb:10:in `each'
    3: from /home/jpkroehling/.gem/ruby/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:337:in `each_remote_read_then_finish'
    2: from /home/jpkroehling/.gem/ruby/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:169:in `receive_and_check_status'
    1: from /home/jpkroehling/.gem/ruby/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:180:in `attach_status_results_and_complete_call'
/home/jpkroehling/.gem/ruby/gems/grpc-1.30.2-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:29:in `check_status': 5:trace not found: trace not found. debug_error_string:{"created":"@1597658073.663623997","description":"Error received from peer ipv6:[::1]:16686","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"trace not found: trace not found","grpc_status":5} (GRPC::NotFound)

The Jaeger logs:

{"level":"error","ts":1597658068.1243339,"caller":"app/grpc_handler.go:59","msg":"trace not found","error":"trace not found","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*GRPCHandler).GetTrace\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/grpc_handler.go:59\ngithub.com/jaegertracing/jaeger/proto-gen/api_v2._QueryService_GetTrace_Handler\n\tgithub.com/jaegertracing/jaeger/proto-gen/api_v2/query.pb.go:955\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/grpc@v1.27.1/server.go:1237\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.27.1/server.go:1317\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\tgoogle.golang.org/grpc@v1.27.1/server.go:722"}
miyakz1192 commented 4 years ago

hi. thank you for your reply.

My VM host had not been configured with KVM enabled, so I enabled it. But after that, no virtual machine would boot; my environment may have been corrupted. I will rebuild my VM host with Ubuntu 20.04 LTS and report back later.

jpkrohling commented 4 years ago

Do let us know. At this point, this is a mystery to me :-)

miyakz1192 commented 4 years ago

Thank you for your response. I reinstalled Ubuntu 20.04 LTS and ran minikube following your instructions.

I did not encounter the odd symptoms:

miyakz@lily2:~/source/jaeger-idl/proto-gen-ruby$ ruby -I . sample.rb
Traceback (most recent call last):
    7: from sample.rb:13:in `<main>'
    6: from sample.rb:10:in `test'
    5: from sample.rb:10:in `map'
    4: from sample.rb:10:in `each'
    3: from /var/lib/gems/2.7.0/gems/grpc-1.31.1-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:337:in `each_remote_read_then_finish'
    2: from /var/lib/gems/2.7.0/gems/grpc-1.31.1-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:169:in `receive_and_check_status'
    1: from /var/lib/gems/2.7.0/gems/grpc-1.31.1-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:180:in `attach_status_results_and_complete_call'
/var/lib/gems/2.7.0/gems/grpc-1.31.1-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:29:in `check_status': 5:trace not found: trace not found. debug_error_string:{"created":"@1598681485.188116118","description":"Error received from peer ipv4:127.0.0.1:16686","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"trace not found: trace not found","grpc_status":5} (GRPC::NotFound)
miyakz@lily2:~/source/jaeger-idl/proto-gen-ruby$ 

I think my environment had something wrong with KVM/QEMU. After the reinstall, everything seems fine.

yurishkuro commented 4 years ago

Feel free to reopen if needed.