grafana / agent

Vendor-neutral programmable observability pipelines.
https://grafana.com/docs/agent/
Apache License 2.0

Tracing agent looking for OSS Tempo port, throwing error "missing port in address" #654

Closed cloudcafetech closed 3 years ago

cloudcafetech commented 3 years ago

My OSS Tempo is running in a different cluster (kube-central), and I created a Tempo ingress (tempo.172.31.25.28.nip.io) pointing to the Tempo distributor service (tempo-distributor) on port 3100.

[root@ip-172-31-25-28 tracing]# oc get po,svc,ing
NAME                                        READY   STATUS    RESTARTS   AGE
pod/tempo-compactor-6dd48c5c85-mhr4w        1/1     Running   0          86m
pod/tempo-distributor-64644ff977-m9kvb      1/1     Running   0          86m
pod/tempo-ingester-0                        1/1     Running   0          86m
pod/tempo-memcached-0                       1/1     Running   0          86m
pod/tempo-querier-596687cb86-tbmkf          1/1     Running   0          86m
pod/tempo-query-frontend-5c6d657c9c-8xgpf   2/2     Running   0          86m

NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                   AGE
service/tempo-compactor                  ClusterIP   10.96.103.230   <none>        3100/TCP                                                                                                  86m
service/tempo-distributor                ClusterIP   10.96.210.16    <none>        3100/TCP,9095/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55681/TCP,4317/TCP,55680/TCP,55678/TCP   86m
service/tempo-gossip-ring                ClusterIP   10.96.185.155   <none>        7946/TCP                                                                                                  86m
service/tempo-ingester                   ClusterIP   10.96.19.146    <none>        3100/TCP,9095/TCP                                                                                         86m
service/tempo-memcached                  ClusterIP   10.96.217.36    <none>        11211/TCP,9150/TCP                                                                                        86m
service/tempo-querier                    ClusterIP   10.96.58.76     <none>        3100/TCP,9095/TCP                                                                                         86m
service/tempo-query-frontend             ClusterIP   10.96.238.159   <none>        3100/TCP,9095/TCP,16686/TCP,16687/TCP                                                                     86m
service/tempo-query-frontend-discovery   ClusterIP   None            <none>        3100/TCP,9095/TCP,16686/TCP,16687/TCP                                                                     86m

NAME                              CLASS    HOSTS                       ADDRESS     PORTS   AGE
ingress.networking.k8s.io/tempo   <none>   tempo.172.31.25.28.nip.io   localhost   80      86m

Now, on a different cluster (kube-one), I am running the tracing agent and getting the error below. Because of that, no data shows up in the S3 (MinIO) bucket.

ts=2021-06-15T00:34:57Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: address tempo.172.31.25.28.nip.io: missing port in address\""
ts=2021-06-15T00:35:07Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: address tempo.172.31.25.28.nip.io: missing port in address\""
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent-traces
  namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-agent-traces
  namespace: monitoring
data:
  agent.yaml: |
    server:
        http_listen_port: 8080
        log_level: info
    tempo:
        configs:
          - batch:
                send_batch_size: 1000
                timeout: 5s
            name: default
            receivers:
                jaeger:
                    protocols:
                        grpc: null
                        thrift_binary: null
                        thrift_compact: null
                        thrift_http: null
                    remote_sampling:
                        insecure: true
                        strategy_file: /etc/agent/strategies.json
                opencensus: null
                otlp:
                    protocols:
                        grpc: null
                        http: null
                zipkin: null
            attributes:
              actions:
              - action: upsert
                key: cluster
                value: kube-one
            remote_write:
              - endpoint: tempo.172.31.14.138.nip.io
                insecure: true
                #basic_auth:
                    #password: ${TEMPO_PASSWORD}
                    #username: ${TEMPO_USERNAME}
                retry_on_failure:
                    enabled: false
            scrape_configs:
              - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                job_name: kubernetes-pods
                kubernetes_sd_configs:
                  - role: pod
                relabel_configs:
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_namespace
                    target_label: namespace
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_name
                    target_label: pod
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_container_name
                    target_label: container
                tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: false
  strategies.json: '{"default_strategy": {"param": 0.001, "type": "probabilistic"}}'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent-traces
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent-traces
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent-traces
subjects:
- kind: ServiceAccount
  name: grafana-agent-traces
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: grafana-agent-traces
  name: grafana-agent-traces
  namespace: monitoring
spec:
  ports:
  - name: agent-http-metrics
    port: 8080
    targetPort: 8080
  - name: agent-thrift-compact
    port: 6831
    protocol: UDP
    targetPort: 6831
  - name: agent-thrift-binary
    port: 6832
    protocol: UDP
    targetPort: 6832
  - name: agent-thrift-http
    port: 14268
    protocol: TCP
    targetPort: 14268
  - name: agent-thrift-grpc
    port: 14250
    protocol: TCP
    targetPort: 14250
  - name: agent-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  - name: agent-otlp
    port: 55680
    protocol: TCP
    targetPort: 55680
  - name: agent-opencensus
    port: 55678
    protocol: TCP
    targetPort: 55678
  selector:
    name: grafana-agent-traces
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grafana-agent-traces
  namespace: monitoring
spec:
  minReadySeconds: 10
  selector:
    matchLabels:
      name: grafana-agent-traces
  template:
    metadata:
      labels:
        name: grafana-agent-traces
    spec:
      containers:
      - args:
        - -config.file=/etc/agent/agent.yaml
        command:
        - /bin/agent
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: grafana/agent:v0.15.0
        imagePullPolicy: IfNotPresent
        name: agent
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 6831
          name: thrift-compact
          protocol: UDP
        - containerPort: 6832
          name: thrift-binary
          protocol: UDP
        - containerPort: 14268
          name: thrift-http
          protocol: TCP
        - containerPort: 14250
          name: thrift-grpc
          protocol: TCP
        - containerPort: 9411
          name: zipkin
          protocol: TCP
        - containerPort: 55680
          name: otlp
          protocol: TCP
        - containerPort: 55678
          name: opencensus
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/agent
          name: grafana-agent-traces
      serviceAccount: grafana-agent-traces
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - configMap:
          name: grafana-agent-traces
        name: grafana-agent-traces
  updateStrategy:
    type: RollingUpdate
mapno commented 3 years ago

Hi! If I'm not mistaken, your ingress is only allowing traffic through port 3100. The Grafana Agent exports traces via gRPC using OTLP, whose default port is 55680. Can you try opening that port on the ingress too?
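
For reference, a minimal sketch of that change (host and service names are taken from your output above; the rest is an assumption, not a tested config):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tempo
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: tempo.172.31.25.28.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tempo-distributor
            port:
              number: 55680   # OTLP gRPC port, instead of 3100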

cloudcafetech commented 3 years ago

I am not much of an expert in ingress. I'm not sure whether I can set up multiple ports in a single ingress.

PLEASE HELP. 🙏

cloudcafetech commented 3 years ago

You mean to say that if I use 55680 instead of 3100 in the ingress, it will work?

mapno commented 3 years ago

I'm not sure whether I can set up multiple ports in a single ingress.

The ingress spec does not allow multiple backends on the same path, so I don't think that's possible. You would need to set up different paths or use a load balancer, I guess.
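
A sketch of that shape (the paths here are made up purely for illustration; this shows the ingress spec allows it, not that it's a tested setup):

spec:
  rules:
  - host: tempo.172.31.25.28.nip.io
    http:
      paths:
      - path: /api            # plain HTTP traffic to the distributor API
        pathType: Prefix
        backend:
          service:
            name: tempo-distributor
            port:
              number: 3100
      - path: /otlp           # OTLP gRPC traffic
        pathType: Prefix
        backend:
          service:
            name: tempo-distributor
            port:
              number: 55680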

You mean to say that if I use 55680 instead of 3100 in the ingress, it will work?

I think that should work, yeah. Although you won't be able to access the distributors on 3100.

cloudcafetech commented 3 years ago

:( Not working, same error:

ts=2021-06-15T10:24:31Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: address grafanaclient.172.31.25.50.nip.io: missing port in address\""
ts=2021-06-15T10:24:41Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: address grafanaclient.172.31.25.50.nip.io: missing port in address\""
ts=2021-06-15T10:24:51Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: address grafanaclient.172.31.25.50.nip.io: missing port in address\""
oc get ing grafanaclient -n tracing -o yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  creationTimestamp: "2021-06-15T10:07:24Z"
  generation: 1
  name: grafanaclient
  namespace: tracing
  resourceVersion: "1866"
  uid: 57b33094-b19c-4c73-a203-3b98bfd3e4c9
spec:
  rules:
  - host: grafanaclient.172.31.25.50.nip.io
    http:
      paths:
      - backend:
          service:
            name: tempo-distributor
            port:
              number: 55680
        path: /
        pathType: Prefix
status:
  loadBalancer:
    ingress:
    - hostname: localhost
mapno commented 3 years ago

Sorry, it was my bad. I did not see the entire error message.

Error while dialing dial tcp: address grafanaclient.172.31.25.50.nip.io: missing port in address

You need to add :55680 to the remote_write endpoint:

            remote_write:
              - endpoint: tempo.172.31.14.138.nip.io:55680

Still, I think the ingress change was necessary. Let me know if it works now.

cloudcafetech commented 3 years ago

I tried both grafanaclient.172.31.25.50.nip.io:80 and grafanaclient.172.31.25.50.nip.io:55680, but neither works.

New error:

ts=2021-06-15T10:52:02Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.31.25.50:55680: i/o timeout\""
ts=2021-06-15T10:52:12Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.31.25.50:55680: i/o timeout\""
ts=2021-06-15T10:52:22Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.31.25.50:55680: i/o timeout\""
ts=2021-06-15T10:52:37Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:52:47Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:52:57Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:53:07Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:53:17Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
cloudcafetech commented 3 years ago

With the following config ...

            remote_write:
              - endpoint: grafanaclient.172.31.25.50.nip.io:80
                insecure: true
                #basic_auth:
                    #password: ${TEMPO_PASSWORD}
                    #username: ${TEMPO_USERNAME}
                retry_on_failure:
                    enabled: false

I'm getting the following error ...

ts=2021-06-15T10:57:40Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
ts=2021-06-15T10:57:50Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
ts=2021-06-15T10:58:00Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
ts=2021-06-15T10:58:10Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
ts=2021-06-15T10:58:20Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
mapno commented 3 years ago

It should be pointing to grafanaclient.172.31.25.50.nip.io:55680. Can you make sure that grafanaclient.172.31.25.50.nip.io:55680 is reachable from one cluster to the other?

cloudcafetech commented 3 years ago

I don't think the issue is related to reachability, because it goes through an ingress, and the other ingresses (Cortex & Loki) are working perfectly.

Secondly, if I specify port 55680 I get a timeout error, though to me a port (55680) after the endpoint doesn't look valid, since ports are already taken care of by the service behind the ingress. But using 80 I get a deadline-exceeded error.

Please check below ..

ts=2021-06-15T10:52:02Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.31.25.50:55680: i/o timeout\""
ts=2021-06-15T10:52:12Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.31.25.50:55680: i/o timeout\""
ts=2021-06-15T10:52:22Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.31.25.50:55680: i/o timeout\""
ts=2021-06-15T10:52:37Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:52:47Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:52:57Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:53:07Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
ts=2021-06-15T10:53:17Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
cloudcafetech commented 3 years ago

I set up the ingress with gRPC enabled on port 443 (ingress does not support gRPC on 80), then tested with the following scenario, but NO luck.

To me the issue is in the OTLP exporter code. Does it support any other port?

Have you tested my scenario (using an ingress)? Can you PLEASE simulate it at your end?

rfratto commented 3 years ago

It sounds like there have been a lot of changes to your config since the issue was opened.

Can you share the latest for all of the following:

Also note that the NGINX ingress controller does not enable HTTP/2 traffic by default. If you're using an OTLP receiver in Tempo, you'll need an extra annotation on your ingress to enable gRPC.
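
Assuming the community ingress-nginx controller, that looks roughly like this (backend details copied from your earlier ingress; only the backend-protocol annotation is new):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafanaclient
  namespace: tracing
  annotations:
    kubernetes.io/ingress.class: nginx
    # Tell ingress-nginx to proxy this backend as gRPC (HTTP/2)
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  rules:
  - host: grafanaclient.172.31.25.50.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tempo-distributor
            port:
              number: 55680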

cloudcafetech commented 3 years ago
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tempo
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/name: tempo
  name: tempo
data:
  overrides.yaml: |
    overrides:
  tempo.yaml: |
    auth_enabled: false
    compactor:
      compaction:
        compacted_block_retention: 48h
    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_compact:
              endpoint: 0.0.0.0:6831
            thrift_binary:
              endpoint: 0.0.0.0:6832
            thrift_http:
              endpoint: 0.0.0.0:14268
            grpc:
              endpoint: 0.0.0.0:14250
        zipkin:
          endpoint: 0.0.0.0:9411
        otlp:
          protocols:
            http:
              endpoint: 0.0.0.0:55681
            grpc:
              endpoint: 0.0.0.0:4317
        opencensus:
          endpoint: 0.0.0.0:55678 
    ingester:
      {}
    server:
      http_listen_port: 3100
    storage:
      trace:
        backend: s3
        local:
          path: /var/tempo/traces
        s3:
          access_key: admin
          bucket: tracing
          endpoint: 172.31.44.216:9000
          insecure: true
          secret_key: admin2675
        wal:
          path: /var/tempo/wal
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/name: tempo
  name: tempo-query
data:
  tempo-query.yaml: |
    backend: tempo:3100
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/name: tempo
  name: tempo
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: tempo
      app.kubernetes.io/name: tempo
  serviceName: tempo-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: tempo
        app.kubernetes.io/name: tempo
    spec:
      containers:
      - args:
        - -config.file=/conf/tempo.yaml
        - -mem-ballast-size-mbs=1024
        image: grafana/tempo:1.0.0
        imagePullPolicy: IfNotPresent
        name: tempo
        ports:
        - containerPort: 3100
          name: prom-metrics
          protocol: TCP
        - containerPort: 6831
          name: jaeger-thrift-c
          protocol: UDP
        - containerPort: 6832
          name: jaeger-thrift-b
          protocol: UDP
        - containerPort: 14268
          name: jaeger-thrift-h
          protocol: TCP
        - containerPort: 14250
          name: jaeger-grpc
          protocol: TCP
        - containerPort: 9411
          name: zipkin
          protocol: TCP
        - containerPort: 55680
          name: otlp-legacy
          protocol: TCP
        - containerPort: 4317
          name: otlp-grpc
          protocol: TCP
        - containerPort: 55681
          name: otlp-http
          protocol: TCP
        - containerPort: 55678
          name: opencensus
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: tempo-conf
      - args:
        - --query.base-path=/
        - --grpc-storage-plugin.configuration-file=/conf/tempo-query.yaml
        image: grafana/tempo-query:1.0.0
        imagePullPolicy: IfNotPresent
        name: tempo-query
        ports:
        - containerPort: 16686
          name: jaeger-ui
          protocol: TCP
        - containerPort: 16687
          name: jaeger-metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: tempo-query-conf
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: tempo
      serviceAccountName: tempo
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: tempo-query
        name: tempo-query-conf
      - configMap:
          defaultMode: 420
          name: tempo
        name: tempo-conf
  updateStrategy:
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/name: tempo
  name: tempo
spec:
  ports:
  - name: tempo-prom-metrics
    port: 3100
    protocol: TCP
    targetPort: 3100
  - name: tempo-query-jaeger-ui
    port: 16686
    protocol: TCP
    targetPort: 16686
  - name: tempo-jaeger-thrift-compact
    port: 6831
    protocol: UDP
    targetPort: 6831
  - name: tempo-jaeger-thrift-binary
    port: 6832
    protocol: UDP
    targetPort: 6832
  - name: tempo-jaeger-thrift-http
    port: 14268
    protocol: TCP
    targetPort: 14268
  - name: tempo-jaeger-grpc
    port: 14250
    protocol: TCP
    targetPort: 14250
  - name: tempo-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  - name: tempo-otlp-legacy
    port: 55680
    protocol: TCP
    targetPort: 55680
  - name: tempo-otlp-http
    port: 55681
    protocol: TCP
    targetPort: 55681
  - name: tempo-otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - name: tempo-opencensus
    port: 55678
    protocol: TCP
    targetPort: 55678
  selector:
    app.kubernetes.io/instance: tempo
    app.kubernetes.io/name: tempo
  sessionAffinity: None
  type: ClusterIP
[root@ip-172-31-44-216 ~]# oc get po,svc,ep,ing -n tracing
NAME          READY   STATUS    RESTARTS   AGE
pod/tempo-0   2/2     Running   0          7m10s

NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                    AGE
service/tempo   ClusterIP   10.96.119.238   <none>        3100/TCP,16686/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,55678/TCP   7m10s

NAME              ENDPOINTS                                                        AGE
endpoints/tempo   10.244.1.8:14268,10.244.1.8:14250,10.244.1.8:55680 + 8 more...   7m10s

NAME                                      CLASS    HOSTS                                ADDRESS     PORTS     AGE
ingress.networking.k8s.io/grafanaclient   <none>   grafanaclient.172.31.44.216.nip.io   localhost   80, 443   3m3s
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent-traces
  namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-agent-traces
  namespace: monitoring
data:
  agent.yaml: |
    server:
        http_listen_port: 8080
        log_level: info
    tempo:
        configs:
          - batch:
                send_batch_size: 1000
                timeout: 5s
            name: default
            receivers:
                jaeger:
                    protocols:
                        grpc: null
                        thrift_binary: null
                        thrift_compact: null
                        thrift_http: null
                    remote_sampling:
                        insecure: true
                        strategy_file: /etc/agent/strategies.json
                opencensus: null
                otlp:
                    protocols:
                        grpc: null
                        http: null
                zipkin: null
            attributes:
              actions:
              - action: upsert
                key: cluster
                value: kube-one
            remote_write:
              - endpoint: grafanaclient.172.31.44.216.nip.io:443
                insecure: true
                #basic_auth:
                    #password: ${TEMPO_PASSWORD}
                    #username: ${TEMPO_USERNAME}
                retry_on_failure:
                    enabled: false
            scrape_configs:
              - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                job_name: kubernetes-pods
                kubernetes_sd_configs:
                  - role: pod
                relabel_configs:
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_namespace
                    target_label: namespace
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_name
                    target_label: pod
                  - action: replace
                    source_labels:
                      - __meta_kubernetes_pod_container_name
                    target_label: container
                tls_config:
                    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                    insecure_skip_verify: false
  strategies.json: '{"default_strategy": {"param": 0.001, "type": "probabilistic"}}'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent-traces
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent-traces
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent-traces
subjects:
- kind: ServiceAccount
  name: grafana-agent-traces
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: grafana-agent-traces
  name: grafana-agent-traces
  namespace: monitoring
spec:
  ports:
  - name: agent-http-metrics
    port: 8080
    targetPort: 8080
  - name: agent-thrift-compact
    port: 6831
    protocol: UDP
    targetPort: 6831
  - name: agent-thrift-binary
    port: 6832
    protocol: UDP
    targetPort: 6832
  - name: agent-thrift-http
    port: 14268
    protocol: TCP
    targetPort: 14268
  - name: agent-thrift-grpc
    port: 14250
    protocol: TCP
    targetPort: 14250
  - name: agent-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  - name: agent-otlp
    port: 55680
    protocol: TCP
    targetPort: 55680
  - name: agent-opencensus
    port: 55678
    protocol: TCP
    targetPort: 55678
  selector:
    name: grafana-agent-traces
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grafana-agent-traces
  namespace: monitoring
spec:
  minReadySeconds: 10
  selector:
    matchLabels:
      name: grafana-agent-traces
  template:
    metadata:
      labels:
        name: grafana-agent-traces
    spec:
      containers:
      - args:
        - -config.file=/etc/agent/agent.yaml
        command:
        - /bin/agent
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: grafana/agent:v0.15.0
        imagePullPolicy: IfNotPresent
        name: agent
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 6831
          name: thrift-compact
          protocol: UDP
        - containerPort: 6832
          name: thrift-binary
          protocol: UDP
        - containerPort: 14268
          name: thrift-http
          protocol: TCP
        - containerPort: 14250
          name: thrift-grpc
          protocol: TCP
        - containerPort: 9411
          name: zipkin
          protocol: TCP
        - containerPort: 55680
          name: otlp
          protocol: TCP
        - containerPort: 55678
          name: opencensus
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/agent
          name: grafana-agent-traces
      serviceAccount: grafana-agent-traces
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - configMap:
          name: grafana-agent-traces
        name: grafana-agent-traces
  updateStrategy:
    type: RollingUpdate
[root@ip-172-31-44-216 demo]# oc get po,svc,ep -n monitoring
NAME                             READY   STATUS    RESTARTS   AGE
pod/grafana-agent-traces-6rx5s   1/1     Running   0          2m56s
pod/grafana-agent-traces-bhsp5   1/1     Running   0          2m56s

NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                       AGE
service/grafana-agent-traces   ClusterIP   10.96.133.212   <none>        8080/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55678/TCP   2m56s

NAME                             ENDPOINTS                                                        AGE
endpoints/grafana-agent-traces   10.244.0.2:14250,10.244.1.9:14250,10.244.0.2:6832 + 13 more...   2m56s
[root@ip-172-31-44-216 demo]# oc logs -f grafana-agent-traces-6rx5s -n monitoring
ts=2021-06-16T10:02:19.17947343Z level=info agent=prometheus component=cluster msg="applying config"
ts=2021-06-16T10:02:19.179737433Z level=info agent=prometheus component=cluster msg="not watching the KV, none set"
ts=2021-06-16T10:02:19Z level=info msg="Tempo Logger Initialized" component=tempo
ts=2021-06-16T10:02:19Z level=info msg="shutting down receiver" component=tempo tempo_config=default
ts=2021-06-16T10:02:19Z level=info msg="shutting down processors" component=tempo tempo_config=default
ts=2021-06-16T10:02:19Z level=info msg="shutting down exporters" component=tempo tempo_config=default
ts=2021-06-16T10:02:19Z level=info msg="Exporter is enabled." component=tempo tempo_config=default component_kind=exporter exporter=otlp/0
ts=2021-06-16T10:02:19Z level=info msg="Exporter is starting..." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0
ts=2021-06-16T10:02:19Z level=info msg="Exporter started." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0
ts=2021-06-16T10:02:19.184725605Z level=info component="tempo service disco" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2021-06-16T10:02:19Z level=info msg="Pipeline is enabled." component=tempo tempo_config=default pipeline_name=traces pipeline_datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Pipeline is starting..." component=tempo tempo_config=default pipeline_name=traces pipeline_datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Pipeline is started." component=tempo tempo_config=default pipeline_name=traces pipeline_datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Receiver is enabled." component=tempo tempo_config=default component_kind=receiver component_type=jaeger component_name=jaeger datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Receiver is enabled." component=tempo tempo_config=default component_kind=receiver component_type=opencensus component_name=opencensus datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Receiver is enabled." component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Receiver is enabled." component=tempo tempo_config=default component_kind=receiver component_type=zipkin component_name=zipkin datatype=traces
ts=2021-06-16T10:02:19Z level=info msg="Receiver is starting..." component=tempo tempo_config=default component_kind=receiver component_type=zipkin component_name=zipkin
ts=2021-06-16T10:02:19Z level=info msg="Receiver started." component=tempo tempo_config=default component_kind=receiver component_type=zipkin component_name=zipkin
ts=2021-06-16T10:02:19Z level=info msg="Receiver is starting..." component=tempo tempo_config=default component_kind=receiver component_type=jaeger component_name=jaeger
ts=2021-06-16T10:02:19Z level=info msg="Receiver started." component=tempo tempo_config=default component_kind=receiver component_type=jaeger component_name=jaeger
ts=2021-06-16T10:02:19Z level=info msg="Receiver is starting..." component=tempo tempo_config=default component_kind=receiver component_type=opencensus component_name=opencensus
ts=2021-06-16T10:02:19Z level=info msg="Receiver started." component=tempo tempo_config=default component_kind=receiver component_type=opencensus component_name=opencensus
ts=2021-06-16T10:02:19Z level=info msg="Receiver is starting..." component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp
ts=2021-06-16T10:02:19Z level=info msg="Starting GRPC server on endpoint 0.0.0.0:4317" component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp
ts=2021-06-16T10:02:19Z level=info msg="Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680" component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp
ts=2021-06-16T10:02:19Z level=info msg="Starting GRPC server on endpoint 0.0.0.0:55680" component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp
ts=2021-06-16T10:02:19Z level=info msg="Starting HTTP server on endpoint 0.0.0.0:55681" component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp
ts=2021-06-16T10:02:19Z level=info msg="Receiver started." component=tempo tempo_config=default component_kind=receiver component_type=otlp component_name=otlp
ts=2021-06-16T10:02:19.188933259Z level=info msg="server configuration changed, restarting server"
ts=2021-06-16T10:02:19.189135013Z level=info caller=server.go:245 http=[::]:8080 grpc=[::]:9095 msg="server listening on addresses"
ts=2021-06-16T10:07:39Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
ts=2021-06-16T10:09:24Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
ts=2021-06-16T10:09:34Z level=error msg="Exporting failed. Try enabling retry_on_failure config option." component=tempo tempo_config=default component_kind=exporter component_type=otlp component_name=otlp/0 error="failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection closed"
rfratto commented 3 years ago

The first thing that jumps out to me here is that you have insecure set in your tracing remote_write on the Agent. That disables TLS, but your ingress uses TLS; that's going to cause a handshake error and the connection to be refused.

I'd also recommend against using 55680 for your service/ingress. It works for now, but 55680 will likely eventually be removed. You should consider using the explicit 4317 port you configured instead.
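
For your latest setup, that would mean pointing the ingress backend at the otlp-grpc port, e.g. (service name and port taken from your latest Tempo manifests; an untested sketch):

      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tempo
            port:
              number: 4317   # otlp-grpc, instead of the legacy 55680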

cloudcafetech commented 3 years ago

Configure the ingress with port 4317? It's the same gRPC protocol.

PLEASE help me make it work.

rfratto commented 3 years ago

I want to give you the benefit of the doubt, but the way you keep asking for help comes off as entitled. Maybe it's unintentional; that's fine, everyone makes mistakes and communication is hard. But intentional or not: please stop, because it's not helping me be interested in helping you. This repository is a free support channel, and we are already taking time out of our days to help you. If you don't stop behaving this way, I am going to close this issue.

With that out of the way, please carefully re-read what I said. 55680 will work for now, but is deprecated and will eventually be removed. This has nothing to do with your current problem, but I was giving you a heads up that it may give you problems down the road.

The insecure line being set to true is more likely to be one of the causes of your problems, as I mentioned.

cloudcafetech commented 3 years ago

Then, based on your comment below:

The first thing that jumps out to me here is that you have insecure set in your tracing remote_write on the Agent. That disables TLS, but your ingress uses TLS; that's going to cause a handshake error and the connection to be refused.

Ingress supports gRPC over 443, not 80, so I have to use an ingress with TLS to make it work; there is no alternative.

Now, how can I set up the tracing agent securely? Any pointer will help.

Anyway, thanks for your time.

rfratto commented 3 years ago

You can just remove the insecure: true line from your remote_write config. If you're not using a valid TLS certificate for the ingress, you'll need to set insecure_skip_verify: true on the remote_write config as well.
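
Concretely, your remote_write block would become something like this (endpoint copied from your latest config; a sketch, not a verified config):

            remote_write:
              - endpoint: grafanaclient.172.31.44.216.nip.io:443
                # TLS stays on (no `insecure: true`); only cert verification is skipped
                insecure_skip_verify: true
                retry_on_failure:
                    enabled: false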

I'll open an issue to add support for custom CAs in Tempo's remote_write, since it looks like we don't currently support that. (Edit: that issue is #662)

cloudcafetech commented 3 years ago

Thanks, let me try that. Just wanted to tell you that I can't use 55680, because with the distributed deployment that endpoint is NOT available (https://github.com/grafana/tempo/issues/768). I expect it to work with 4317 through the ingress.

cloudcafetech commented 3 years ago

I did not see any errors, but data seems to be arriving in S3 (MinIO) VERY, VERY late. Some warning messages below (updated):

ts=2021-06-17T02:28:47.899282321Z level=info caller=server.go:245 http=[::]:8080 grpc=[::]:9095 msg="server listening on addresses"
ts=2021-06-17T02:58:49.124181601Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=. err="unable to open WAL: open wal: no such file or directory"
ts=2021-06-17T02:58:49.124251231Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=bin err="unable to open WAL: open bin/wal: no such file or directory"
ts=2021-06-17T02:58:49.124273546Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=boot err="unable to open WAL: open boot/wal: no such file or directory"
ts=2021-06-17T02:58:49.124290839Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=dev err="unable to open WAL: open dev/wal: no such file or directory"
ts=2021-06-17T02:58:49.124318071Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=etc err="unable to open WAL: open etc/wal: no such file or directory"
ts=2021-06-17T02:58:49.12434801Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=home err="unable to open WAL: open home/wal: no such file or directory"
ts=2021-06-17T02:58:49.124442774Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=lib err="unable to open WAL: open lib/wal: no such file or directory"
ts=2021-06-17T02:58:49.124471578Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=lib64 err="unable to open WAL: open lib64/wal: no such file or directory"
ts=2021-06-17T02:58:49.124493207Z level=warn agent=prometheus component=cleaner msg="unable to find segment mtime of WAL" name=media err="unable to open WAL: open media/wal: no such file or directory"

As per my understanding, the tracing agent receives trace data from the application, stores it locally inside the pod (tracing agent), and then remote-writes it to Tempo; right? If yes, may I know the local path inside the pod where the data is stored?

And where is the data located in Tempo (ingester)? Is it /var/tempo/wal?

@robx

If you get time, please reply to my query; closing for now ...

Thank you very much for your valuable support :)