traceloop / docs

Documentation for Traceloop & OpenLLMetry
Apache License 2.0

🚀 Feature: Add docs for local grafana tempo integration #43

Open · henryxparker opened this issue 3 months ago

henryxparker commented 3 months ago

Which component is this feature for?

Anthropic Instrumentation

🔖 Feature description

An addition to the grafana tempo docs that includes instructions on how to connect it with a local grafana instance.

🎤 Why is this feature needed ?

I'm evaluating Traceloop for my team. I don't have much Grafana experience, so trying to get this working with a local version of Grafana has honestly been an absolute nightmare (even though I know it should have been really simple).

✌️ How do you aim to achieve this?

Add a blurb under the "Without Grafana Agent" section: if you are running Tempo locally, set the environment variable to point to Tempo's HTTP ingest port.

Default: TRACELOOP_BASE_URL=0.0.0.0:4318

🔄️ Additional Information

No response

👀 Have you spent some time to check if this feature request has been raised before?

Are you willing to submit PR?

None

nirga commented 3 months ago

Hey @henryxparker! Thanks, and sorry you had a bad experience with the Grafana integration. We'll work with the Grafana team on making this work better. In the meantime, can you verify that setting TRACELOOP_BASE_URL=0.0.0.0:4318 worked for you locally? We'll update the docs at https://github.com/traceloop/docs

henryxparker commented 3 months ago

Actually, it required TRACELOOP_BASE_URL=http://0.0.0.0:4318, because the default local Tempo setup in the Grafana Tempo docs does not use HTTPS. But yes, I can confirm it worked locally.
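For reference, a minimal sketch of wiring this up from the Python traceloop-sdk (the app name is illustrative, and I'm assuming the env var is set before init):

import os

# Assumption: local Tempo from the compose setup below, OTLP over plain HTTP on port 4318 (no TLS).
os.environ["TRACELOOP_BASE_URL"] = "http://0.0.0.0:4318"

from traceloop.sdk import Traceloop

# Any instrumented calls made after init() are exported to the local Tempo instance.
Traceloop.init(app_name="local-tempo-test")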

henryxparker commented 3 months ago

If you would like to verify locally, here is a Docker Compose file and a config file for Tempo. Put them in the same directory, make a subdirectory called tempo-data, and then run docker compose up; you should be able to access Grafana at localhost:3000, where you can see the traces.

These were created by combining two examples from the Grafana docs: grafana-agent-example and tempo-local-quickstart.

docker-compose.yaml

version: '3'
services:
  # Tempo runs as user 10001, and docker compose creates the volume as root.
  # As such, we need to chown the volume in order for Tempo to start correctly.
  init:
    image: &tempoImage grafana/tempo:latest
    user: root
    entrypoint:
      - "chown"
      - "10001:10001"
      - "/var/tempo"
    volumes:
      - ./tempo-data:/var/tempo

  tempo:
    image: *tempoImage
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - ./tempo-data:/var/tempo
    ports:
      - "14268:14268"  # jaeger ingest
      - "3200:3200"   # tempo
      - "9095:9095" # tempo grpc
      - "4317:4317"  # otlp grpc
      - "4318:4318"  # otlp http
      - "9411:9411"   # zipkin
    depends_on:
      - init
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
  prometheus:
    image: prom/prometheus:v2.47.0
    command:
      - --web.enable-remote-write-receiver
      - --config.file=/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    entrypoint:
      - sh
      - -euc
      - |
        mkdir -p /etc/grafana/provisioning/datasources
        cat <<EOF > /etc/grafana/provisioning/datasources/ds.yaml
        apiVersion: 1
        datasources:
        - name: Loki
          type: loki
          access: proxy
          orgId: 1
          url: http://loki:3100
          basicAuth: false
          isDefault: false
          version: 1
          editable: false
        - name: Prometheus
          type: prometheus
          orgId: 1
          url: http://prometheus:9090
          basicAuth: false
          isDefault: false
          version: 1
          editable: false
        - name: Tempo
          type: tempo
          access: proxy
          orgId: 1
          url: http://tempo:3200
          basicAuth: false
          isDefault: true
          version: 1
          editable: false
          apiVersion: 1
          uid: tempo
          jsonData:
            httpMethod: GET
            serviceMap:
              datasourceUid: prometheus
        EOF
        /run.sh
    image: grafana/grafana:latest
    ports:
      - "3000:3000"

tempo.yaml

stream_over_http_enabled: true
server:
  http_listen_port: 3200
  log_level: info

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s

distributor:
  receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:                            # the receivers all come from the OpenTelemetry collector. more configuration information can
      protocols:                       # be found here: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:                   #
        grpc:                          # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  max_block_duration: 5m               # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h                # overall Tempo trace retention. set for demo purposes

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces

storage:
  trace:
    backend: local                     # backend configuration to use
    wal:
      path: /var/tempo/wal             # where to store the wal locally
    local:
      path: /var/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks] # enables metrics generator
      generate_native_histograms: both
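Once the stack is up, a quick end-to-end check is to emit a test span from the traceloop-sdk and look for it under the Tempo datasource in Grafana. A rough sketch (Python; the app and workflow names are just placeholders):

import os

# Assumption: the compose stack above is running, so Tempo's OTLP/HTTP port 4318 is published on localhost.
os.environ["TRACELOOP_BASE_URL"] = "http://0.0.0.0:4318"

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# disable_batch exports spans immediately instead of waiting for the batch processor to flush.
Traceloop.init(app_name="tempo-smoke-test", disable_batch=True)

@workflow(name="hello_tempo")
def hello():
    return "hello"

hello()
# The "hello_tempo" trace should then be searchable in Grafana (http://localhost:3000) via the Tempo datasource.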

DSgUY commented 1 month ago

@henryxparker did you solve this? Same nightmare here!

zioproto commented 1 month ago

@DSgUY I extended this Azure Sample to have Traceloop send traces to a Grafana Tempo installation running in my Kubernetes cluster:

https://github.com/Azure-Samples/azure-openai-terraform-deployment-sample/

Here is how I install the Helm chart locally: https://github.com/Azure-Samples/azure-openai-terraform-deployment-sample/blob/b5a113691e19f23667f2caf268c5d4916d370de6/infra/installation_script.tftpl#L7-L11

Here is how to point your application to send traces to the local Grafana Tempo distributor: https://github.com/Azure-Samples/azure-openai-terraform-deployment-sample/blob/b5a113691e19f23667f2caf268c5d4916d370de6/sample-application/chatbot.py#L47
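Whether you set it via TRACELOOP_BASE_URL or via the SDK's api_endpoint argument, the idea is the same as the local case, just pointed at the in-cluster distributor's OTLP/HTTP port. A sketch, assuming a tempo-distributed Helm install (the service name and namespace below are examples and depend on your release):

from traceloop.sdk import Traceloop

# Assumption: the Tempo distributor is reachable inside the cluster as a ClusterIP service
# exposing the OTLP/HTTP receiver on port 4318; adjust the hostname to your release/namespace.
Traceloop.init(
    app_name="chatbot",
    api_endpoint="http://tempo-distributor.tempo.svc.cluster.local:4318",
)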

nirga commented 1 month ago

@zioproto @DSgUY @henryxparker if any of you are willing to update our docs at https://github.com/traceloop/docs with the things you learned here that would be tremendously helpful for the community! ❤️ I just can't seem to get enough time to test this myself so I can't be certain how to fix our current guide.

DSgUY commented 1 month ago

> @zioproto @DSgUY @henryxparker if any of you are willing to update our docs at https://github.com/traceloop/docs with the things you learned here that would be tremendously helpful for the community! ❤️ I just can't seem to get enough time to test this myself so I can't be certain how to fix our current guide.

I'm still trying but sure...

DSgUY commented 1 month ago

I managed to get the traces. Can I configure metrics and logs? Maybe using Prometheus, Promtail, and Loki?

nirga commented 1 month ago

I think Grafana Agent can translate the OTel metrics format to Prometheus: https://grafana.com/docs/agent/latest/flow/tasks/opentelemetry-to-lgtm-stack/

nirga commented 1 month ago

Btw @DSgUY, Traceloop as a platform can also integrate with Grafana, if that helps.