metrico / qryn-docs

Documentation for the qryn project
https://qryn.metrico.in

Update the documentation after the otel-collector update #12

Open akvlad opened 7 months ago

akvlad commented 7 months ago

After the otel-collector dependencies were updated to the latest version, some processors were deprecated. Some of these processors (e.g. spanmetrics) are still mentioned in the qryn documentation, for example at https://qryn.metrico.in/#/telemetry/ingestion (a short migration sketch follows the task list below).

Please:

Nice to have:

UPD: Tasks to complete
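
For context, the spanmetrics functionality moved from a standalone processor to a connector in recent otel-collector releases. A minimal sketch of the old and the new wiring, assuming the contrib components and reusing the qryn exporter from the configs in this thread (receiver names are illustrative):

# Deprecated: spanmetrics as a processor, naming its exporter directly
processors:
  spanmetrics:
    metrics_exporter: qryn
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
      exporters: [qryn]

# Current: spanmetrics as a connector, bridging a traces pipeline
# (where it acts as an exporter) to a metrics pipeline (where it
# acts as a receiver)
connectors:
  spanmetrics:
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [qryn]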

akvlad commented 7 months ago

https://github.com/metrico/qryn/wiki/Tempo-Tracing

afzal-qxip commented 7 months ago

"@akvlad and @lmangani, the service graph metrics are not emitted on Qryn when using the configurations below."

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  zipkin:
    endpoint: 0.0.0.0:9411
  fluentforward:
    endpoint: 0.0.0.0:24224
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['exporter:8080']
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  memory_limiter:
    check_interval: 2s
    limit_mib: 1800
    spike_limit_mib: 500
  resourcedetection/system:
    detectors: ['system']
    system:
      hostname_sources: ['os']
  resource:
    attributes:
      - key: service.name
        value: "serviceName"
        action: upsert
  metricstransform:
    transforms:
      - include: calls_total
        action: update
        new_name: traces_spanmetrics_calls_total
      - include: latency
        action: update
        new_name: traces_spanmetrics_latency
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions_cache_size: 1500
  servicegraph:
    metrics_exporter: qryn
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - randomContainer
    store:
      ttl: 2s
      max_items: 200
exporters:
  qryn:
    dsn: clickhouse://default:**********@nl.ch.hepic.tel:****/cloki
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json
  otlp/spanmetrics:
    endpoint: localhost:4317
    tls:
      insecure: true
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [fluentforward, otlp]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces/spanmetrics:
      receivers: [otlp, jaeger, zipkin]
      exporters: [spanmetrics]
    traces/servicegraph:
      receivers: [otlp, jaeger, zipkin]
      exporters: [servicegraph]
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
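
A couple of things worth checking in the config above, based on how the servicegraph connector behaves: `metrics_exporter` is an option carried over from the deprecated servicegraph processor and, as far as I can tell, has no effect in connector mode, since the connector only emits metrics into pipelines that list it as a receiver (which metrics/servicegraph already does here). Also, a store ttl of 2s is very aggressive: client and server spans that arrive more than 2s apart are evicted before they can be paired into an edge, which can leave the service graph empty. A hedged sketch with more forgiving values (the numbers are illustrative, not recommendations):

connectors:
  servicegraph:
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    store:
      ttl: 10s         # allow more time for client/server span pairs to match
      max_items: 1000  # keep more unpaired edges before eviction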
lmangani commented 7 months ago

In my opinion this requires a local prometheus remote_write socket to ingest the service graph metrics:

receivers:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090

Fictional Example

receivers:
  otlp:
    protocols:
      grpc:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090

connectors:
  servicegraph:
    latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
    dimensions:
      - dimension-1
      - dimension-2
    store:
      ttl: 1s
      max_items: 10

exporters:
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
  qryn:
    dsn: tcp://clickhouse-server:9000/cloki?username=default&password=*************
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
       format: json

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]
    metrics:
      receivers: [prometheusremotewrite]
      processors: ...
      exporters: [qryn]
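
One caveat with the fictional example above: the prometheus exporter only exposes a scrape endpoint, it does not push samples, so by itself it cannot feed the in-process prometheusremotewrite receiver. To actually close that loop, a push-style exporter such as the contrib prometheusremotewrite exporter would presumably be needed, pointed at the local receiver (assuming the receiver serves the standard /api/v1/write path; the /loopback name and the endpoint are illustrative):

exporters:
  prometheusremotewrite/loopback:
    endpoint: http://localhost:9090/api/v1/write

service:
  pipelines:
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheusremotewrite/loopback]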
afzal-qxip commented 7 months ago

@akvlad here is the PR for the above task, opened after testing.

Both spanmetrics and servicegraph are working fine after the configuration update.

https://github.com/metrico/qryn-docs/pull/13