Update the documentation after the otel-collector update

akvlad commented 7 months ago

After the otel-collector dependencies were updated to the latest version, some processors became deprecated. Some of them (spanmetrics) are mentioned in the qryn documentation, for example at https://qryn.metrico.in/#/telemetry/ingestion

Please:

[ ] Clone https://github.com/metrico/qryn-docs/
[ ] Check the pages:
[ ] Replace the spanmetrics processor with the spanmetrics connector according to the deprecation documentation https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.95.0/processor/spanmetricsprocessor
[ ] Replace the servicegraph processor with the servicegraph connector https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/servicegraphconnector/README.md
[ ] Check that each configuration walkthrough still works after the replacement
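
For orientation, the processor-to-connector change requested above might look roughly like the sketch below. It is an illustration, not the tested configuration from the docs: the `qryn` exporter DSN is a masked placeholder and the pipeline names are arbitrary. The connector is listed as an exporter of the traces pipeline and as a receiver of a metrics pipeline, so the `metrics_exporter` setting that the old processor required is no longer needed.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

# Before the update, spanmetrics was configured under `processors:`
# and placed in the traces pipeline's `processors:` list.
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]

exporters:
  qryn:
    # Placeholder DSN (assumption); use your own ClickHouse connection string.
    dsn: tcp://clickhouse-server:9000/cloki?username=default&password=*************

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [qryn, spanmetrics]   # the connector consumes spans here
    metrics/spanmetrics:
      receivers: [spanmetrics]         # and emits metrics here
      exporters: [qryn]
```

The same pattern applies to servicegraph: declare it under `connectors:`, export traces to it, and receive its metrics in a separate metrics pipeline.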

Nice to have:

Please grep the documentation for every module deprecated during the otel-collector update.
Replace any other deprecated modules you find.

UPD: Tasks to complete

[x] Make spanmetrics replacement
[ ] Replace spanmetrics in https://github.com/metrico/qryn-docs/blob/main/docs/metrics/ingestion.md
[ ] https://github.com/metrico/qryn-docs/blob/main/docs/logs/ingestion.md
[ ] https://github.com/metrico/qryn-docs/blob/main/docs/telemetry/ingestion.md
[ ] Make servicegraph replacement
[ ] Replace servicegraph in the documentation

akvlad commented 7 months ago

https://github.com/metrico/qryn/wiki/Tempo-Tracing

afzal-qxip commented 7 months ago

@akvlad and @lmangani, the service graph metrics are not emitted to qryn when using the configuration below.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  zipkin:
    endpoint: 0.0.0.0:9411
  fluentforward:
    endpoint: 0.0.0.0:24224
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['exporter:8080']
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  memory_limiter:
    check_interval: 2s
    limit_mib: 1800
    spike_limit_mib: 500
  resourcedetection/system:
    detectors: ['system']
    system:
      hostname_sources: ['os']
  resource:
    attributes:
      - key: service.name
        value: "serviceName"
        action: upsert
  metricstransform:
    transforms:
      - include: calls_total
        action: update
        new_name: traces_spanmetrics_calls_total
      - include: latency
        action: update
        new_name: traces_spanmetrics_latency
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions_cache_size: 1500
  servicegraph:
    metrics_exporter: qryn
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - randomContainer
    store:
      ttl: 2s
      max_items: 200
exporters:
  qryn:
    dsn: clickhouse://default:**********@nl.ch.hepic.tel:****/cloki
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json
  otlp/spanmetrics:
    endpoint: localhost:4317
    tls:
      insecure: true
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [fluentforward, otlp]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces/spanmetrics:
      receivers: [ otlp, jaeger, zipkin ]
      exporters: [spanmetrics]
    traces/servicegraph:
      receivers: [ otlp, jaeger, zipkin ]
      exporters: [servicegraph]
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]

lmangani commented 7 months ago

In my opinion, this requires a local Prometheus remote_write socket to ingest the service graph metrics:

receivers:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090

Fictional Example

receivers:
  otlp:
    protocols:
      grpc:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090

connectors:
  servicegraph:
    latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
    dimensions:
      - dimension-1
      - dimension-2
    store:
      ttl: 1s
      max_items: 10

exporters:
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
  qryn:
    dsn: tcp://clickhouse-server:9000/cloki?username=default&password=*************
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]
    metrics:
      receivers: [prometheusremotewrite]
      processors: ...
      exporters: [qryn]

afzal-qxip commented 7 months ago

@akvlad Here is the PR for the above task, submitted after testing.

Both spanmetrics and servicegraph are working fine after the configuration update.

https://github.com/metrico/qryn-docs/pull/13

metrico / qryn-docs

Update the documentation after the otel-collector update #12
