Open akvlad opened 7 months ago
"@akvlad and @lmangani, the service graph metrics are not emitted on Qryn when using the configurations below."
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  zipkin:
    endpoint: 0.0.0.0:9411
  fluentforward:
    endpoint: 0.0.0.0:24224
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['exporter:8080']
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  memory_limiter:
    check_interval: 2s
    limit_mib: 1800
    spike_limit_mib: 500
  resourcedetection/system:
    detectors: ['system']
    system:
      hostname_sources: ['os']
  resource:
    attributes:
      - key: service.name
        value: "serviceName"
        action: upsert
  metricstransform:
    transforms:
      - include: calls_total
        action: update
        new_name: traces_spanmetrics_calls_total
      - include: latency
        action: update
        new_name: traces_spanmetrics_latency
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions_cache_size: 1500
  servicegraph:
    metrics_exporter: qryn
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - randomContainer
    store:
      ttl: 2s
      max_items: 200
exporters:
  qryn:
    dsn: clickhouse://default:**********@nl.ch.hepic.tel:****/cloki
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json
  otlp/spanmetrics:
    endpoint: localhost:4317
    tls:
      insecure: true
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [fluentforward, otlp]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces/spanmetrics:
      receivers: [otlp, jaeger, zipkin]
      exporters: [spanmetrics]
    traces/servicegraph:
      receivers: [otlp, jaeger, zipkin]
      exporters: [servicegraph]
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
In my opinion, this requires a local Prometheus remote_write receiver in the collector to ingest the service graph metrics:
receivers:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090
receivers:
  otlp:
    protocols:
      grpc:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090
connectors:
  servicegraph:
    latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
    dimensions:
      - dimension-1
      - dimension-2
    store:
      ttl: 1s
      max_items: 10
exporters:
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
  qryn:
    dsn: tcp://clickhouse-server:9000/cloki?username=default&password=*************
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]
    metrics:
      receivers: [prometheusremotewrite]
      processors: ...
      exporters: [qryn]
@akvlad PR for the above task after testing.
Both spanmetrics and servicegraph are working fine after the configuration update.
After the OTel Collector dependencies were updated to the latest version, some processors became deprecated. Some of these processors (for example spanmetrics) are still mentioned in the qryn documentation, e.g. at https://qryn.metrico.in/#/telemetry/ingestion
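For reference, a minimal sketch of what the connector-based replacement for the deprecated spanmetrics processor might look like. The bucket values and pipeline names are taken from the configuration earlier in this thread; the qryn DSN is a placeholder, and the exact set of supported keys depends on the collector version in use, so treat this as an illustration rather than a tested configuration.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

connectors:
  # The connector takes the place of the old spanmetrics processor.
  # It has no metrics_exporter key: routing is done by the pipelines below.
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions_cache_size: 1500

exporters:
  qryn:
    # Placeholder DSN (assumption); substitute real host and credentials.
    dsn: tcp://clickhouse-server:9000/cloki
    timeout: 10s

service:
  pipelines:
    # The connector is listed as an exporter of the traces pipeline ...
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    # ... and as a receiver of the metrics pipeline.
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [qryn]
```

The main difference from the processor-based setup is that the connector appears on both sides: once under `exporters` in a traces pipeline and once under `receivers` in a metrics pipeline.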
Please:
Nice to have:
UPD: Tasks to complete