Describe your environment
Services are built with the Docker image python:3.10.15-slim and run on Kubernetes. The services use:
opentelemetry-api==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-propagator-b3==1.27.0
opentelemetry-exporter-otlp-proto-grpc==1.27.0
opentelemetry-instrumentation-fastapi==0.48b0
opentelemetry-instrumentation-aiohttp-client==0.48b0
opentelemetry-instrumentation-asyncpg==0.48b0
opentelemetry-instrumentation-psycopg==0.48b0
opentelemetry-instrumentation-psycopg2==0.48b0
opentelemetry-instrumentation-requests==0.48b0
opentelemetry-instrumentation-logging==0.48b0
opentelemetry-instrumentation-system-metrics==0.48b0
opentelemetry-instrumentation-grpc==0.48b0
What happened?
I'm using the OTEL_SEMCONV_STABILITY_OPT_IN feature (currently running with http/dup) and am seeing odd results for HTTP latencies. The new metric appears to use the same default bucket boundaries as the old one. Shouldn't the buckets be smaller now that the unit has changed from milliseconds to seconds? With the lowest bucket at 5 seconds the histogram is not particularly useful: percentiles calculated from these metrics report a p99 of 5 seconds for most of my services/paths, which is not accurate. The Node.js and .NET SDKs override the default buckets with more sensible values.
The images show the same metric over the same time range for the same label set, rendered as histograms; the older metric is clearly more granular and therefore more useful.
sum(rate(http_server_duration_milliseconds_bucket{app="x", environment="dev"}[$__rate_interval])) by (le)
sum(rate(http_server_request_duration_seconds_bucket{app="x", environment="dev"}[$__rate_interval])) by (le)
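For what it's worth, the buckets can be overridden on the application side with a metrics View. A minimal, untested sketch (the boundaries below are the ones recommended by the HTTP semantic conventions, not something the SDK ships with; the OTLP endpoint comes from the usual OTEL_EXPORTER_OTLP_* settings):

from opentelemetry.metrics import set_meter_provider
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.metrics.view import ExplicitBucketHistogramAggregation, View

# Override the buckets only for the new seconds-based metric;
# boundaries follow the semconv recommendation for http.server.request.duration.
duration_view = View(
    instrument_name="http.server.request.duration",
    aggregation=ExplicitBucketHistogramAggregation(
        boundaries=[0.005, 0.01, 0.025, 0.05, 0.075, 0.1,
                    0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]
    ),
)

reader = PeriodicExportingMetricReader(OTLPMetricExporter())
# Must be set before the instrumentations create their meters.
set_meter_provider(MeterProvider(metric_readers=[reader], views=[duration_view]))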
Steps to Reproduce
Set OTEL_SEMCONV_STABILITY_OPT_IN="http/dup".
The old and new metrics can then be visualized in Grafana with queries similar to these:
sum(rate(http_server_duration_milliseconds_bucket{app="x", environment="dev"}[$__rate_interval])) by (le)
sum(rate(http_server_request_duration_seconds_bucket{app="x", environment="dev"}[$__rate_interval])) by (le)
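A minimal service that reproduces this could look roughly like the following (route name is a placeholder; the metrics pipeline is assumed to be configured as in the sketch above or via auto-instrumentation):

from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

@app.get("/ping")
async def ping():
    return {"ok": True}

# With OTEL_SEMCONV_STABILITY_OPT_IN="http/dup" this emits both
# http.server.duration (ms) and http.server.request.duration (s).
FastAPIInstrumentor.instrument_app(app)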
Expected Result
I expected to see the same percentiles for my services/paths when using the new semantic-convention metrics.
Actual Result
The new metrics are skewed towards 5 seconds because of the bucket sizes.
Additional context
No response
Would you like to implement a fix?
None