grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Correctly return 429 on OTLPHTTP when Tempo rate limits #3020

Open joe-elliott opened 8 months ago

joe-elliott commented 8 months ago

Describe the bug

Currently Tempo returns 500 from the OTLPHTTP endpoint when it rate limits, because of the way errors are handled.

If the linked issue is resolved, then ResourceExhausted should correctly return 429s. If it's not resolved, then we would need to implement our own HTTP server to do this correctly.
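
To illustrate the behavior being asked for, here is a minimal sketch of an error-code-to-HTTP-status mapping in which a ResourceExhausted error becomes a 429 rather than falling through to 500. The `rpcCode` type, its constants, and `httpStatusFor` are hypothetical names used only for this example; they are not Tempo's or the OTel collector's actual code, and a real implementation would likely use the gRPC `codes` package instead of a local type.

```go
package main

import "net/http"

// rpcCode is a hypothetical stand-in for a gRPC status code.
// It exists only to keep this sketch free of external dependencies.
type rpcCode int

const (
	codeOK rpcCode = iota
	codeInvalidArgument
	codeResourceExhausted
	codeUnavailable
	codeInternal
)

// httpStatusFor maps an error code to the HTTP status the endpoint
// should return. The important case for this issue is that
// codeResourceExhausted maps to 429 instead of the default 500.
func httpStatusFor(c rpcCode) int {
	switch c {
	case codeOK:
		return http.StatusOK
	case codeInvalidArgument:
		return http.StatusBadRequest
	case codeResourceExhausted:
		return http.StatusTooManyRequests
	case codeUnavailable:
		return http.StatusServiceUnavailable
	default:
		// Anything unrecognized is treated as a server error.
		return http.StatusInternalServerError
	}
}
```

With a mapping like this, a rate-limited push would surface to the client as 429, while unrecognized errors would still surface as 500.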

joe-elliott commented 8 months ago

A PR is up to fix the issue in the OTEL collector: https://github.com/open-telemetry/opentelemetry-collector/pull/8080

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

ywwg commented 3 months ago

This pretty severely affects the error rate and overall SLO for traces. In addition, 500 errors are not retried by clients, which will lead to lost data. This should probably be a higher-priority bug.
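
As a rough sketch of why the status code matters to clients: the OTLP/HTTP specification lists 429, 502, 503, and 504 as the retryable response codes, so a client following it would retry a 429 but drop the batch on a 500. The `isRetryable` helper below is a hypothetical name for illustration; real exporters may implement this check differently.

```go
package main

import "net/http"

// isRetryable reports whether a client following the OTLP/HTTP
// specification would retry a request that got this status code.
// Note that 500 is not in the list, so such requests are dropped.
func isRetryable(status int) bool {
	switch status {
	case http.StatusTooManyRequests, // 429
		http.StatusBadGateway,         // 502
		http.StatusServiceUnavailable, // 503
		http.StatusGatewayTimeout:     // 504
		return true
	default:
		return false
	}
}
```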

swar8080 commented 6 days ago

Linking https://github.com/grafana/tempo/issues/3831, since some rate limits should maybe use a different 4xx status code when retrying the request won't help.