grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Add a limit that caps the maximum allowed size of an attribute #3985

Open 09jvilla opened 2 months ago

09jvilla commented 2 months ago

Is your feature request related to a problem? Please describe. When fetching traces that have spans with very large attributes, the Tempo queriers run out of memory and crash. We've observed this when fetching a single trace using the tracebyID endpoint. The trace itself didn't have a ton of spans (roughly 500), but it was very large in size (approximately 250KB), because some of its spans had attributes with very large values.

Describe the solution you'd like One way to avoid these out-of-memory crashes is to create a limit on the maximum allowable size of any individual attribute. On the ingest path, Tempo would then reject any span that has an attribute above this size. Alternatively, it could store the span but discard the specific attribute that exceeds the limit.

If those spans were never ingested, they would not exist to be queried, and therefore could not crash the queriers when fetched.
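To make the two options concrete, here is a minimal sketch in Go of what such a check might look like on the ingest path. Everything in it is hypothetical and for illustration only: `Span`, `Policy`, `RejectSpan`, `DropAttribute`, and `EnforceAttrLimit` are made-up names, not Tempo's actual types or API. It also assumes attribute values are plain strings and measures their size in bytes, whereas real attributes can hold other value types.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy selects what happens when an attribute value exceeds the limit.
// Hypothetical type, not part of Tempo.
type Policy int

const (
	RejectSpan    Policy = iota // refuse the whole span
	DropAttribute               // keep the span, discard only the oversized attributes
)

// Span is a simplified stand-in for a trace span (assumption: string-only
// attribute values). It is not Tempo's real span type.
type Span struct {
	Name       string
	Attributes map[string]string
}

// EnforceAttrLimit checks every attribute value in span against maxBytes.
// It returns the sorted keys of the attributes that exceeded the limit.
// Under RejectSpan it leaves the span untouched and returns an error;
// under DropAttribute it deletes the oversized attributes and returns nil.
func EnforceAttrLimit(span *Span, maxBytes int, policy Policy) ([]string, error) {
	var oversized []string
	for key, value := range span.Attributes {
		// len on a Go string is its length in bytes, not characters.
		if len(value) > maxBytes {
			oversized = append(oversized, key)
		}
	}
	if len(oversized) == 0 {
		return nil, nil
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(oversized)

	if policy == RejectSpan {
		return oversized, fmt.Errorf("span %q rejected: %d attribute(s) exceed %d bytes: %v",
			span.Name, len(oversized), maxBytes, oversized)
	}
	for _, key := range oversized {
		delete(span.Attributes, key)
	}
	return oversized, nil
}
```

Calling `EnforceAttrLimit(span, 2048, DropAttribute)` would strip any attribute whose value is larger than 2048 bytes and report which keys were removed, while `RejectSpan` would return an error so the caller can refuse the span. The 2048 figure is an arbitrary placeholder; the right default is exactly the open question in this issue.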

Additional context This solution obviously comes with the tradeoff that the very large attribute does not get stored in the database, which may be problematic if the user actually wanted to store and later read back that attribute.

We're essentially making the assumption that, above a certain size, it really doesn't make sense for something to be an attribute value, and that what's being sent is likely garbage or the result of a misconfiguration. What that "certain size" should be is, of course, a bit subjective.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.