grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0
4.02k stars, 521 forks

Tempo cluster sizing / capacity planning #1540

Open pavolloffay opened 2 years ago

pavolloffay commented 2 years ago

Is your feature request related to a problem? Please describe.

I would like to know the approximate Tempo cluster size and resource requirements for a given ingestion rate and retention: number of spans per unit of time, average span size in bytes, and retention of N days (I may be missing some input parameters).

Such a document would be useful when evaluating Tempo's cost or doing capacity planning.
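The input parameters above (span rate, average span size, retention) already give a rough raw data volume. A minimal sketch of that arithmetic, assuming a hypothetical workload and ignoring compression, replication, and block/index overhead, so it is only a starting point and not Tempo's actual storage footprint:

```python
SECONDS_PER_DAY = 86_400


def raw_retained_bytes(spans_per_second: float, avg_span_bytes: float, retention_days: int) -> float:
    """Raw trace bytes kept over the retention window.

    Ignores compression, replication, and block/index overhead, so this is
    only a rough sizing input, not Tempo's real storage footprint.
    """
    bytes_per_day = spans_per_second * avg_span_bytes * SECONDS_PER_DAY
    return bytes_per_day * retention_days


# Hypothetical workload: 10,000 spans/s of ~500 bytes each, kept for 14 days.
ingest_mb_per_s = 10_000 * 500 / 1e6
total_tb = raw_retained_bytes(10_000, 500, 14) / 1e12
print(f"ingest: {ingest_mb_per_s:.1f} MB/s, raw retained: {total_tb:.2f} TB")
```

Compute and memory requirements do not follow from this number directly; they depend on the measured per-component ratios discussed further down.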

Describe the solution you'd like

Documentation on Tempo cluster sizing.

Describe alternatives you've considered

Running load tests against Tempo.

Additional context

mdisibio commented 2 years ago

Hi, thanks for raising this issue, it's also something we've been thinking about. There are several different forms this tool could take, and some work is needed to identify the important variables and formulas, definitely including the ones you mentioned. A document with approximate calculations would be fine, but there is also a need for a more sophisticated and accurate tool, for Tempo as well as the other databases. See Mimir's discussion for reference. Tempo would likely adopt the same approach.

For now I can share some metrics from our internal clusters:

I'd expect these requirements to change over the next few releases as we add support for Parquet blocks, likely increasing at first, but then stabilizing as we improve things.

pavolloffay commented 2 years ago

Could you please describe which queries the test was running? Does the lookback or time range affect query resources? Did the query path use serverless functions or just scaled queriers?

pavolloffay commented 2 years ago

Does retention affect resource requirements at all?

mdisibio commented 2 years ago

Could you please describe which queries the test was running? Does the lookback or time range affect query resources? Did the query path use serverless functions or just scaled queriers?

This was gathered from our own clusters, which run real workloads with a mixture of trace lookups and searches, lookbacks of 1h or 24h, and both querier pods and serverless functions. Total querier resource usage is a function of the data volume involved in a search. All queries are sharded into fixed-size sub-jobs, so doubling the time range scans twice as much data, as does doubling the cluster's volume over the same time range. Scaling up pods or functions keeps latency down by executing more sub-jobs in parallel.
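The sharding behaviour described above can be modelled roughly as follows. This is a simplified sketch, not Tempo's scheduler: the sub-job size, data volumes, and worker counts are hypothetical, and it assumes each worker (querier pod or function) runs one sub-job at a time.

```python
import math

GB = 10**9
MB = 10**6


def sub_jobs(bytes_in_range: int, job_size_bytes: int) -> int:
    """Number of fixed-size sub-jobs a search over `bytes_in_range` is split into."""
    return math.ceil(bytes_in_range / job_size_bytes)


def rounds(jobs: int, parallel_workers: int) -> int:
    """Execution rounds needed if each worker runs one sub-job at a time."""
    return math.ceil(jobs / parallel_workers)


job_size = 100 * MB   # hypothetical sub-job size
one_hour = 50 * GB    # hypothetical data volume for a 1h lookback

for label, volume in [("1h", one_hour), ("2h", 2 * one_hour)]:
    jobs = sub_jobs(volume, job_size)
    for workers in (50, 100):
        print(f"{label}: {jobs} sub-jobs, {rounds(jobs, workers)} rounds with {workers} workers")
```

With these numbers the 1h search produces 500 sub-jobs and the 2h search 1,000, while doubling the workers halves the number of rounds, which is the latency lever mentioned above.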

Does retention affect resource requirements at all?

Retention affects how many blocks exist, which mostly impacts latency and object store requests. Tempo reads a bloom filter per block, so doubling retention doubles the number of reads issued to the object store. Latency can be controlled by scaling up queriers to check more bloom filters in parallel (and, more recently, by making use of https://github.com/grafana/tempo/pull/1388). A larger block list also slightly increases memory usage, since block metadata (name, size, location) is kept in memory.
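A similarly simplified model of the retention effect, assuming a hypothetical steady-state number of blocks per day and one bloom-filter read per block for a trace lookup, as described above:

```python
import math


def blocks_retained(blocks_per_day: float, retention_days: int) -> int:
    # Hypothetical steady-state block count per day (after compaction).
    return math.ceil(blocks_per_day * retention_days)


def bloom_read_rounds(blocks: int, parallel_checks: int) -> int:
    # One bloom-filter read per block; `parallel_checks` reads run at a time.
    return math.ceil(blocks / parallel_checks)


for days, parallel in [(7, 100), (14, 100), (14, 200)]:
    blocks = blocks_retained(200, days)
    print(f"{days}d retention: {blocks} bloom reads, "
          f"{bloom_read_rounds(blocks, parallel)} rounds with {parallel} parallel checks")
```

Doubling retention doubles the object store reads (1,400 to 2,800 here), and doubling the parallel checks brings the number of rounds back to 14.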

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

pavolloffay commented 1 year ago

@mdisibio could you reopen this ticket and perhaps document the resource requirements in the docs?

We have used the values from this ticket in the Tempo Kubernetes operator, and we would like to keep them updated if storage or other components change.

mdisibio commented 1 year ago

Got it, reopening. I expect the requirements to change in Tempo 2.0 with TraceQL and full Parquet support; I'll gather new numbers then.

joe-elliott commented 1 year ago

This will not block Tempo 2.0 from releasing so I'm moving it out of the v2.0 milestone.

joe-elliott commented 1 year ago

Heads up to @electron0zero and @mapno that this issue exists. After you do your research please publish some guidelines for the community and close out this issue.

knylander-grafana commented 1 year ago

I'm happy to add this information to the documentation when it's ready.

knylander-grafana commented 1 year ago

See also https://github.com/grafana/tempo/discussions/2836

Jaland commented 1 year ago

I have someone installing the operator on OpenShift, and we kept seeing an OOM error on our tempo-tracing-stack-query-front pod. We were confused because it was only using about half of the memory requested for the pod before going into CrashLoopBackOff.

After a little investigation, we noticed that the pod consists of two containers (tempo and tempo-query). The tempo-query container seems to do 90% of the work and use nearly all of the memory, but for some reason the memory allocation is split evenly between the two containers, so tempo-query is OOM-killed when the pod has used only about half of its memory, as mentioned above.

It would probably be a better use of resources to give the tempo container a relatively low hard-coded amount, since it does not seem to use much, perhaps around a 2% share, and leave the rest of the memory to tempo-query.
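Kubernetes enforces memory limits per container, so a container can be OOM-killed at its own limit even while the pod as a whole still has headroom. A sketch of the two split strategies discussed above, using a hypothetical 4096 MiB pod budget and the ~2% share suggested in the comment (these helpers are illustrative, not operator settings):

```python
def even_split(pod_memory_mib: int, containers: int = 2) -> int:
    """Memory each container gets when the pod budget is divided evenly."""
    return pod_memory_mib // containers


def weighted_split(pod_memory_mib: int, small_share: float = 0.02) -> tuple[int, int]:
    """(lightly used container, busy container) when one gets only a small share."""
    small = int(pod_memory_mib * small_share)
    return small, pod_memory_mib - small


budget = 4096  # hypothetical pod memory budget in MiB
print("even split, per container:", even_split(budget), "MiB")
tempo, tempo_query = weighted_split(budget)
print("weighted split: tempo =", tempo, "MiB, tempo-query =", tempo_query, "MiB")
```

With the even split, tempo-query hits its limit at 2048 MiB; with the weighted split it can use about 4015 MiB of the same budget.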

venkatb-zelar commented 3 months ago

@mdisibio may I know how you calculated the ingestion rate of 1MB/sec?

mdisibio commented 3 months ago

@venkatb-zelar By comparing container_cpu_usage_seconds_total and tempo_distributor_bytes_received_total for a given Tempo install.
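A sketch of that comparison using two samples of each counter, assuming the CPU counter is summed over the containers of the component being measured (the thread does not say which containers were included). In Prometheus this would normally be done with `rate()` over both counters, which also handles counter resets; the helper names and numbers below are hypothetical.

```python
def per_second_rate(counter_start: float, counter_end: float, seconds: float) -> float:
    """Average per-second increase of a monotonic counter (ignores counter resets)."""
    return (counter_end - counter_start) / seconds


def cores_per_mb_per_s(cpu_seconds_start: float, cpu_seconds_end: float,
                       bytes_start: float, bytes_end: float, window_s: float) -> float:
    # container_cpu_usage_seconds_total increases by 1 per core-second of CPU used.
    cores = per_second_rate(cpu_seconds_start, cpu_seconds_end, window_s)
    # tempo_distributor_bytes_received_total, converted to MB/s.
    mb_per_s = per_second_rate(bytes_start, bytes_end, window_s) / 1e6
    return cores / mb_per_s


# Hypothetical 5-minute window: 1200 CPU-seconds used, 600 MB received.
print(cores_per_mb_per_s(0, 1200, 0, 600e6, 300), "cores per MB/s")
```

Here 4 cores handling 2 MB/s gives 2 cores per MB/s; repeating the calculation per component yields a per-component ratio.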

lukasmrtvy commented 3 months ago

@mdisibio for which component? 🤷