Open pavolloffay opened 2 years ago
Hi, thanks for raising this issue, it's also something we've been thinking about. There are several different forms this tool could take, and some work to identify the important variables and formulas, definitely including the ones you mentioned. A document with approximate calculations is ok, but there is also a need for a more sophisticated and accurate tool, in Tempo and the other databases. See Mimir's discussion for reference. Tempo would likely adopt the same approach.
For now I can share some metrics from our internal clusters:
I'd expect these requirements to change over the next few releases as we add support for parquet blocks, likely increasing at first, but then stabilizing as we improve things.
Could you please describe what queries the test was running? Does the lookback or time range affect query resources? Did the query path use functions or only scaled queriers?
Does retention affect resource requirements in any way?
Could you please describe what queries the test was running? Does the lookback or time range affect query resources? Did the query path use functions or only scaled queriers?
This was gathered from our own clusters which run real workloads and have a mixture of trace lookups and searches, and lookback of 1 or 24H, and using both querier pods and functions. Total querier resources is a function of data volume involved in a search. All queries are sharded into fixed-size sub-jobs, so a 2x time range will scan 2x data, and likewise a cluster with 2x volume across same time range. Scaling up pods or functions can keep latency down by executing more sub-jobs in parallel.
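The scaling behavior described above can be sketched with some rough arithmetic. This is a simplified model, not Tempo's actual scheduler: the function names, the sub-job size, the hourly data volume, and the per-job duration are all hypothetical values picked for illustration.

```python
import math

def sub_job_count(bytes_to_scan: int, job_size_bytes: int) -> int:
    """Number of fixed-size sub-jobs needed to cover the data in a search."""
    return math.ceil(bytes_to_scan / job_size_bytes)

def estimated_latency_s(jobs: int, parallel_workers: int, seconds_per_job: float) -> float:
    """Idealized latency: sub-jobs run in waves of `parallel_workers`."""
    waves = math.ceil(jobs / parallel_workers)
    return waves * seconds_per_job

# Hypothetical inputs, not measured Tempo values.
JOB_SIZE = 128 * 1024**2    # assumed 128 MiB per sub-job
ONE_HOUR = 16 * 1024**3     # assumed 16 GiB of trace data per hour

jobs_1h = sub_job_count(ONE_HOUR, JOB_SIZE)      # 1H lookback
jobs_2h = sub_job_count(2 * ONE_HOUR, JOB_SIZE)  # 2x time range, 2x sub-jobs

# Doubling the workers brings the 2H search back to the 1H latency.
latency_1h = estimated_latency_s(jobs_1h, parallel_workers=32, seconds_per_job=0.5)
latency_2h = estimated_latency_s(jobs_2h, parallel_workers=32, seconds_per_job=0.5)
latency_2h_scaled = estimated_latency_s(jobs_2h, parallel_workers=64, seconds_per_job=0.5)
```

In this model the total work (sub-jobs) grows linearly with data volume, while latency depends only on how many sub-jobs can run at once.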
Does retention affect resource requirements in any way?
Retention affects how many blocks exist, which mostly impacts latency and object store requests. Tempo reads a bloom filter per block, so 2x retention will issue 2x reads to object store. Latency can be controlled by scaling up queriers to check more bloom filters in parallel (and more recently making use of https://github.com/grafana/tempo/pull/1388). A longer block list also causes a small, not significant, increase in memory, since block metadata including name/size/location is kept in memory.
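As a rough illustration of the relationship above, the sketch below counts bloom filter reads for a single trace lookup as retention grows. It assumes a constant number of blocks per day and ignores compaction and caching; `blocks_per_day` and `metadata_bytes_per_block` are hypothetical figures, not Tempo defaults.

```python
def blocks_for_retention(retention_days: int, blocks_per_day: int) -> int:
    """Blocks in the block list, assuming a steady number of blocks per day."""
    return retention_days * blocks_per_day

def bloom_reads_per_lookup(block_count: int) -> int:
    """One bloom filter read per block, as described in the comment above."""
    return block_count

def block_list_memory_bytes(block_count: int, metadata_bytes_per_block: int) -> int:
    """Memory held by block metadata (name/size/location) kept in memory."""
    return block_count * metadata_bytes_per_block

# Hypothetical inputs for illustration.
BLOCKS_PER_DAY = 50
METADATA_BYTES = 1024

reads_7d = bloom_reads_per_lookup(blocks_for_retention(7, BLOCKS_PER_DAY))
reads_14d = bloom_reads_per_lookup(blocks_for_retention(14, BLOCKS_PER_DAY))
memory_14d = block_list_memory_bytes(blocks_for_retention(14, BLOCKS_PER_DAY), METADATA_BYTES)
```

With these assumed numbers, doubling retention doubles the object store reads per lookup, while the block list for 14 days still occupies well under 1 MiB of memory.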
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.
@mdisibio could you re-open this ticket and perhaps document the resources in the docs?
We have used the values in this ticket in the Tempo Kubernetes operator and we would like to keep them updated if storage or other components change.
Got it, reopening. Expecting the requirements to change in Tempo 2.0 with TraceQL and full parquet, will gather new numbers then.
This will not block Tempo 2.0 from releasing so I'm moving it out of the v2.0 milestone.
Heads up to @electron0zero and @mapno that this issue exists. After you do your research please publish some guidelines for the community and close out this issue.
I'm happy to add this information to the documentation when it's ready.
I have someone installing the operator on OpenShift and we kept noticing an OOM error on our tempo-tracing-stack-query-front pod, but we were confused because it was only using about half the memory requested for the pod before going into CrashLoopBackOff.
After a little investigation, we noticed that the pod consists of two containers (tempo and tempo-query). It seems like the tempo-query container is doing 90% of the work and using nearly all the memory, but for some reason the memory is split evenly between the two containers, so we OOM after using only half the pod's memory, as mentioned above.
It would probably be a better use of resources if tempo was just hard coded with a relatively low amount, since it does not seem to use much, and maybe given something like a 2% cut of the rest of the memory.
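The arithmetic behind this observation can be sketched as follows. This is not the operator's code: the function names, the 2048 MiB pod budget, and the 1:9 weighting are hypothetical and only illustrate why an even split makes the busier container hit its limit at half the pod's memory.

```python
def even_split(pod_memory_mib: int, containers: list[str]) -> dict[str, int]:
    """Give every container the same share of the pod's memory budget."""
    share = pod_memory_mib // len(containers)
    return {name: share for name in containers}

def weighted_split(pod_memory_mib: int, weights: dict[str, int]) -> dict[str, int]:
    """Split the pod's memory budget in proportion to integer weights."""
    total = sum(weights.values())
    return {name: pod_memory_mib * w // total for name, w in weights.items()}

POD_MEMORY_MIB = 2048  # hypothetical pod budget

# Even split: tempo-query is capped at 1024 MiB, half of the pod budget.
even = even_split(POD_MEMORY_MIB, ["tempo", "tempo-query"])

# Weighted split (assumed 1:9 ratio): tempo-query gets most of the budget.
weighted = weighted_split(POD_MEMORY_MIB, {"tempo": 1, "tempo-query": 9})
```

Under the even split, tempo-query is OOM-killed at 1024 MiB even though the pod as a whole has only used about half of its budget.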
@mdisibio may I know how you calculated the ingestion rate of 1MB/sec?
@venkatb-zelar By comparing container_cpu_usage_seconds_total and tempo_distributor_bytes_received_total for a given tempo install.
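One way to read this comparison is as bytes received per CPU-second, computed from two samples of each counter. The sketch below is an assumption about the method, not a confirmed formula from the thread: the function name is hypothetical, the sample values are made up, and which containers' CPU counters to include is left open here.

```python
def bytes_per_cpu_second(
    bytes_start: float, bytes_end: float,
    cpu_start: float, cpu_end: float,
) -> float:
    """Ratio of counter increases: bytes received per CPU-second consumed.

    bytes_*: samples of tempo_distributor_bytes_received_total
    cpu_*:   samples of container_cpu_usage_seconds_total (summed over the
             containers of interest; that selection is an assumption)
    """
    cpu_delta = cpu_end - cpu_start
    if cpu_delta <= 0:
        raise ValueError("CPU counter did not increase between samples")
    return (bytes_end - bytes_start) / cpu_delta

# Made-up samples taken 5 minutes apart, for illustration only.
rate = bytes_per_cpu_second(
    bytes_start=0, bytes_end=600_000_000,  # 600 MB received
    cpu_start=0, cpu_end=600,              # 600 CPU-seconds used
)
```

Because both counters are sampled over the same window, the window length cancels out, and the result can be read as bytes per second per fully used core.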
@mdisibio for what component ? 🤷
Is your feature request related to a problem? Please describe.
I would like to know the approximate Tempo cluster size and how many resources it will need for a given ingestion rate and retention: number of spans per unit of time, average span size in bytes, and retention of N days (maybe I am missing some input parameters).
Such a document is useful when evaluating Tempo from a cost perspective or for capacity planning.
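A first approximation from the inputs listed above might look like the sketch below. It only converts the inputs into an ingest rate and a raw retained volume; it ignores compression, replication, and index or bloom filter overhead, and the function names and example numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def ingest_bytes_per_second(spans_per_second: int, avg_span_bytes: int) -> int:
    """Raw ingest rate from span throughput and average span size."""
    return spans_per_second * avg_span_bytes

def raw_bytes_retained(ingest_bps: int, retention_days: int) -> int:
    """Uncompressed volume held at steady state for the retention period."""
    return ingest_bps * SECONDS_PER_DAY * retention_days

# Hypothetical workload: 10,000 spans/s, 500 bytes per span, 7 days retention.
ingest = ingest_bytes_per_second(10_000, 500)
retained = raw_bytes_retained(ingest, 7)
```

For this assumed workload the ingest rate is 5 MB/s and the raw retained volume is about 3 TB; actual object store usage would differ once compression and block overhead are taken into account.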
Describe the solution you'd like
Documentation on Tempo cluster sizing.
Describe alternatives you've considered
Run tests on Tempo.
Additional context