Open KMiller-Grafana opened 2 years ago
We should also recommend using fast disks for ingesters and store-gateways (see https://github.com/grafana/mimir/issues/1722#issuecomment-1112789110).
Maybe this will just be taken care of in https://github.com/grafana/mimir/issues/1988, but recently I was looking at the capacity planning page and was a bit confused when I read:
CPU: 1 core for every 300,000 series in memory
Memory: 2.5GB for every 300,000 series in memory
Disk space: 5GB for every 300,000 series in memory
Is the idea that I calculate the total number of active series in my cluster and then figure out the CPU, memory, and disk space requirements for all ingesters in the whole cluster? How do I figure out how many ingesters I need and what resources to allocate to each individual ingester? Do I arbitrarily pick a number of ingesters and then just divide the total resource requirements by that number?
For the ingesters specifically, is the disk space requirement at all impacted by how many hours of data I want to retain on disk?
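To make the question concrete, here is a minimal sketch of the "compute the cluster total, then divide by the ingester count" reading of the guideline quoted above. The three ratios are the ones from the capacity planning page. The function names, the even split across ingesters, and the choice of ingester count are assumptions for illustration, not something the doc states.

```python
from dataclasses import dataclass

# Ratios quoted from the capacity planning page: per 300,000 series in memory.
SERIES_PER_UNIT = 300_000
CPU_CORES_PER_UNIT = 1.0
MEMORY_GB_PER_UNIT = 2.5
DISK_GB_PER_UNIT = 5.0


@dataclass
class IngesterResources:
    cpu_cores: float
    memory_gb: float
    disk_gb: float


def total_ingester_resources(series_in_memory: int) -> IngesterResources:
    """Cluster-wide ingester requirements for a given number of in-memory series."""
    units = series_in_memory / SERIES_PER_UNIT
    return IngesterResources(
        cpu_cores=units * CPU_CORES_PER_UNIT,
        memory_gb=units * MEMORY_GB_PER_UNIT,
        disk_gb=units * DISK_GB_PER_UNIT,
    )


def per_ingester_resources(series_in_memory: int, ingester_count: int) -> IngesterResources:
    """Assumption: series are spread evenly, so each ingester gets an equal share."""
    if ingester_count <= 0:
        raise ValueError("ingester_count must be positive")
    total = total_ingester_resources(series_in_memory)
    return IngesterResources(
        cpu_cores=total.cpu_cores / ingester_count,
        memory_gb=total.memory_gb / ingester_count,
        disk_gb=total.disk_gb / ingester_count,
    )
```

For example, `per_ingester_resources(3_000_000, 5)` gives 2 cores, 5GB of memory, and 10GB of disk per ingester under this reading. It does not account for scrape interval or how long data is kept on disk, which is exactly the gap raised here.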
I wonder if ingester disk usage would be better estimated as a function of DPM rather than active series.
In any case, I think the ingester sizing that @09jvilla points out is using some unstated assumptions about the scrape interval and retention period.
The capacity planning doc was initially conceived as a simplification, with a single metric per component to use for scaling (for ingesters I picked active series). I understand it was an oversimplification and it's showing its limits. My feeling is that documenting all the proper math would make it quite complicated for the user, which is why I would move forward by replacing it with a tool in which we encapsulate all our logic.
I wonder if ingester disk usage would be better estimated as a function of DPM rather than active series.
Yes, it would.
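As a rough illustration of what a DPM-based estimate could look like, here is a sketch that scales disk usage with the number of samples written during the on-disk retention window. It reads DPM as data points per minute per series. The function name and the `bytes_per_sample` parameter are hypothetical: the thread gives no per-sample size, so the caller has to supply one, and the sketch ignores WAL, index, and compaction overhead.

```python
def estimate_ingester_disk_bytes(
    active_series: int,
    dpm: float,
    retention_hours: float,
    bytes_per_sample: float,
) -> float:
    """Estimate total ingester disk usage in bytes from sample volume.

    bytes_per_sample is a caller-supplied assumption, not a documented value.
    """
    if min(active_series, dpm, retention_hours, bytes_per_sample) < 0:
        raise ValueError("all inputs must be non-negative")
    # Samples written across all series during the retention window.
    samples = active_series * dpm * 60 * retention_hours
    return samples * bytes_per_sample
```

Under this model, halving the scrape interval (doubling DPM) or doubling the retention period doubles the estimate, while the active-series count alone captures neither.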
Estimated high due to the unactionable state of the doc issue and the research needed if it is implemented.
The guidelines for Alertmanager seem too low:
Perhaps it was meant to say '100 firing alerts per second'? It does not seem right for a single alert to consume 10MB of RAM.
The guidelines for Alertmanager seem too low:
@mac133k You're right. See my PR to update it: https://github.com/grafana/mimir/pull/3132
1. See the link under the heading "Monolithic mode?". It links to the next paragraph/section, which is unhelpful for any reader who clicks on it, since it just goes to the next sentence. Just remove it.