Open raxod502-plaid opened 2 years ago
Hi @raxod502-plaid I'm not sure why we have PDBs defined for all pods. I would think that we would only need them for ingesters, and I would also argue that stay at 1 as most people will be running Loki with the default replication factor of 3, in which case you really wouldn't be able to tolerate a loss of more than one.
As for why we have PDBs for other pods, that's probably a question for @unguiculus.
My initial reaction is to remove these for all but the ingester, and keep the ingester hardcoded to 1. Are there compelling use cases to do otherwise?
I was thinking if there are use cases to increase the replication factor beyond 3, then there would correspondingly be use cases to increase the maxUnavailable count for the ingester beyond 1. I.e., either both parameters should be configurable, or neither. Does that sound right?
That said, in our deployment, we have the replication factor set at its default of 3, so we don't actually need that customizability.
By default, the loki-distributed chart hardcodes all PDBs at maxUnavailable = 1:
https://github.com/grafana/helm-charts/blob/cf03ae14fa7d30819e2c1a648b125bfaaa436ec4/charts/loki-distributed/templates/querier/poddisruptionbudget-querier.yaml#L12
This slows down our cluster rolls a lot, e.g.:
Is there any reason the chart shouldn't be updated to make this parameter customizable? Would a pull request be accepted?