Closed Calvinaud closed 1 month ago
It would be nice to create a metadata-only informer to speed up label lookups, or to use a cached client in the subscriber. Either option should make the lookups faster.
https://firehydrant.com/blog/dynamic-kubernetes-informers/ https://medium.com/@timebertt/kubernetes-controllers-at-scale-clients-caches-conflicts-patches-explained-aa0f7a8b4332
On a side note, to get better visibility into performance and stability issues, it would be great to kick off efforts on instrumentation using https://opentelemetry.io/ (logs, metrics, traces and profiles). This would allow users to monitor Litmus components on their own O11y stack, gain insights into scaling and stability challenges, and report or contribute with more data.
What happened: Context: Infrastructure at cluster level, on a cluster with a large number of namespaces/objects.
When trying to create a ChaosExperiment in the UI, we cannot select any App namespace after selecting the App Kind. This is due to a query timeout, just like in https://github.com/litmuschaos/litmus/issues/4308. In our case, the timeout we are hitting is the one from the browser itself (1min), since the getKubeObject query takes too long. The query takes too long because our infrastructure is slow to answer: it takes around 2-3 for our infrastructure to send back the KubeObject list. This is probably because we have a big cluster and the function getKubeObject is not efficient enough. (For information, this is not due to a lack of resources from requests/limits.)
What you expected to happen: The query does not time out and we can select the namespace.
Where can this issue be corrected? (optional) I think the main way to solve this issue is to make getKubeObject/GetKubernetesObjects https://github.com/litmuschaos/litmus/blob/master/chaoscenter/subscriber/pkg/k8s/objects.go#L27 more efficient/faster.
I think multiple solutions are possible (so don't hesitate if you have a better one). The first possible solution is to add some parallelism at https://github.com/litmuschaos/litmus/blob/master/chaoscenter/subscriber/pkg/k8s/objects.go#L65. I don't think this is the best solution, since it creates a risk of DoSing the API server with a lot of requests.
The second solution, which seems harder to put in place, is to not retrieve all the objects of a type in every namespace at once, but to split the retrieval into two queries:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: Two questions related to this topic:
namespace, then object type, then object name? (I think it's not really necessary; the only gain is that you could show only the object types present in the namespace.)