Open SaschaSchwarze0 opened 4 years ago
This is not a bug; it works as designed. It would be good to have numbers on the potential performance degradation when reconciliation never stops. Adding this to #174 for a short discussion.
Agreed, this constant reconciliation paradigm does feel a little chatty, though in general it isn't expensive. Nevertheless, it would be good to see what the resource footprint is.
@SaschaSchwarze0 do you know if we have any internal results on this? Or are these metrics (system overload from many repeated reconciles) something we can ask Emily or someone similar to gather for us?
@qu1queee no, I do not have results. But I agree, it would be interesting to see the difference between a performance run on a clean system and one where 1000 (just a random number) build runs are reconciling because of some failure.
The result is that the reconciliation happens endlessly. And this is just one example; other reasons for endless reconciliation are bad references to credentials. To prevent a system overload from these reconciliations, we should do two things:
1. Apply a delay time when reconciling, see discussion at [#109 (comment)](https://github.com/redhat-developer/build/pull/109#issuecomment-614079616)
Good approach! This will ease the pressure on the API server when failed attempts are requeued.
2. Investigate whether we can stop the reconciliation process if the user does not fix the root cause within a certain time (maybe one hour) and put the custom resource into some "permanently failed state"
An example of a permanently failed state can be taken from service-binding-operator:
    // NoRequeue returns error without requeue flag.
    func NoRequeue(err error) (reconcile.Result, error) {
        return reconcile.Result{}, err
    }
Additionally, we should define the different result scenarios as dedicated functions, to inform the Kubernetes API-Server how to proceed, and re-use this behavior throughout the operator.
As a practical example, please consider the methods defined here.
We just had the situation on our development cluster that two build custom resources in the system were already defining the service account as an object. Due to a mistake during deployment, an old build operator was expecting a string there.