openyurtio / openyurt

OpenYurt - Extending your native Kubernetes to edge (project under CNCF)
https://openyurt.io
Apache License 2.0

[feature request] Improve the implementation of workloads creating in yurtappset #1968

Closed vie-serendipity closed 5 months ago

vie-serendipity commented 5 months ago

What would you like to be added: When slowstartbatch encounters its first error, it simply terminates. Ideally, it would try to create all the workloads first and resolve any failures at the next reconcile. I propose improving the current implementation accordingly. https://github.com/openyurtio/openyurt/blob/34b14ccadfe26b8956144d21ec62a79a3aef1760/pkg/yurtmanager/controller/yurtappset/yurtappset_controller.go#L368-L370

Why is this needed: When something unexpected happens, we want most of the workloads to be up and running first. With the current implementation, the controller is likely to get stuck on one failing workload and keep looping on it.

others /kind feature

rambohe-ch commented 5 months ago

@vie-serendipity The current slowstartbatch failure policy seems reasonable, because it is better to fail fast until the problem is solved.

rambohe-ch commented 5 months ago

@vie-serendipity Could you share more details about this issue?

vie-serendipity commented 5 months ago

@rambohe-ch I'm not sure, but I'd like to take a fault-tolerant strategy during reconcile so that the desired state of the spec is reached as far as possible, i.e., continue trying to create the other resources after one of them fails to be created. @luc99hen What do you think?

vie-serendipity commented 5 months ago

Actually, this is the same logic that ReplicaSet uses to create pods. Kubernetes adopts a fail-fast policy, so it is the same here.