Open liggitt opened 4 years ago
/cc @logicalhan
Hello @liggitt and @fedebongio Bug Triage team here for the 1.18 release. This is a friendly reminder that code freeze is scheduled for 5 March. Is this issue still intended for milestone 1.18?
Hello @kubernetes/sig-api-machinery-bugs This issue appears frozen. No movement since December. Should we push this to the next milestone?
/milestone clear
What happened: e2e tests that delete namespaces timed out waiting for namespace cleanup. This was the root cause of https://github.com/kubernetes/kubernetes/issues/86181.
I graphed namespace controller lag times:
https://docs.google.com/spreadsheets/d/1hYxDyvZ9o-3T0WrOJ7LgW-sxQn62fU8RWfDjm02OoLc/edit#gid=1101829529
The namespace controller, as configured, can't keep up with the parallelism of the e2e jobs. For most tests, this doesn't fail the test because the e2e job waits at the end for namespace deletion to complete. For the GCEPD test, namespace deletion is a synchronous part of the test. Depending on where the GCEPD test fell, the controller was sometimes too backed up to finish removing the namespace in time.
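To illustrate why a backed-up controller only hurts tests that wait synchronously, here is a rough queueing sketch. It is not the namespace controller's actual code, and none of the numbers come from the spreadsheet: the function name `simulate_deletion_lag`, the worker counts, the per-namespace cleanup time, and the arrival rate are all assumptions picked for illustration. It models the controller as a FIFO queue served by a fixed pool of workers and reports how long each deletion waits.

```python
import heapq


def simulate_deletion_lag(arrival_times, workers, seconds_per_namespace):
    """Return the lag (completion time minus request time) per namespace.

    Hypothetical model: a FIFO queue served by a fixed pool of workers,
    each taking a constant time to clean up one namespace.
    """
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)

    lags = []
    for requested in sorted(arrival_times):
        worker_free = heapq.heappop(free_at)
        # A deletion starts when it has been requested AND a worker is idle.
        start = max(requested, worker_free)
        done = start + seconds_per_namespace
        heapq.heappush(free_at, done)
        lags.append(done - requested)
    return lags


# Assumed load: parallel e2e jobs request one namespace deletion per second.
arrivals = [float(i) for i in range(300)]

# Under-provisioned: 2 workers x 5 s each can only clear 0.4 namespaces/s.
slow = simulate_deletion_lag(arrivals, workers=2, seconds_per_namespace=5.0)

# Keeping up: 5 workers x 5 s each clear 1 namespace/s, matching arrivals.
ok = simulate_deletion_lag(arrivals, workers=5, seconds_per_namespace=5.0)

print(slow[0], slow[-1])  # 5.0 452.0 -> lag grows with queue position
print(max(ok))            # 5.0 -> lag stays flat
```

In this toy model, a test that deletes its namespace synchronously passes or fails depending on where in the queue its deletion lands, which matches the "depending on where the GCEPD test fell" behavior described above. Tests that only wait at the end of the job see the total drain time instead.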
Things we could do:
/sig api-machinery
/cc @deads2k @msau42
Note that this blocks moving https://github.com/kubernetes/kubernetes/issues/86181 back into the main e2e (at least as written)