Open alculquicondor opened 1 month ago
/assign @mszadkow
/kind flake
The XGBoostJob has some state transition bugs, so we may need to either remove the test case from Kueue or fix the root bug in the training-operator.
I see, thanks for the context.
@mszadkow any chance you can take a look at the training-operator code?
In the meantime, let's disable this test by calling ginkgo.Skip() with an accompanying comment.
@tenzen-y Can you explain more about the transition bug? Is it a known one?
Yes, sure, I can have a look there, but as you said, I will skip it for now.
> @tenzen-y Can you explain more about the transition bug? Is it a known one?
For historical reasons, we used to simply rerun failed flaky tests in the training-operator, so we do not have a dedicated issue for specific transitions.
However, we explained the transition issue briefly here: https://github.com/kubeflow/training-operator/issues/1711
What happened:
What you expected to happen:
Test to pass
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version):
Kueue version (use git describe --tags --dirty --always):
OS (e.g. cat /etc/os-release):
Kernel (e.g. uname -a):