mykaul closed this 9 months ago
Some SCT-related changes would be required (and other infras still need a full API implemented for them, like #289).
Unsure why; the decision on a failure is up to the engineer investigating the run, no? I don't mind if the initial status is FAILED-BUG, as long as I can manually move it to FAILED-INFRA.
Ah, in that case it would be simpler. I was thinking we could actually distinguish an infra failure (for example, a Spot Termination Error) from a test failure (the current "Failed" indicator logic).
In the future, sure; we can catch spot termination, for example. That's for later.
I'd rather suggest basing it on the attached issues.
The engineer can make the call and say whether the run is considered a pass or not.
If you want stats, keep in mind that one job can have both a coredump and an infra issue.
Adding such a status won't give you a clear picture if people don't update it.
Anyhow, I'm asking for @roydahan's opinion on it as well.
There is no such thing as Failed - infra. If something failed due to an infra issue, it needs to be solved and rerun. Hence there is no point in holding such a state.
How do you classify spot termination, then?
I classify them as failed, and one needs to rerun them. They don't enter the statistics anyway; only the last run is part of the statistics you see in the top bar.
@roydahan again, we do want this state, but let's first agree on its name (i.e. you didn't like failed-infra); we are open to suggestions.
Instead of a "Failed-Infra" status, we would like one called "Test Error", marked in a different color (orange?). This can be introduced now; later I would like it to be set automatically for failures we will define, like "SpotTermination".
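A minimal sketch of how the proposed status and its later automatic assignment might look, assuming each failed run exposes a list of error-type names. `TestStatus`, `STATUS_COLORS`, `INFRA_ERROR_TYPES`, `classify_run` and the "CoreDump" type name are hypothetical, not an existing dashboard API; only "SpotTermination" and the orange color come from the proposal above.

```python
from enum import Enum


class TestStatus(Enum):
    """Hypothetical run statuses; TEST_ERROR replaces the "Failed-Infra" idea."""
    PASSED = "passed"
    FAILED = "failed"
    TEST_ERROR = "test_error"


# Hypothetical display colors; orange marks a Test Error as proposed.
STATUS_COLORS = {
    TestStatus.PASSED: "green",
    TestStatus.FAILED: "red",
    TestStatus.TEST_ERROR: "orange",
}

# Hypothetical set of failure types defined as infra/test errors.
INFRA_ERROR_TYPES = {"SpotTermination"}


def classify_run(passed: bool, error_types: list[str]) -> TestStatus:
    """Pick an initial status; an engineer can still override it manually."""
    if passed:
        return TestStatus.PASSED
    # Only mark TEST_ERROR when every error is a known infra type, so a run
    # that also hit a real problem (e.g. a coredump) stays FAILED.
    if error_types and all(t in INFRA_ERROR_TYPES for t in error_types):
        return TestStatus.TEST_ERROR
    return TestStatus.FAILED
```

For example, `classify_run(False, ["SpotTermination"])` yields `TestStatus.TEST_ERROR`, while `classify_run(False, ["SpotTermination", "CoreDump"])` stays `TestStatus.FAILED`. Requiring *all* errors to be infra types is one possible way to handle a job that has both a coredump and an infra issue; it keeps real bugs from being hidden behind the orange status.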
Right now, it's practically impossible to measure how stable our infra is, how solid our tests are, and how many bugs we actually encounter. The dashboard today does not let us easily distinguish between failure modes. If we look at 5.4 right now, it looks catastrophic, and I don't believe that's the case.
44% failure is terrible! But we know it's inaccurate, since a failure could have either cause. We need more granularity.
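A rough sketch of the extra granularity being asked for: instead of one failure percentage, report the share of jobs in each status, counting only each job's latest run (the rule described earlier for the top-bar statistics). The tuple layout, the `status_breakdown` name and the status strings are hypothetical assumptions, not the dashboard's real data model.

```python
from collections import Counter


def status_breakdown(runs):
    """Return the percentage of jobs per status, using each job's latest run.

    `runs` is an iterable of (job_name, build_number, status) tuples; this
    layout is a hypothetical stand-in for whatever the dashboard stores.
    """
    latest = {}
    for job, build, status in runs:
        # Keep only the highest build number per job.
        if job not in latest or build > latest[job][0]:
            latest[job] = (build, status)
    counts = Counter(status for _, status in latest.values())
    total = sum(counts.values())
    if not total:
        return {}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Illustrative, made-up runs: "longevity" was rerun and passed on build 2.
runs = [
    ("longevity", 1, "failed"),
    ("longevity", 2, "passed"),
    ("upgrade", 1, "test_error"),
    ("rolling", 1, "failed"),
    ("gemini", 1, "passed"),
]
print(status_breakdown(runs))
```

With these made-up runs, the breakdown is 50% passed, 25% failed and 25% test_error, so a single headline failure figure splits into real failures versus infra/test errors.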