cirrus: Fail CI if there are python stack traces in the log.

igsilya commented 1 year ago

If the test script fails, it doesn't fail with an error code. It continues to gather logs and mine data, so we have a clean exit code in the end.

We look for iteration failures afterwards, but we don't look for hard failures in python code.

Fix that by grepping for python 'Traceback' and failing if it is found.
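A minimal sketch of the check described above, written in Python rather than as the grep call the patch uses. The function name `has_traceback` and the sample log contents are hypothetical and only illustrate the idea: scan a log for the 'Traceback' marker and treat a match as a failure.

```python
import tempfile
from pathlib import Path

# Marker to look for; the description above greps for 'Traceback'.
MARKER = "Traceback"


def has_traceback(log_path):
    """Return True if any line of the log file contains the marker."""
    with open(log_path, errors="replace") as log:
        return any(MARKER in line for line in log)


# Hypothetical log contents, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    clean = Path(tmp) / "clean.log"
    clean.write_text("iteration 1 done\n")
    broken = Path(tmp) / "broken.log"
    broken.write_text("Traceback (most recent call last):\nKeyError: 'x'\n")
    results = (has_traceback(clean), has_traceback(broken))
```

A CI step built on this would exit non-zero whenever `has_traceback` returns True for any collected log.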

dceara commented 1 year ago

> If the test script fails, it doesn't fail with an error code. It continues to gather logs and mine data, so we have a clean exit code in the end.

I didn't try it but can't we just store the exit code of the ovn_tester.py run

https://github.com/dceara/ovn-heater/blob/8f1f15d81a7f614dc489bdf189410180d3654d4b/do.sh#L382

and exit with that after we mine the data?

> We look for iteration failures afterwards, but we don't look for hard failures in python code.
>
> Fix that by grepping for python 'Traceback' and failing if it is found.

Otherwise this is OK, but I'm quite sure we'll find other failure cases in the future.
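The alternative suggested here, remembering the test run's exit code and returning it only after the data has been mined, could look roughly like the sketch below. do.sh is a shell script, so this Python version only mirrors the idea; `run_then_collect` and the stand-in child command are hypothetical.

```python
import subprocess
import sys


def run_then_collect(test_cmd, collect):
    """Run the test command, always run the collection step, return the test's exit code."""
    # Remember the result instead of aborting right away.
    status = subprocess.run(test_cmd).returncode
    # Log gathering and data mining still happen when the test failed.
    collect()
    return status


collected = []
# Stand-in for the real test run: a child Python process that exits with code 3.
status = run_then_collect(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    lambda: collected.append("logs"),
)
```

A wrapper would finish with `sys.exit(status)`, so the failure surfaces only once collection is done.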

igsilya commented 1 year ago

> > If the test script fails, it doesn't fail with an error code. It continues to gather logs and mine data, so we have a clean exit code in the end.
>
> I didn't try it but can't we just store the exit code of the ovn_tester.py run
>
> https://github.com/dceara/ovn-heater/blob/8f1f15d81a7f614dc489bdf189410180d3654d4b/do.sh#L382
>
> and exit with that after we mine the data?

We could. Some users may not expect do.sh to fail though. :)

> > We look for iteration failures afterwards, but we don't look for hard failures in python code. Fix that by grepping for python 'Traceback' and failing if it is found.
>
> Otherwise this is OK, but I'm quite sure we'll find other failure cases in the future.

I think this PR is still useful even if we fail the do.sh run, because we may have some internal issues swallowed by try/except blocks. If not now, then maybe in the future.
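To illustrate the point about try/except blocks, here is a small sketch in which an exception is caught and logged, so the exit code stays clean while the stack trace still lands in the log. All names (`run_iteration`, `main`, the logger) are hypothetical and not taken from ovn_tester.py.

```python
import io
import logging

# Capture log output in memory so it can be inspected afterwards.
log_stream = io.StringIO()
logger = logging.getLogger("swallowed-demo")
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False


def run_iteration():
    raise RuntimeError("internal error")


def main():
    try:
        run_iteration()
    except Exception:
        # The error is logged together with its stack trace, then swallowed.
        logger.exception("iteration failed")
    return 0  # the process would still report success


exit_code = main()
log_text = log_stream.getvalue()
```

Here `exit_code` is 0 even though `log_text` contains a traceback, which is the case an exit-code check alone would miss.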

dceara commented 1 year ago

> > > If the test script fails, it doesn't fail with an error code. It continues to gather logs and mine data, so we have a clean exit code in the end.
> >
> > I didn't try it but can't we just store the exit code of the ovn_tester.py run https://github.com/dceara/ovn-heater/blob/8f1f15d81a7f614dc489bdf189410180d3654d4b/do.sh#L382
> >
> > and exit with that after we mine the data?
>
> We could. Some users may not expect do.sh to fail though. :)

That's a very good point! What if instead we (also) grep for 'Failed to run test. Check logs at' (or similar)?

> > > We look for iteration failures afterwards, but we don't look for hard failures in python code. Fix that by grepping for python 'Traceback' and failing if it is found.
> >
> > Otherwise this is OK, but I'm quite sure we'll find other failure cases in the future.
>
> I think this PR is still useful even if we fail the do.sh run, because we may have some internal issues swallowed by try/except blocks. If not now, then maybe in the future.

You're right, let's grep for tracebacks explicitly too.

igsilya commented 1 year ago

> That's a very good point! What if instead we (also) grep for 'Failed to run test. Check logs at' (or similar)?

Added to the patch.
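With both markers in play, the check amounts to matching each log line against a small set of patterns. The sketch below assumes the two strings mentioned in this thread; `find_failures`, the shortened 'Failed to run test' prefix, and the sample lines are hypothetical, and the actual patch may match differently.

```python
# Patterns discussed in this thread; the second is shortened to its prefix.
FAILURE_PATTERNS = ("Traceback", "Failed to run test")


def find_failures(lines, patterns=FAILURE_PATTERNS):
    """Return (line_number, pattern) pairs for every line containing a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for pattern in patterns:
            if pattern in line:
                hits.append((number, pattern))
    return hits


# Hypothetical log excerpt, for illustration only.
sample = [
    "starting test",
    "Failed to run test. Check logs at (path elided)",
    "Traceback (most recent call last):",
]
hits = find_failures(sample)
```

Reporting the line number alongside the pattern makes it easier to locate the failure in the CI output; a non-empty result would fail the job.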

ovn-org / ovn-heater

cirrus: Fail CI if there are python stack traces in the log. #152