hpc / pavilion2

Pavilion is a Python 3 (3.5+) based framework for running and analyzing tests targeting HPC systems.
https://pavilion2.readthedocs.io/

Fixed on_node failed build non-stopping problem. Now builds with on_n… #765

Closed dmageeLANL closed 5 months ago

dmageeLANL commented 5 months ago

Fixes issue #763. When a build that is not test.build_local fails, an exception is now raised, which stops Pavilion from running the test anyway. This also stops non-test.build_local runs from writing BUILD_CREATED lines to the status file over and over each time the test is queried.

There is probably a better condition for suppressing the repeated BUILD_CREATED statuses. With this change, though, a run that has completed or has an error status keeps that status.
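For reviewers, the guard described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name should_write_build_created and the set of "final" state names are assumptions made for the example.

```python
# Hypothetical set of states that should never be overwritten by
# BUILD_CREATED. The names are assumptions for this sketch.
FINAL_STATES = {"RUN_COMPLETE", "BUILD_FAILED", "BUILD_ERROR", "RUN_ERROR"}


def should_write_build_created(current_state: str) -> bool:
    """Return True only if writing BUILD_CREATED would not clobber a
    completed or errored run, and would not repeat the same status."""
    if current_state == "BUILD_CREATED":
        # Already recorded; writing it again just grows the status file.
        return False
    return current_state not in FINAL_STATES
```

With a check like this, querying a finished or failed test leaves its last status in place.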

Code review checklist:

dmageeLANL commented 5 months ago

This PR now has a bonus fix! The original PR revealed a bug. In status_file.py:_parse_status_line, if the status timestamp is in the legacy format, it can't be cast to a float, so the code falls into the first except. There it pops the status line list again and tries to apply the legacy time format to that element. But that element is now the second one, which is the status state, not the timestamp. So don't double pop; just reuse time_part.
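A minimal sketch of the corrected parsing logic, for illustration only. It is not the actual _parse_status_line; the "timestamp state note" line layout and the LEGACY_TS_FORMAT string are assumptions made for this example.

```python
import datetime

# Assumed legacy timestamp layout; the real format may differ.
LEGACY_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_status_line(line: str):
    """Parse an assumed 'timestamp state note' line into a tuple."""
    parts = line.strip().split(" ", 2)

    # Pop the timestamp exactly once and keep it in time_part.
    time_part = parts.pop(0)
    try:
        when = float(time_part)
    except ValueError:
        # Reuse time_part here. Popping again would hand us the state
        # string instead of the timestamp, which was the bug.
        try:
            parsed = datetime.datetime.strptime(time_part, LEGACY_TS_FORMAT)
            when = parsed.timestamp()
        except ValueError:
            # Failsafe: fall back to the current time.
            when = datetime.datetime.now().timestamp()

    state = parts.pop(0) if parts else None
    note = parts.pop(0) if parts else ""
    return when, state, note
```

Because the timestamp is popped only once, the state and note stay aligned whether the line uses the float format or the legacy one.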

Also, the failsafe assignment when = datetime.datetime(0, 0, 0) doesn't work either: year, month, and day must each be at least 1, so the call itself raises a ValueError. If you don't want it to fail, use when = datetime.datetime.now().
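A quick way to confirm the failsafe problem, using only the standard library. The helper names zero_datetime_fails and failsafe_when are made up for this example.

```python
import datetime


def zero_datetime_fails() -> bool:
    """Return True if datetime.datetime(0, 0, 0) raises ValueError."""
    try:
        datetime.datetime(0, 0, 0)
    except ValueError:
        # datetime.MINYEAR is 1, and month and day must also be >= 1.
        return True
    return False


def failsafe_when() -> datetime.datetime:
    """A failsafe value that cannot raise: the current local time."""
    return datetime.datetime.now()
```

Calling zero_datetime_fails() shows that the old failsafe raises instead of returning a placeholder, while failsafe_when() always returns a usable datetime.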

This commit caught that because status.current() is now called in builder.py:TestBuilder:__init__(), which is REALLY in the main line of the code. It is called nearly every time the pav command is run, so as soon as it hit a legacy test, it failed.