Closed luator closed 1 month ago
I have now tested on the MPI cluster. Output-wise it looks good (i.e. same as local). Jobs are still listed in condor_q for a while after cluster_utils has terminated, but since I'm also not able to kill them with condor_rm (it says they are not found), this might just be a glitch of condor.
Now also successfully tested on Galvani.
Handling the shutdown on SIGINT (including sys.exit()) within the signal handler function is problematic, as the signal can be received multiple times before the application actually exits. This results in duplicated prints and potentially other issues. Instead, only set a flag in the signal handler, which is then checked in each iteration of the main loop. Add a simple SignalWatcher utility class for this. Unfortunately this adds more redundant code in hp_optimization() and grid_search(), but I don't see an easy way to avoid that at the moment.

Another, indirectly related change that was necessary to avoid messy output: use a context manager for the progress bars to make sure they are properly closed when the loop finishes. This results in much nicer output, as prints by the following code come after the progress bars and not randomly somewhere within them.
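For readers who want to see the flag-based approach in isolation, here is a minimal sketch. It is not the code from this PR: the class body, the `received` attribute, and the `run_main_loop()` helper are assumptions made for illustration, and only the name SignalWatcher comes from the description above.

```python
import signal


class SignalWatcher:
    """Record that a signal was received instead of acting on it right away.

    Hypothetical sketch; the actual SignalWatcher in this PR may differ.
    """

    def __init__(self, signum=signal.SIGINT):
        self.signum = signum
        self.received = False
        self._previous_handler = None

    def _handler(self, signum, frame):
        # Only set a flag. Receiving the signal several times is harmless,
        # because no shutdown logic runs inside the handler itself.
        self.received = True

    def __enter__(self):
        # signal.signal() returns the handler that was installed before.
        self._previous_handler = signal.signal(self.signum, self._handler)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the previous handler (None means it was not set from Python).
        if self._previous_handler is not None:
            signal.signal(self.signum, self._previous_handler)
        return False


def run_main_loop(jobs, watcher):
    """Hypothetical main loop that checks the flag once per iteration."""
    completed = []
    for job in jobs:
        if watcher.received:
            # Shut down from the main loop, not from the signal handler.
            break
        completed.append(job)
    return completed
```

A caller would wrap the loop as `with SignalWatcher() as watcher: run_main_loop(jobs, watcher)`, so that Ctrl+C ends the loop at the next iteration and cleanup code runs exactly once afterwards.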
Below is the output of grid_search when run locally and Ctrl+C gets pressed.
before:
after:
How I Tested
Ran the examples locally. The test on Galvani is pending, as the login node is not really responsive at the moment...