martius-lab / cluster_utils

https://martius-lab.github.io/cluster_utils/

Handle shutdown outside of signal handler & use context manager for progress bars #121

Closed luator closed 1 month ago

luator commented 2 months ago

Handling the shutdown on SIGINT (including sys.exit()) within the signal handler function is problematic, as the signal can be received multiple times before the application actually exits. This results in duplicated prints and potentially other issues. Instead, only set a flag in the signal handler, which is then checked in each iteration of the main loop. Add a simple SignalWatcher utility class for this. Unfortunately, this adds more redundant code in hp_optimization() and grid_search(), but I don't see an easy way to avoid that at the moment.
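For illustration, the flag-based approach could look roughly like the sketch below. This is not the code from this PR: the SignalWatcher shown here only shares the name, and its attributes, the context-manager interface and the main_loop() function (with its interrupt_at parameter, used to simulate Ctrl+C) are assumptions made for the example.

```python
import signal
import time


class SignalWatcher:
    """Hypothetical sketch: remember that a signal arrived, do nothing else."""

    def __init__(self, signum=signal.SIGINT):
        self.signum = signum
        self.received = False
        self._previous = None

    def _handler(self, signum, frame):
        # Keep the handler trivial: it may run several times before the
        # main loop gets a chance to react.
        self.received = True

    def __enter__(self):
        # signal.signal() returns the previously installed handler.
        self._previous = signal.signal(self.signum, self._handler)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the old handler (None means it was not installed from Python).
        if self._previous is not None:
            signal.signal(self.signum, self._previous)
        return False


def main_loop(max_iterations=1000, interrupt_at=None):
    """Return the iteration at which shutdown started, or None if never."""
    with SignalWatcher() as watcher:
        for i in range(max_iterations):
            if watcher.received:
                # Shutdown is handled exactly once, here in the main loop.
                return i
            if i == interrupt_at:
                # Simulate the user pressing Ctrl+C twice in a row.
                signal.raise_signal(signal.SIGINT)
                signal.raise_signal(signal.SIGINT)
            time.sleep(0.001)  # placeholder for the real per-iteration work
    return None
```

Because the handler only sets a flag, receiving the signal a second time has no additional effect. The actual cleanup and exit happen in one place, outside the handler.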

Another change, indirectly related because it was necessary to avoid messy output: use a context manager for the progress bars to make sure they are properly closed when the loop finishes. This results in much nicer output, as prints from subsequent code now come after the progress bars instead of appearing randomly somewhere within them.
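A minimal sketch of why the context manager helps, using a tiny stand-in class instead of a real progress bar library so that it runs without dependencies. ProgressBar, run() and the stop_after parameter are hypothetical names for this example only; they are not taken from cluster_utils.

```python
import io
import sys


class ProgressBar:
    """Hypothetical stand-in for a real progress bar object."""

    def __init__(self, desc, total, file=sys.stderr):
        self.desc = desc
        self.total = total
        self.count = 0
        self.file = file
        self.closed = False

    def update(self, n=1):
        self.count += n
        # Redraw the bar in place on the current line.
        self.file.write(f"\r{self.desc}: {self.count}/{self.total}")
        self.file.flush()

    def close(self):
        if not self.closed:
            # End the bar's line so later prints start on a fresh line.
            self.file.write("\n")
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def run(total, stop_after=None, out=sys.stderr):
    with ProgressBar("Completed jobs", total, file=out) as bar:
        for i in range(total):
            if stop_after is not None and i >= stop_after:
                break  # early exit, e.g. because a shutdown was requested
            bar.update()
    # The bar is closed at this point, even after an early exit or exception.
    out.write("Shutting down.\n")
    return bar


if __name__ == "__main__":
    buf = io.StringIO()
    run(5, stop_after=2, out=buf)
    print(repr(buf.getvalue()))
```

Whichever way the loop is left, the bar's line is terminated before any following message is written, so that message cannot end up in the middle of the bar.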

Below is the output of grid_search when run locally and Ctrl+C gets pressed.

before: Screenshot_grid_search_before

after: Screenshot_grid_search_after

How I Tested

Ran examples locally. Test on Galvani is pending as the login node is not really responsive at the moment...

luator commented 2 months ago

I now tested on the MPI cluster. Output-wise it looks good (i.e. same as local). Jobs are still listed in condor_q for a while after cluster_utils has terminated, but since I'm also unable to kill them with condor_rm (it says they are not found), this might just be a glitch of condor.

luator commented 2 months ago

Now also successfully tested on Galvani.