interTwin-eu / itwinai

Advanced AI workflows for digital twins applications in science.
https://itwinai.readthedocs.io
MIT License

Trainer execute() method calls strategy clean_up() before the program is finished #223

Closed jarlsondre closed 4 weeks ago

jarlsondre commented 1 month ago

Currently the standard TorchTrainer class calls self.strategy.clean_up() at the end of execute(). For certain use cases, such as profiling, this is problematic because the strategy methods can no longer be used afterwards. Additionally, even after clean_up() has been called, multiple processes are still running, so calling clean_up() really just gives up control of the strategy while the processes keep running.

A solution would probably involve either moving some of the strategy logic out of the TorchTrainer class, or changing clean_up() so that it kills the processes. Killing the processes could also be bad, though, since you might want to keep using them after the train() function has finished (such as in the profiling case).
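To make the first option concrete, here is a minimal sketch of what letting the caller own the clean-up could look like. Everything in it is a hypothetical stand-in: FakeStrategy, SketchTrainer, the cleanup_on_exit flag and global_rank() are illustrative names, not the actual itwinai API, and the sketch does not touch the question of processes that keep running after clean_up().

```python
class FakeStrategy:
    """Hypothetical stand-in for a distributed strategy (not the itwinai API)."""

    def __init__(self) -> None:
        self.is_initialized = False

    def init(self) -> None:
        self.is_initialized = True

    def global_rank(self) -> int:
        # Any strategy method fails once the strategy has been cleaned up.
        if not self.is_initialized:
            raise RuntimeError("strategy is not initialized")
        return 0

    def clean_up(self) -> None:
        self.is_initialized = False


class SketchTrainer:
    """Hypothetical trainer whose execute() can leave the strategy alive."""

    def __init__(self, strategy: FakeStrategy, cleanup_on_exit: bool = True) -> None:
        self.strategy = strategy
        # Assumed flag: when False, the caller is responsible for clean_up().
        self.cleanup_on_exit = cleanup_on_exit

    def train(self) -> None:
        pass  # The training loop would go here.

    def execute(self) -> None:
        self.strategy.init()
        try:
            self.train()
        finally:
            if self.cleanup_on_exit:
                self.strategy.clean_up()


# Behaviour described in this issue: the strategy is unusable after execute().
default_trainer = SketchTrainer(FakeStrategy())
default_trainer.execute()
try:
    default_trainer.strategy.global_rank()
    default_usable = True
except RuntimeError:
    default_usable = False

# Caller-owned clean-up: the strategy stays usable (e.g. for profiling)
# until the caller explicitly releases it.
trainer = SketchTrainer(FakeStrategy(), cleanup_on_exit=False)
trainer.execute()
rank_after_execute = trainer.strategy.global_rank()
trainer.strategy.clean_up()
```

With something like this, code that needs the strategy after execute() (such as a profiler) would opt out of the automatic clean-up and call clean_up() itself once it is done.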

jarlsondre commented 4 weeks ago

This is not very important anymore, since the profiling has been moved from the execute function to the train function.