Closed quaquel closed 7 months ago
I found the source of the segfault. Long story short: daemon threads and MPI don't mix. Now I need to find a proper fix to ensure that the log watcher thread terminates cleanly instead of being shut down by a timeout.
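For illustration, here is a minimal sketch of the general pattern: a non-daemon watcher thread that is stopped explicitly through a `threading.Event` and then joined, so it has finished before the interpreter (and, in the real case, MPI) shuts down. This is not the workbench's actual code; `watch_logs`, `log_queue`, `stop_event`, and `handled` are hypothetical names, and a plain `queue.Queue` stands in for whatever transport the real log watcher uses.

```python
import queue
import threading


def watch_logs(log_queue, stop_event, handled):
    """Drain log records until asked to stop, then flush what is left."""
    # Keep running while no stop was requested, and afterwards until
    # the queue has been emptied, so no record is silently dropped.
    while not stop_event.is_set() or not log_queue.empty():
        try:
            record = log_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        handled.append(record)  # stand-in for real log handling


log_queue = queue.Queue()
stop_event = threading.Event()
handled = []

# daemon=False: the thread is never killed abruptly at interpreter exit;
# it only ends when we signal it and it returns on its own.
watcher = threading.Thread(
    target=watch_logs,
    args=(log_queue, stop_event, handled),
    daemon=False,
)
watcher.start()

for i in range(3):
    log_queue.put(f"record {i}")

# Explicit shutdown: signal, then wait for the thread to finish.
stop_event.set()
watcher.join()
```

The design choice is that shutdown is driven by the owner of the thread rather than by a timeout or by interpreter teardown, which makes the order of cleanup deterministic.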
This commit log describes my experience developing the first iteration of the MPIEvaluator remarkably well.
I feel your pain.
Yes, testing and debugging MPI is tricky, even more so because of Open MPI on DelftBlue vs. MPICH on my Mac. This all seems to work now, although it occasionally hangs during startup on DelftBlue for unclear reasons.
When running the tutorial I now get this error:
I don't see a line change there, but could this PR have caused it in some other way?
And it seems the current source code is also correct:
Are you sure you have the correct version of the workbench installed? (This caused me various headaches before.)
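One way to check which installation Python actually picks up is to print the version and the file location of the imported package. The sketch below uses only the standard library; `describe_install` is a hypothetical helper, and the package name `ema_workbench` is an assumption about how the workbench is installed in your environment.

```python
import importlib.metadata
import importlib.util


def describe_install(package):
    """Return (version, location) for an importable package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None  # package is not importable in this environment
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        version = "unknown"
    return version, spec.origin


# Assumed package name; adjust if your install uses a different one.
print(describe_install("ema_workbench"))
```

If the reported location points at an old site-packages copy rather than your development checkout, the tutorial is running against stale code.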
Fixes a mistake in the MPI evaluator where the pool is created twice (probably the result of a code merging issue in the past).
It also cleans up the MPI example and the associated Slurm file.
NB: this code segfaults on DelftBlue during shutdown. I still need to investigate this. It seems related to the threading used for handling logging.