quaquel / EMAworkbench

workbench for performing exploratory modeling and analysis
BSD 3-Clause "New" or "Revised" License

bug fix in MPI evaluator #349

Closed quaquel closed 7 months ago

quaquel commented 8 months ago

Fixes a mistake in the MPI evaluator where the pool is created twice (probably a code merging issue in the past).

It also cleans up the MPI example and the associated Slurm file.

NB: this code segfaults on DelftBlue during shutdown. I still need to investigate this. It seems related to the threading used for handling logging.

coveralls commented 8 months ago

coverage: 80.212% (-0.09%) from 80.3% when pulling 6b48e3e34cd5060832b062f9ce29d255121682b8 on mpi_fixes into 10429e464ebb71611ec1b0486a4cf5926057e6fc on master.

quaquel commented 8 months ago

I found the source of the segfault. Long story short: daemon threads and MPI don't mix. Now I need to find a proper fix to ensure that the log watcher thread terminates correctly instead of being shut down by a timeout.
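For readers following along, one common way to get a watcher thread to terminate cleanly is to make it non-daemonic and stop it with a sentinel on its queue, then join it before shutdown. The sketch below is only an illustration of that general pattern, not the fix that went into the workbench: the names `watch_logs`, `run_with_watcher`, and `_STOP` are hypothetical, and the MPI part is left out entirely so the snippet runs on its own.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that tells the watcher to exit


def watch_logs(log_queue, handled):
    """Drain items from log_queue until the stop sentinel arrives."""
    while True:
        record = log_queue.get()  # blocks until an item is available
        if record is _STOP:
            break
        handled.append(record)  # stand-in for actually handling a log record


def run_with_watcher(messages):
    """Run a non-daemon watcher thread and shut it down explicitly."""
    log_queue = queue.Queue()
    handled = []
    # daemon=False: the thread is never killed abruptly at interpreter exit,
    # so we are responsible for stopping it ourselves.
    watcher = threading.Thread(
        target=watch_logs, args=(log_queue, handled), daemon=False
    )
    watcher.start()
    try:
        for msg in messages:
            log_queue.put(msg)
    finally:
        log_queue.put(_STOP)  # ask the watcher to stop
        watcher.join()        # wait for it, with no timeout involved
    return handled, watcher.is_alive()
```

In an MPI setting the `join()` would presumably need to happen before the MPI environment is finalized, but that ordering is an assumption here and depends on how the evaluator manages its pool.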

EwoutH commented 8 months ago

This commit log describes my experience developing the first iteration of the MPIEvaluator remarkably well.

I feel your pain.

quaquel commented 7 months ago

Yes, testing and debugging MPI is tricky. Even more so because of Open MPI on DelftBlue vs. MPICH on my Mac. This all now seems to work, although occasionally it hangs during startup on DelftBlue for unclear reasons.

EwoutH commented 6 months ago

When running the tutorial I now get this error:

[image: screenshot of the error]

I don't see a line change there, but could this PR have caused it in some other way?

And it seems the current source code is also correct:

https://github.com/quaquel/EMAworkbench/blob/4b70a149dea4eff588a133a4337bdf4585acede3/ema_workbench/util/ema_logging.py#L182

quaquel commented 6 months ago

Are you sure you have the correct version of the workbench installed? (This caused me various headaches before.)
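A quick way to check which version is actually being picked up is to ask Python directly. The snippet below is a small sketch using only the standard library; `describe_install` is a hypothetical helper, and it assumes the distribution and the importable module are both named `ema_workbench`.

```python
from importlib import metadata
from importlib.util import find_spec


def describe_install(dist_name, module_name=None):
    """Return (version, location) for an installed distribution, or None."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None  # nothing by that name is installed in this environment
    # Locate the module file without importing the package itself.
    spec = find_spec(module_name or dist_name)
    location = spec.origin if spec is not None else None
    return version, location


if __name__ == "__main__":
    # Assumed name; prints None if the workbench is not installed here.
    print(describe_install("ema_workbench"))
```

If the reported location points at a source checkout rather than `site-packages`, an editable install is shadowing the released package, which is one way to end up running different code than expected.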