jonescompneurolab / hnn-core

Simulation and optimization of neural circuits for MEG/EEG source estimates
https://jonescompneurolab.github.io/hnn-core/
BSD 3-Clause "New" or "Revised" License

MPI on HPC #870

Open gtdang opened 2 months ago

gtdang commented 2 months ago

I've been testing out the GUI on Brown's HPC system (OSCAR). Running simulations with the MPI backend is not working because it requests more processors than my instance allows. The node that my instance is running on has 48 cores, but my instance is not allotted access to all of the node's cores.

https://github.com/jonescompneurolab/hnn-core/blob/18830b53c4602c8e0f9b89502222b27799c5e3ae/hnn_core/gui/gui.py#L1916-L1918 The GUI initializes the backend at the lines above using multiprocessing.cpu_count, which returns the node's total core count rather than my instance's allotment.

The joblib backend allows you to specify the number of cores in the GUI. Is there a reason why this option is not exposed for MPI?

joblib options

[screenshot of the joblib backend options, 2024-08-22]

MPI options

[screenshot of the MPI backend options, 2024-08-22]

This Stack Overflow answer also shows a way to get the number of available cores rather than the node's total.
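For illustration, a minimal sketch of the distinction being described (assuming a Linux host, since os.sched_getaffinity is Linux-only):

```python
import multiprocessing
import os

# Total logical cores on the node -- what the GUI currently queries.
total = multiprocessing.cpu_count()

# Cores this process is actually allowed to run on (Linux-only); this
# respects CPU-affinity limits set by HPC schedulers such as Slurm.
available = len(os.sched_getaffinity(0))

print(f"node total: {total}, available to this process: {available}")
```

On a shared HPC node, `available` can be much smaller than `total`, which is why requesting `total` MPI processes fails.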

rythorpe commented 2 months ago

This shouldn't be an issue if mpiexec is called with oversubscription allowed. Any idea why it's still failing?

If I recall correctly, the goal with the GUI was to expose as little of the parallel backend API as possible while still running the tutorials in a timely manner. Since almost all of the tutorials run single trials, JoblibBackend isn't very useful and MPIBackend can easily be run under the hood by defaulting to the maximal number of parallel jobs.

We can still convert to len(os.sched_getaffinity(0)), but I personally don't think it's necessary to expose the n_procs argument in the GUI. Big picture, I think there's something to be said for the GUI running without too much technical bloat that will confuse new users.

dylansdaniels commented 2 months ago

@rythorpe Are you suggesting we also remove the Cores: option from the GUI when using the JoblibBackend? If the goal is to remove the technical bloat, should we also remove the MPI cmd: option from the GUI? Not sure if we expect users to actually change this from the default.

For what it's worth, I personally don't find it too technical to expose the number of cores. It can be nice to see what you have access to on your machine, as long as the max is set to the number of cores available on the instance to prevent user input error.

rythorpe commented 2 months ago

> @rythorpe Are you suggesting we also remove the Cores: option from the GUI when using the JoblibBackend? If the goal is to remove the technical bloat, should we also remove the MPI cmd: option from the GUI? Not sure if we expect users to actually change this from the default
>
> For what it's worth, I personally don't find it too technical to expose the number of cores. It can be nice to see what you have access to on your machine, as long as the max is set to the # of cores available on the instance to prevent user input error

See the conversation on the PR for more details. I see the GUI primarily as an educational tool, so while I agree that adding the number of cores as a simulation parameter in the GUI isn't, in itself, a deal breaker, I think there's something to be said for a GUI simulation that "just runs" without the user having to sift through a myriad of parameters that don't directly relate to the scientific discourse of our workshops.