gtdang opened this issue 2 months ago
This shouldn't be an issue if `mpiexec` is called with oversubscription allowed. Any idea why it's still failing?
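For context, a minimal sketch of what an oversubscribed launch looks like, assuming Open MPI's `mpiexec` (the rank count and target script below are placeholders, not what the GUI actually builds):

```python
import subprocess

# Illustrative only: by default Open MPI's mpiexec refuses to launch more
# ranks than the slots allotted to the job; --oversubscribe relaxes that
# check. "48" and "child_sim.py" are placeholders.
cmd = ["mpiexec", "--oversubscribe", "-n", "48", "python", "child_sim.py"]
subprocess.run(cmd, check=True)
```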
If I recall correctly, the goal with the GUI was to expose as little of the parallel backend API as possible while still running the tutorials in a timely manner. Since almost all of the tutorials run single trials, `JoblibBackend` isn't very useful, and `MPIBackend` can easily be run under the hood by defaulting to the maximal number of parallel jobs.
We can still convert to `len(os.sched_getaffinity(0))`, but I personally don't think it's necessary to expose the `n_procs` argument in the GUI. Big picture, I think there's something to be said for the GUI running without too much technical bloat that will confuse new users.
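For concreteness, a minimal sketch of the difference under discussion, assuming `MPIBackend` takes an `n_procs` argument as in the hnn-core API (the default shown here is illustrative, not the current GUI behavior):

```python
import multiprocessing
import os

from hnn_core import MPIBackend

# cpu_count() reports every core on the node, including cores the job
# scheduler has not granted to this session; sched_getaffinity(0) (Linux
# only) reports only the cores this process is actually allowed to use.
total_cores = multiprocessing.cpu_count()
allotted_cores = len(os.sched_getaffinity(0))

# Illustrative default: size the MPI backend to the allotment, not the node.
backend = MPIBackend(n_procs=allotted_cores)
```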
@rythorpe Are you suggesting we also remove the `Cores:` option from the GUI when using the `JoblibBackend`? If the goal is to remove technical bloat, should we also remove the `MPI cmd:` option from the GUI? I'm not sure we expect users to actually change it from the default.

For what it's worth, I personally don't find it too technical to expose the number of cores. It can be nice to see what you have access to on your machine, as long as the max is set to the number of cores available on the instance to prevent user input error.
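As a sketch of that cap (the widget here is a hypothetical example, not the actual `gui.py` code, and assumes an ipywidgets-based GUI on Linux):

```python
import os
from ipywidgets import BoundedIntText

# Hypothetical widget: limit user input to the cores actually allotted to
# this instance, so an over-request can't be entered in the first place.
_allotted = len(os.sched_getaffinity(0))
cores_widget = BoundedIntText(
    value=_allotted,
    min=1,
    max=_allotted,
    description="Cores:",
)
```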
See the conversation on the PR for more details. I see the GUI primarily as an educational tool, so while I agree that adding the number of cores as a simulation parameter in the GUI isn't, in itself, a deal breaker, I think there's something to be said for a GUI simulation that "just runs" without having to sift through a myriad of parameters that don't directly relate to the scientific discourse of our workshops.
I've been testing out the GUI on Brown's HPC system (OSCAR). Running simulations with the MPI backend fails because it requests more processors than the instance allows. The node that my instance is running on has 48 cores, but my instance is not allotted access to all of the node's cores.
https://github.com/jonescompneurolab/hnn-core/blob/18830b53c4602c8e0f9b89502222b27799c5e3ae/hnn_core/gui/gui.py#L1916-L1918

The GUI initializes the backend at the lines above using `multiprocessing.cpu_count`, which returns the node's total core count rather than my instance's allotment. The joblib backend lets you specify the number of cores in the GUI. Is there a reason this option is not exposed for MPI?
[Screenshots: joblib options; MPI options]
This Stack Overflow answer also describes a way to get the number of available cores instead of the total.
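Something along those lines, sketched with a fallback since `os.sched_getaffinity` is Linux-only (the helper name is mine, not from hnn-core):

```python
import multiprocessing
import os


def available_cores():
    """Return the cores this process may use, not the node total (sketch)."""
    try:
        # Respects the CPU affinity set by cgroups/SLURM on Linux.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # macOS/Windows don't provide sched_getaffinity; fall back to the
        # node-wide count.
        return multiprocessing.cpu_count()
```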