Closed ramcdougal closed 9 months ago
In Python, we'd run:
mpiexec -n 4 python my_mpi_model.py
so could it be:
mpiexec -n 4 matlab my_mpi_model.m
or can we have MATLAB make multiple connections?
MATLAB can be launched with mpirun on a Mac, so this may just work:
(base) ramcdougal@Roberts-iMac 20230519 % mpirun -n 2 /Applications/MATLAB_R2020a.app/bin/matlab -nodesktop -nosplash -r test
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 6 (9.8.0.1538580) 64-bit (maci64)
November 23, 2020
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 6 (9.8.0.1538580) 64-bit (maci64)
November 23, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
To get started, type doc.
For product information, visit www.mathworks.com.
Hello world
Hello world
Here, test.m is:
disp('Hello world')
Hi Robert. The NEURON code (which we call through clib) contains the MPI code, right? I ask because, while you can start MATLAB with mpirun, MATLAB has no language constructs for MPI programming of that kind. Instead, MATLAB uses MPICH in the background, through the Parallel Computing Toolbox, to support its own parallel programming constructs such as parallel pools and spmd. So would the NEURON MPI processes succeed with interprocess communication when they are spawned by the MATLAB process? How is the MPI functionality mostly used in NEURON; do you know of examples? I mostly ignored it during my own research and did the parallel processing "outside" of NEURON. I am very interested in this aspect and would be willing to help/test as well.
Yes, NEURON would do all the MPI stuff. There are two main use cases for parallel simulation: (1) speeding up an individual simulation, and (2) running families of simulations in parallel.
Case (2) is nice to have but not essential, as this is typically an embarrassingly parallel problem with no communication between the nodes.
The more interesting case is (1), which you can do by having different compute nodes be responsible for different cells. (You can also split an individual cell across nodes, but that's only really useful for cases with very bad load balancing.) A simple example is at: https://nrn.readthedocs.io/en/8.2.2/tutorials/ball-and-stick-4.html
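To make case (1) concrete, here is a minimal sketch of the round-robin scheme for deciding which process is responsible for which cells. It is plain Python and does not call NEURON; the function name `gids_for_rank` is hypothetical, and `rank` and `nhost` stand in for the values that `pc.id()` and `pc.nhost()` would report.

```python
def gids_for_rank(rank, nhost, ncell):
    """Return the cell ids that process `rank` (out of `nhost`) would own.

    Cells are dealt out round-robin: rank 0 gets 0, nhost, 2*nhost, ...
    """
    return list(range(rank, ncell, nhost))


if __name__ == "__main__":
    # Illustration only: 10 cells spread over 4 processes.
    for rank in range(4):
        print("rank", rank, "of 4 owns cells", gids_for_rank(rank, 4, 10))
```

Every cell ends up on exactly one process, and the per-process counts differ by at most one, which is usually enough load balancing when the cells are similar.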
It kind of works. To run the MATLAB version of the testmpi code from the documentation, the libnrnmpi_ompi.so library needs to be findable (I created a symbolic link to it in the directory where libnrniv.so lives).
This is the testmpi.m:
function testmpi()
addpath(genpath("/home/thomas/Documents/MATLAB/matlabneuroninterface"));
n = neuron.Neuron();
n.nrnmpi_init();
pc = n.ParallelContext();
disp("I am " + num2str(pc.id()) + " of " + num2str(pc.nhost()));
n.quit();
This can be run in MATLAB (testmpi) or via mpirun -n 4 matlab -batch testmpi.
However, the last line crashes the interface library with this message:
Warning: MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
> In neuron.Neuron.call_func_hoc (line 212)
In neuron/Neuron/dynamic_call (line 111)
In indexing (line 134)
In testmpi (line 9)
Warning: 'quit': number or type of arguments incorrect.
If I leave the n.quit() out it runs without errors, but I get an MPI warning about improper termination of processes (as expected). It appears the h.quit() is not working correctly. But this could be related to how I set everything up. Maybe this can be confirmed?
EDIT: adding pc.done(); instead of n.quit(); does not cause any harm but does not stop the MPI warning either.
Running this on Windows gives:
>> n.nrnmpi_init();
>> pc = n.ParallelContext();
>> disp("I am " + num2str(pc.id()) + " of " + num2str(pc.nhost()));
I am 0 of 1
The line n.quit() makes MATLAB close.
You'd want to run this as a script launched with mpiexec not from an interactive session.
Quit is supposed to exit the active program.
MATLAB:
function example_mpi()
% Run from command prompt with: mpiexec -n 4 matlab -batch example_mpi
setup;
n = neuron.launch();
n.nrnmpi_init();
pc = n.ParallelContext();
disp("I am " + num2str(pc.id()) + " of " + num2str(pc.nhost()));
pc.done();
n.quit();
CMD:
>>mpiexec -n 4 matlab -batch example_mpi
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
@ramcdougal Why is it trying to load Python stuff? It looks like it's looking for a DLL called libnrnpythonPYVER.dll with PYVER = 38, 39, 310, 311. However, in my case PYVER is set to 1.
After adding a line to the initialize function:
nrn_is_python_extension = 1;
nrnpy_set_pr_etal(mlprint, NULL);
nrn_is_python_extension = 0; // Added this line
Now I get the output:
>>mpiexec -n 4 matlab -batch example_mpi
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
numprocs=1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 1
numprocs=1
numprocs=1
numprocs=1
The official example doesn't work either - it looks like my Windows build does not support MPI.
C:\nrn>mpiexec -n 2 nrniv -mpi test0.hoc
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
numprocs=1
numprocs=1
I am 0 of 1
I am 0 of 1
(Sorry for the information overload!)
That's very strange. That version was built by the CI scripts; I'd expect it to support MPI.
It shouldn't matter, but which MPI are you running? On Windows, we test with Microsoft MPI.
Other things that are strange: why is the NEURON banner printing when you launch via MATLAB with MPI? (If it does it without MPI, there's a flag in the C examples that shows how to disable it... But if it only does that with MPI, that's strange.) I had thought we were always disabling Python.
On linux, I do get it to kinda work, but indeed something goes wrong during quit. Slightly earlier, it tries to access a file called cleanup at a location that does not exist on my computer.
I edited my linux_matlab.sh so at the last line it calls mpiexec -n 2 ${MY_MATLAB} -batch example_mpi
EDIT: after a little digging, I found that for me the cleanup file is at /home/aljen.uitbeijerse/.conda/envs/neuron90a0/lib/python3.10/site-packages/neuron/.data/share/nrn/lib/, but how do I tell NEURON this?
./doc/example_startup_scripts/linux_matlab.sh
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
NEURON -- VERSION 9.0.dev-1361-g98cad3ae4 HEAD (98cad3ae4) 2023-06-12
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
NEURON -- VERSION 9.0.dev-1361-g98cad3ae4 HEAD (98cad3ae4) 2023-06-12
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
numprocs=2
I am 1 of 2
I am 0 of 2
sh: 1: /root/nrn/build/cmake_install/share/nrn/lib/cleanup: Permission denied
sh: 1: /root/nrn/build/cmake_install/share/nrn/lib/cleanup: Permission denied
Warning: MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
Warning: MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
> In neuron.Session.call_func_hoc (line 234)
> In neuron.Session.call_func_hoc (line 234)
In neuron/Session/dynamic_call (line 117)
In neuron/Session/dynamic_call (line 117)
In indexing (line 156)
In example_mpi (line 10)
In indexing (line 156)
In example_mpi (line 10)
Warning: 'quit': number or type of arguments incorrect.
Warning: 'quit': number or type of arguments incorrect.
> In neuron.Session.call_func_hoc (line 235)
In neuron/Session/dynamic_call (line 117)
In indexing (line 156)
> In neuron.Session.call_func_hoc (line 235)
In example_mpi (line 10)
In neuron/Session/dynamic_call (line 117)
In indexing (line 156)
In example_mpi (line 10)
Warning: The following error was caught while executing 'neuron.Object' class
destructor:
Error using clib.neuron.hoc_obj_unref
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
Error in neuron.Object/delete (line 75)
clib.neuron.hoc_obj_unref(self.obj);
Error in example_mpi (line 4)
setup;
> In example_mpi (line 4)
Warning: The following error was caught while executing 'neuron.Object' class
destructor:
Error using clib.neuron.hoc_obj_unref
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
Error in neuron.Object/delete (line 75)
clib.neuron.hoc_obj_unref(self.obj);
Error in example_mpi (line 4)
setup;
> In example_mpi (line 4)
Error using clib.neuron.get_nrn_functions
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
Error in neuron.Session/fill_dynamic_props (line 28)
arr = split(clib.neuron.get_nrn_functions(), ";");
Error in indexing (line 160)
self.fill_dynamic_props();
Error in example_mpi (line 10)
n.quit();
Error using clib.neuron.get_nrn_functions
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
Error in neuron.Session/fill_dynamic_props (line 28)
arr = split(clib.neuron.get_nrn_functions(), ";");
Error in indexing (line 160)
self.fill_dynamic_props();
Error in example_mpi (line 10)
n.quit();
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[48985,1],0]
Exit code: 1
--------------------------------------------------------------------------
We should change this warning, because it is incorrect and very confusing:
'quit': number or type of arguments incorrect.
Instead, something like:
'quit': error during call to Neuron function
(or something like that)
It shouldn't matter, but which MPI are you running? On Windows, we test with Microsoft MPI.
I am using Intel(R) MPI Library for Windows* OS, Version 2021.8 Build 20221129
Isn't this also strange because we don't have any ability to verify the number of arguments that a NEURON function expects? So where would that message come from at all?
But maybe one option is to just have MATLAB do the quitting whenever n.quit() is invoked?
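A rough sketch of that control flow, written in Python only for illustration (the interface itself is MATLAB). The `Session` class, its `call` method, and the injected `backend_call` and `exit_host` callables are all hypothetical stand-ins, not the interface's real API. The idea is that a call named quit is handled by the host instead of being forwarded to NEURON, and any other failure is reported with a message that does not blame the argument count.

```python
import sys


class Session:
    """Hypothetical stand-in for the interface's session object."""

    def __init__(self, backend_call, exit_host=sys.exit):
        # backend_call(name, *args) forwards a call to NEURON (assumed).
        # exit_host(code) ends the host process; injectable for testing.
        self._backend_call = backend_call
        self._exit_host = exit_host

    def call(self, name, *args):
        if name == "quit":
            # Let the host do the quitting; do not forward to NEURON.
            self._exit_host(0)
            return None
        try:
            return self._backend_call(name, *args)
        except Exception as err:
            # Report what is actually known: the call failed.
            message = "'%s': error during call to NEURON function" % name
            raise RuntimeError(message) from err
```

With the default `exit_host`, `Session(backend).call("quit")` ends the Python process through `sys.exit(0)`. Whether the host should first finalize MPI (for example through `pc.done()`) is left open here.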
Great news! If I run this, it works out of the box:
c:\nrn\bin\mpiexec.exe -n 4 matlab -batch example_mpi
But maybe one option is to just have MATLAB do the quitting whenever n.quit() is invoked?
I like that idea!
My output on windows now:
>c:\nrn\bin\mpiexec.exe -n 4 matlab -batch example_mpi
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
Event Support License -- for demonstration use and event support. Not for government,
research, commercial, or other organizational use.
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 0 of 4
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 2 of 4
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 1 of 4
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits
I am 3 of 4
numprocs=4
We are working on some final tweaks to close nicely, without a warning that MPI was terminated unexpectedly.
Also, on Linux, I could get rid of the crash caused by the cleanup file not being found by adding a line to linux_matlab.sh: export NEURONHOME="${NRNML_NRNPATH}share/nrn". However, I'm not sure that this is also the correct path when NEURON is installed directly rather than through conda.
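Since the right location may differ between installs, one option is to probe a few candidate directories and pick the first one that actually contains lib/cleanup. The sketch below is a hedged illustration of that idea; the helper names `find_neuron_home` and `export_neuron_home` are hypothetical, and the only layout assumed is the one seen in this thread (the cleanup file sitting in share/nrn/lib/).

```python
import os
from pathlib import Path


def find_neuron_home(candidates):
    """Return the first candidate directory containing lib/cleanup, else None."""
    for candidate in candidates:
        home = Path(candidate)
        if (home / "lib" / "cleanup").is_file():
            return home
    return None


def export_neuron_home(candidates, environ=os.environ):
    """Set NEURONHOME in `environ` if a usable directory is found."""
    home = find_neuron_home(candidates)
    if home is not None:
        environ["NEURONHOME"] = str(home)
    return home


if __name__ == "__main__":
    # NRNML_NRNPATH is the variable used in linux_matlab.sh; it may be unset.
    prefix = Path(os.environ.get("NRNML_NRNPATH", ""))
    print(export_neuron_home([prefix / "share" / "nrn"]))
```

Setting the variable this way only affects the current process and the processes it starts, so it would have to happen in the launcher before MATLAB and NEURON are started.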
Continuing the theme of my confusion: I don't know why there would be a version of mpiexec inside the nrn folder, but I'm glad it works.
To do @AljenU : try to run on linux, noninteractively, in-process, with n.quit()
NEURON uses MPI for parallel simulation. This should still work in a MATLAB context.