mcdougallab / matlabneuroninterface

Interface for connecting NEURON and MATLAB
BSD 3-Clause "New" or "Revised" License

MPI simulation should work #37

Closed ramcdougal closed 9 months ago

ramcdougal commented 1 year ago

NEURON uses MPI for parallel simulation. This should still work in a MATLAB context.

ramcdougal commented 1 year ago

In Python, we'd run:

mpiexec -n 4 python my_mpi_model.py

ramcdougal commented 1 year ago

So could it be:

mpiexec -n 4 matlab my_mpi_model.m

Or can we have MATLAB make multiple connections?

ramcdougal commented 1 year ago

MATLAB can be launched with mpirun on a Mac, so this may just work:

(base) ramcdougal@Roberts-iMac 20230519 % mpirun -n 2 /Applications/MATLAB_R2020a.app/bin/matlab -nodesktop -nosplash -r test

                            < M A T L A B (R) >
                  Copyright 1984-2020 The MathWorks, Inc.
              R2020a Update 6 (9.8.0.1538580) 64-bit (maci64)
                             November 23, 2020

                            < M A T L A B (R) >
                  Copyright 1984-2020 The MathWorks, Inc.
              R2020a Update 6 (9.8.0.1538580) 64-bit (maci64)
                             November 23, 2020

To get started, type doc.
For product information, visit www.mathworks.com.

To get started, type doc.
For product information, visit www.mathworks.com.

Hello world
Hello world

Here, test.m is:

disp('Hello world')

tkuenzmw commented 1 year ago

Hi Robert. The NEURON code (which we call through the clib interface) contains the MPI code, right? I ask because, while you can start MATLAB with mpirun, MATLAB has no language constructs for MPI programming like that. Instead, MATLAB uses MPICH in the background through the Parallel Computing Toolbox, which backs our parallel programming constructs such as parallel pools and spmd. So would the NEURON MPI processes succeed with interprocess communication when they are spawned by the MATLAB process?

How is the MPI functionality mostly used in NEURON? Do you know of any examples? I more or less ignored it during my own research and did the parallel processing "outside" of NEURON. I am very interested in this aspect and would be willing to help and test as well.

ramcdougal commented 1 year ago

Yes, NEURON would do all the MPI stuff. There are two main use cases for parallel simulation: (1) speeding up an individual simulation, and (2) running families of simulations in parallel.

Case (2) is nice to have but not essential, as this is typically an embarrassingly parallel problem with no communication between the nodes.

The more interesting case is (1), which you can do by having different compute nodes be responsible for different cells. (You can also split an individual cell across nodes, but that's only really useful for cases with very bad load balancing.) A simple example is at: https://nrn.readthedocs.io/en/8.2.2/tutorials/ball-and-stick-4.html
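
The per-cell split in case (1) is often done round-robin: each rank takes every nhost-th cell, starting at its own rank. The sketch below is plain Python (the language of the linked tutorial) and makes no NEURON calls. The helper name gids_for_rank is hypothetical; in a real model, rank and nhost would come from pc.id() and pc.nhost().

```python
def gids_for_rank(rank: int, nhost: int, ncell: int) -> list[int]:
    """Return the cell ids (gids) owned by `rank` under round-robin assignment.

    Hypothetical helper for illustration; not part of NEURON or this interface.
    """
    if nhost < 1 or not 0 <= rank < nhost:
        raise ValueError("need nhost >= 1 and 0 <= rank < nhost")
    # Rank r owns gids r, r + nhost, r + 2*nhost, ... below ncell.
    return list(range(rank, ncell, nhost))


if __name__ == "__main__":
    # Show how 10 cells would be spread over 4 ranks.
    for rank in range(4):
        print(rank, gids_for_rank(rank, nhost=4, ncell=10))
```

Every gid lands on exactly one rank and the per-rank counts differ by at most one, which makes this a reasonable default when all cells cost about the same to simulate.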

tkuenzmw commented 1 year ago

It kind of works. To run the MATLAB version of the testmpi code from the documentation, the libnrnmpi_ompi.so library needs to be findable (I created a symbolic link in the directory where libnrniv.so also lives).

This is the testmpi.m:

function testmpi()

addpath(genpath("/home/thomas/Documents/MATLAB/matlabneuroninterface"));
n = neuron.Neuron();
n.nrnmpi_init();
pc = n.ParallelContext();
disp("I am " + num2str(pc.id()) + " of " + num2str(pc.nhost()));
n.quit();

This can run in MATLAB (testmpi) or via mpirun -n 4 matlab -batch testmpi.

However, the last line crashes the interface library with this warning:

Warning: MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library. 

> In neuron.Neuron.call_func_hoc (line 212)
In neuron/Neuron/dynamic_call (line 111)
In indexing (line 134)
In testmpi (line 9) 
Warning: 'quit': number or type of arguments incorrect. 

If I leave out n.quit(), it runs without errors, but I get an MPI warning about improper termination of processes (as expected). It appears that n.quit() is not working correctly, but this could be related to how I set everything up. Can someone confirm this?

EDIT: adding pc.done(); instead of n.quit(); does not cause any harm but does not stop the MPI warning either.

edovanveen commented 10 months ago

Running this on Windows gives:

>> n.nrnmpi_init();
>> pc = n.ParallelContext();
>> disp("I am " + num2str(pc.id()) + " of " + num2str(pc.nhost()));
I am 0 of 1

The line n.quit() makes MATLAB close.

ramcdougal commented 10 months ago

You'd want to run this as a script launched with mpiexec, not from an interactive session.

Quit is supposed to exit the active program.

edovanveen commented 10 months ago

MATLAB:

function example_mpi()
% Run from command prompt with: mpiexec -n 4 matlab -batch example_mpi

setup;
n = neuron.launch();
n.nrnmpi_init();
pc = n.ParallelContext();
disp("I am " + num2str(pc.id()) + " of " + num2str(pc.nhost()));
pc.done();
n.quit();

CMD:

>>mpiexec -n 4 matlab -batch example_mpi

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1
Could not load libnrnpython1
pyver10=1 pylib=NULL
numprocs=1

edovanveen commented 10 months ago

@ramcdougal Why is it trying to load Python stuff? It looks like it is searching for a DLL called libnrnpythonPYVER.dll with PYVER = 38, 39, 310, 311. In my case, however, PYVER is set to 1.

edovanveen commented 10 months ago

After adding a line to the initialize function:

        nrn_is_python_extension = 1;
        nrnpy_set_pr_etal(mlprint, NULL);
        nrn_is_python_extension = 0; // Added this line

Now I get the output:

>>mpiexec -n 4 matlab -batch example_mpi

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
numprocs=1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 1
numprocs=1
numprocs=1
numprocs=1

edovanveen commented 10 months ago

The official example doesn't work either; it looks like my Windows build does not support MPI.

C:\nrn>mpiexec -n 2 nrniv -mpi test0.hoc
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

numprocs=1
numprocs=1
I am 0 of 1
I am 0 of 1

(Sorry for the information overload!)
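
The symptom in these logs is that every process reports "I am 0 of 1": the processes are started, but each one sees a world of size 1 instead of joining a shared one. A small check like the following could flag that automatically in a test script. It is a sketch in plain Python; the function names are hypothetical, and the pattern assumes the exact "I am R of N" wording printed by the examples in this thread.

```python
import re


def rank_reports(output: str) -> list[tuple[int, int]]:
    """Extract (rank, nhost) pairs from lines such as 'I am 1 of 2'."""
    return [(int(r), int(n)) for r, n in re.findall(r"I am (\d+) of (\d+)", output)]


def mpi_is_connected(output: str, expected: int) -> bool:
    """True if the output shows ranks 0..expected-1, each reporting `expected` hosts."""
    reports = rank_reports(output)
    ranks = sorted(rank for rank, _ in reports)
    sizes = {nhost for _, nhost in reports}
    # A working run has one report per rank and a single, matching world size.
    return ranks == list(range(expected)) and sizes == {expected}
```

Fed the captured output of a run started with -n 2, this returns False for the "0 of 1" case and True once the ranks read "0 of 2" and "1 of 2", in any order.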

ramcdougal commented 10 months ago

That's very strange. That version was built by the CI scripts; I'd expect it to support MPI.

It shouldn't matter, but which MPI are you running? On Windows, we test with Microsoft MPI.

Other things that are strange: why is the NEURON banner printing when you launch via MATLAB with MPI? (If it also prints without MPI, there's a flag in the C examples that shows how to disable it. But if it only happens with MPI, that's strange.) I had thought we were always disabling Python.

AljenU commented 10 months ago

On Linux, I do get it to kind of work, but something does go wrong during quit. Slightly earlier, it tries to access a file called cleanup at a location that does not exist on my computer.

I edited my linux_matlab.sh so at the last line it calls mpiexec -n 2 ${MY_MATLAB} -batch example_mpi

EDIT: after a little digging, I found that on my machine cleanup is at /home/aljen.uitbeijerse/.conda/envs/neuron90a0/lib/python3.10/site-packages/neuron/.data/share/nrn/lib/, but how do I tell NEURON that?

./doc/example_startup_scripts/linux_matlab.sh

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

NEURON -- VERSION 9.0.dev-1361-g98cad3ae4 HEAD (98cad3ae4) 2023-06-12
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

NEURON -- VERSION 9.0.dev-1361-g98cad3ae4 HEAD (98cad3ae4) 2023-06-12
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

numprocs=2
I am 1 of 2
I am 0 of 2
sh: 1: /root/nrn/build/cmake_install/share/nrn/lib/cleanup: Permission denied
sh: 1: /root/nrn/build/cmake_install/share/nrn/lib/cleanup: Permission denied
Warning: MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
Warning: MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.
> In neuron.Session.call_func_hoc (line 234)
> In neuron.Session.call_func_hoc (line 234)
In neuron/Session/dynamic_call (line 117)
In neuron/Session/dynamic_call (line 117)
In indexing (line 156)
In example_mpi (line 10)
In indexing (line 156)
In example_mpi (line 10)
Warning: 'quit': number or type of arguments incorrect.
Warning: 'quit': number or type of arguments incorrect.
> In neuron.Session.call_func_hoc (line 235)
In neuron/Session/dynamic_call (line 117)
In indexing (line 156)
> In neuron.Session.call_func_hoc (line 235)
In example_mpi (line 10)
In neuron/Session/dynamic_call (line 117)
In indexing (line 156)
In example_mpi (line 10)
Warning: The following error was caught while executing 'neuron.Object' class
destructor:
Error using clib.neuron.hoc_obj_unref
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.

Error in neuron.Object/delete (line 75)
            clib.neuron.hoc_obj_unref(self.obj);

Error in example_mpi (line 4)
    setup;
> In example_mpi (line 4)
Warning: The following error was caught while executing 'neuron.Object' class
destructor:
Error using clib.neuron.hoc_obj_unref
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.

Error in neuron.Object/delete (line 75)
            clib.neuron.hoc_obj_unref(self.obj);

Error in example_mpi (line 4)
    setup;
> In example_mpi (line 4)
Error using clib.neuron.get_nrn_functions
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.

Error in neuron.Session/fill_dynamic_props (line 28)
            arr = split(clib.neuron.get_nrn_functions(), ";");

Error in indexing (line 160)
                self.fill_dynamic_props();

Error in example_mpi (line 10)
    n.quit();

Error using clib.neuron.get_nrn_functions
MATLABCLibHost process for 'neuron' terminated unexpectedly. To reload
interface library, first call "unload(clibConfiguration('neuron'))" and then
call function/class from interface library.

Error in neuron.Session/fill_dynamic_props (line 28)
            arr = split(clib.neuron.get_nrn_functions(), ";");

Error in indexing (line 160)
                self.fill_dynamic_props();

Error in example_mpi (line 10)
    n.quit();

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[48985,1],0]
  Exit code:    1
--------------------------------------------------------------------------

edovanveen commented 10 months ago

We should change this warning, because it is incorrect and very confusing:

'quit': number or type of arguments incorrect.

Instead, something like:

'quit': error during call to Neuron function

(or something like that)

edovanveen commented 10 months ago

> That's very strange. That version was built by the CI scripts; I'd expect it to support MPI.
>
> It shouldn't matter, but which MPI are you running? On Windows, we test with Microsoft MPI.
>
> Other things that are strange: why is the NEURON banner printing when you launch via MATLAB with MPI? (If it also prints without MPI, there's a flag in the C examples that shows how to disable it. But if it only happens with MPI, that's strange.) I had thought we were always disabling Python.

I am using Intel(R) MPI Library for Windows* OS, Version 2021.8 Build 20221129

ramcdougal commented 10 months ago

Isn't this also strange because we don't have any ability to verify the number of arguments that a NEURON function expects? So where would that message come from at all?

But maybe one option is to just have MATLAB do the quitting whenever n.quit() is invoked?

edovanveen commented 10 months ago

Great news! If I run this, it works out of the box:

c:\nrn\bin\mpiexec.exe -n 4 matlab -batch example_mpi

> But maybe one option is to just have MATLAB do the quitting whenever n.quit() is invoked?

I like that idea!

My output on Windows now:

>c:\nrn\bin\mpiexec.exe -n 4 matlab -batch example_mpi

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

        Event Support License -- for demonstration use and event support.  Not for government,
        research, commercial, or other organizational use.

NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 0 of 4
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 2 of 4
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 1 of 4
NEURON -- VERSION 9.0.dev-1329-g4b26ff135+ HEAD (4b26ff135+) 2023-03-30
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2022
See http://neuron.yale.edu/neuron/credits

I am 3 of 4
numprocs=4

AljenU commented 10 months ago

We are working on some final tweaks to close nicely, without a warning that MPI was terminated unexpectedly.

Also, on Linux, I could get rid of the crash caused by the cleanup file not being found by adding a line to linux_matlab.sh: export NEURONHOME="${NRNML_NRNPATH}share/nrn". However, I'm not sure that this is also the correct path when NEURON is installed directly rather than through conda.
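
Since the location of share/nrn differs between installs, one way to avoid hard-coding it would be to probe a few candidate directories for the lib/cleanup file named in the error message and use the first match as NEURONHOME. The sketch below is plain Python; find_neuronhome is a hypothetical helper, and the list of candidates is an assumption that each site would have to fill in.

```python
import os
from typing import Callable, Iterable, Optional


def find_neuronhome(
    candidates: Iterable[str],
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first candidate directory that contains lib/cleanup, else None.

    Hypothetical helper; `is_file` is injectable so the logic can be
    exercised without touching the real filesystem.
    """
    for home in candidates:
        # The shell error in this thread points at <NEURONHOME>/lib/cleanup.
        if is_file(os.path.join(home, "lib", "cleanup")):
            return home
    return None
```

The returned directory could then be exported as NEURONHOME before MATLAB is launched. Whether that value is right for a non-conda install remains the open question raised here.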

ramcdougal commented 10 months ago

Continuing the theme of my confusion: I don't know why there would be a version of mpiexec inside the nrn folder, but I'm glad it works.

edovanveen commented 10 months ago

To do @AljenU: try to run on Linux, non-interactively, in-process, with n.quit()