suny-downstate-medical-center / netpyne

A Python package to facilitate the development, parallel simulation, optimization and analysis of multiscale biological neuronal networks in NEURON.
http://www.netpyne.org
MIT License

NetPyNE with Optuna batch - mpiexec not starting nrniv [Bug report] #797

Open samnemo opened 7 months ago

samnemo commented 7 months ago

Describe the bug

When I run an Optuna batch optimization with the A1 model, mpiexec fails to run the nrniv processes for the simulation. NetPyNE does not check the return codes of its subprocess Popen calls, so it waits indefinitely for output that is never produced. The nrniv processes appear to start but are put to sleep immediately.
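One way to avoid the indefinite wait described above is to bound the wait and inspect the exit status of the launched process. The following is only a minimal sketch using the standard `subprocess` module; `run_job` and its `timeout_s` parameter are hypothetical names for illustration and are not part of NetPyNE.

```python
import subprocess
import sys


def run_job(command, timeout_s=60.0):
    """Launch `command`, wait at most `timeout_s` seconds, and raise on failure.

    Hypothetical helper for illustration only; it is not NetPyNE code.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child never finished: kill it and reap it instead of hanging.
        proc.kill()
        proc.communicate()
        raise RuntimeError(f"job timed out after {timeout_s} s: {command}")
    if proc.returncode != 0:
        # Surface the failure instead of waiting for output files.
        raise RuntimeError(
            f"job exited with code {proc.returncode}: {err.strip()}"
        )
    return out


if __name__ == "__main__":
    # Stand-in command; a real batch would launch its job script here.
    print(run_job([sys.executable, "-c", "print('ok')"]))
```

With a check like this, a job that never starts shows up as an error or a timeout rather than as a batch that waits forever for results.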

This is using conda on Ubuntu with the following relevant packages:

$ python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import netpyne
>>> netpyne.__version__
'1.0.5'
>>> import neuron
>>> neuron.__version__
'8.2.3'
>>> import optuna
>>> optuna.__version__
'3.4.0'

Reproducing the bug

Steps to reproduce the behavior: Go to the A1 repo/branch here: https://github.com/NathanKlineInstitute/A1/tree/samn

Then run python batch.py

Expected behavior

I expected mpiexec to start the nrniv processes properly, but they fail to start. Running the mpiexec command directly runs the simulations properly; the failure only occurs when the command is launched through batch.py (NetPyNE batch with Optuna).

System information

See above

Additional context

Check with samn or James C for more details on reproducing the bug

samnemo commented 7 months ago

And this is the MPI version:

$ mpiexec --version
mpiexec (OpenRTE) 4.0.2

samnemo commented 7 months ago

In optuna_parallel.py, the nrniv jobs seemed to be going to sleep or getting suspended. Putting a quit() in the right place seemed to allow the later mpiexec-launched nrniv processes to start properly:

jobString = f"""#!/bin/bash
echo '{paramLabels}'
echo '{candidate}'
nrniv -python -c 'from neuron import h;soma = h.Section(name="soma");h.psection();quit()'
echo $?
mpiexec -n 48 nrniv -python -c 'from neuron import h;h.nrnmpi_init();pc=h.ParallelContext();print(pc.id())'
echo $?

{command}    
"""