suny-downstate-medical-center / netpyne

A Python package to facilitate the development, parallel simulation, optimization and analysis of multiscale biological neuronal networks in NEURON.
http://www.netpyne.org
MIT License

[bug] Not all netpyne submodules respect the `netpyne.__gui__` variable, making netpyne currently unusable on hpcs #786

Closed by sanjayankur31 8 months ago

sanjayankur31 commented 8 months ago

I'm trying to run the NeuroML conversion of the Human L23 model on NSG using NetPyNE. The NetPyNE script is generated correctly, and the mod files are generated and compiled without problems. However, when the Python script runs on NSG, it fails: netpyne imports matplotlib, which cannot acquire its lock file, probably because all N MPI processes are trying to acquire it at once.

Would you folks have experience with running sims on NSG using NetPyNE? Is it necessary for NetPyNE to import matplotlib even if no analysis/plotting is to be done?

Here's a zip of the python script and mod files: LEMS_HL23_0.01_Sim_NSG.zip

The complete error output from the nsg job is here: stderr.txt

salvadord commented 8 months ago

NetPyNE models have worked on NSG in the past. We've also had matplotlib issues when running on HPCs in general. Something you could try is running with the '-nogui' flag, e.g. python init.py -nogui

sanjayankur31 commented 8 months ago

Thanks @salvadord. I added that to the NSG config, and it seems to be passed correctly, but netpyne isn't picking it up (it is still trying to import matplotlib):

$ rg nogui
_COMMANDLINE
1:inputfile python3 -nogui

nsgdebug
1:argv (['/expanse/projects/nsg/home/nsguser/ngbwr.expanse/contrib/scripts/submit.simple.py', '--account', 'csd403', '--url', '-k -K ~/.jobcurl.rc https://nsgr.sdsc.edu:8443/cipresrest/v1/admin/updateJob?taskId=26967\\&jh=NGBW-JOB-OSBv2_EXPANSE_0_7_3-A96537A9F14447F2A24135C584316D25', '--', 'inputfile python3 -nogui '])

batch_command.cmdline
19:mpirun --mca orte_base_help_aggregate 0 -np 128 singularity exec --bind '/expanse/projects/nsg/external_users/public:/mnt/publicglobus:ro,/expanse/projects/nsg/home/nsguser/ngbwr.expanse/workspace/PERSISTENT/ankursinha_nsg_persistent,/expanse/projects/nemar/openneuro:/expanse/projects/nemar/openneuro:ro,/expanse/projects/nsg/home/nsguser/kennethtest/singularity/openmpi.slurm.osbv2/usr.local.openmpi.slurm:/usr/local:ro,/expanse/projects/nsg/home/nsguser/ngbwr.expanse/workspace/NGBW-JOB-OSBv2_EXPANSE_0_7_3-A96537A9F14447F2A24135C584316D25' -H '/expanse/projects/nsg/home/nsguser/ngbwr.expanse/workspace/NGBW-JOB-OSBv2_EXPANSE_0_7_3-A96537A9F14447F2A24135C584316D25/LEMS_HL23_0.005_Sim_NSG' /expanse/projects/nsg/home/nsguser/kennethtest/singularity/openmpi.slurm.osbv2/openmpi.slurm.sif /usr/local/python/runit python3 'LEMS_HL23_0.005_Sim_netpyne.py' -nogui 

Not sure what the issue is at the moment. Maybe the way the commands are being passed means that -nogui is no longer a separate entry in sys.argv, so sys.argv.count('-nogui') > 0 doesn't apply? I'll run a test script to see what sys.argv is.

(I also see a space after -nogui but I wouldn't expect that to affect sys.argv.)
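
A quick way to check the hypothesis above is to look at how a POSIX shell would split the command line. The sketch below uses `shlex.split` as a stand-in for the shell (an assumption; the actual NSG launcher may quote arguments differently) and applies the `sys.argv.count('-nogui') > 0` test mentioned above. The helper name `has_nogui` is hypothetical.

```python
import shlex


def has_nogui(argv):
    """Same membership test as quoted above: argv.count('-nogui') > 0."""
    return argv.count("-nogui") > 0


# Trailing space, as seen in batch_command.cmdline: shell-style splitting drops it.
separate = shlex.split("python3 LEMS_HL23_0.005_Sim_netpyne.py -nogui ")

# If a launcher passed the script name and the flag as one quoted string,
# -nogui would not be a separate entry and the check would fail.
fused = shlex.split("python3 'LEMS_HL23_0.005_Sim_netpyne.py -nogui '")

print(separate, has_nogui(separate))
print(fused, has_nogui(fused))
```

Inside the real job, printing `sys.argv` at the top of the script, before importing netpyne, would show what Python actually received.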

sanjayankur31 commented 8 months ago

This is the relevant bit of the stack trace:

  File "/expanse/projects/nsg/home/nsguser/ngbwr.expanse/workspace/NGBW-JOB-OSBv2_EXPANSE_0_7_3-DF9BE46E6CE547C596E789919A090041/LEMS_HL23_0.005_Sim_NSG/LEMS_HL23_0.005_Sim_netpyne.py", line 86, in <module>
    from netpyne import specs  # import netpyne specs module
  File "/usr/local/python/venv/lib/python3.9/site-packages/netpyne/__init__.py", line 30, in <module>
    from netpyne import analysis
  File "/usr/local/python/venv/lib/python3.9/site-packages/netpyne/analysis/__init__.py", line 32, in <module>
    from ..plotting import plotShape
  File "/usr/local/python/venv/lib/python3.9/site-packages/netpyne/plotting/__init__.py", line 18, in <module>
    from .plotter import MetaFigure, GeneralPlotter, ScatterPlotter, LinePlotter, HistPlotter, ImagePlotter
  File "/usr/local/python/venv/lib/python3.9/site-packages/netpyne/plotting/plotter.py", line 7, in <module>
    import matplotlib.pyplot as plt

The imports in this chain don't take the value of netpyne.__gui__ into account, and that's probably what is causing the issue:

https://github.com/suny-downstate-medical-center/netpyne/blob/f73d541267b59dfeb56ad6bc4bc3012f77b6b566/netpyne/plotting/plotter.py#L6
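
One way such a guard could look is sketched below. This is not NetPyNE's actual code: `gui_enabled` and `import_pyplot` are hypothetical helpers, and the only thing taken from the thread is the `-nogui` check on `sys.argv`. The idea is to skip `import matplotlib.pyplot` entirely when the GUI is disabled, so that no MPI process touches matplotlib's lock file.

```python
import importlib
import sys


def gui_enabled(argv=None):
    """Return False when '-nogui' appears as its own command-line argument."""
    argv = sys.argv if argv is None else argv
    return argv.count("-nogui") == 0


def import_pyplot(gui):
    """Import matplotlib.pyplot only when the GUI is enabled; otherwise return None."""
    if not gui:
        return None
    return importlib.import_module("matplotlib.pyplot")


# Hypothetical module-level usage in a plotting module:
plt = import_pyplot(gui_enabled(["script.py", "-nogui"]))
print(plt)  # None, because -nogui was passed
```

Code that later calls plotting functions would then need to handle `plt is None`, for example by raising a clear error only when a plot is actually requested.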