suny-downstate-medical-center / netpyne

A Python package to facilitate the development, parallel simulation, optimization and analysis of multiscale biological neuronal networks in NEURON.
http://www.netpyne.org
MIT License

Problem running NeuroML imported models in parallel #235

Closed pgleeson closed 7 years ago

pgleeson commented 7 years ago

If I try to run the model here: https://github.com/OpenSourceBrain/NetPyNEShowcase/blob/master/NetPyNE/test/LEMS_SimpleNet_netpyne.py

in serial or with -np 1:

 mpiexec -np  1   nrniv -mpi LEMS_SimpleNet_netpyne.py

it works fine. But with

 mpiexec -np  2   nrniv -mpi LEMS_SimpleNet_netpyne.py

it throws an error (using my branch):

...
Finished import of NeuroML2; populations vs gids NML has calculated: OrderedDict([('RS_pop', [0, 1, 2])])
Finished import of NeuroML2; populations vs gids NML has calculated: OrderedDict([('RS_pop', [0, 1, 2])])

Creating network of 1 cell populations on 2 hosts...
  Number of cells on node 0: 2 
  Done; cell creation time = 0.00 s.
  Number of connections on node 0: 0 
Adding stims...
  Number of cells on node 1: 1 
  Number of connections on node 1: 0 
  Number of stims on node 0: 6 
  Number of stims on node 1: 3 
  Done; cell stims creation time = 0.00 s.
Recording 2 traces of 2 types on node 1
Recording 4 traces of 3 types on node 0

Running simulation for 500.0 ms...
  Done; run time = 0.02 s; real-time ratio: 31.82.

Gathering data...
<<<<<<< Data on host 1
<<<<<<< Data on host 0
{'netCells': [{stims: [{'NeuroML2_stochastic_input_rand': '---Removed_NeuroML_obj---'}, {'NeuroML2_stochastic_input_rand': '---Removed_NeuroML_obj---'}, {'loc': 0.5, 'label': 'Stim0_1_RS_pop_1_soma_0_5', 'source': 'Stim0_1_RS_pop_1_soma_0_5', 'sec': 'soma', 'stim_count': 1, 'originalFormat': 'NeuroML2_stochastic_input', 'hpoissonFiringSyn': None, 'type': 'poissonFiringSyn'}], tags: {'cellLabel': 0, 'xnorm': 0.966453535692, 'pop': 'RS_pop', 'label': ['RS'], 'cellType': 'RS', 'znorm': 0.007491470058589999, 'propList': [], 'y': 44.0732599175, 'x': 96.6453535692, 'cellModel': 'RS', 'z': 0.749147005859, 'ynorm': 0.440732599175}, secLists: {}, secs: {'soma': {'hSec': None, 'geom': {'diam': 10, 'L': 10, 'cm': 31.830988618379067}, 'pointps': {'RS': {'hPointp': None, 'mod': 'RS'}}}}, gid: 1, conns: []}],
 'netPopsCellGids': {'RS_pop': [1]},
 'simData': {spkt: <hoc.HocObject object at 0x7ff5c2fe1108>, stims: {cell_1: {}}, Volts_file__RS_pop_RS_pop_2_soma_v: {}, spkid: <hoc.HocObject object at 0x7ff5c2fe1b28>, Volts_file__RS_pop_RS_pop_0_soma_v: {cell_1: <hoc.HocObject object at 0x7ff5c2fe1810>}, t: <hoc.HocObject object at 0x7ff5c2fe1930>, Volts_file__RS_pop_RS_pop_1_soma_v: {}}}
>>>>>>>> End data on host 1
{'netCells': [{stims: [{'NeuroML2_stochastic_input_rand': '---Removed_NeuroML_obj---'}, {'NeuroML2_stochastic_input_rand': '---Removed_NeuroML_obj---'}, {'loc': 0.5, 'label': 'Stim0_0_RS_pop_0_soma_0_5', 'source': 'Stim0_0_RS_pop_0_soma_0_5', 'sec': 'soma', 'stim_count': 0, 'originalFormat': 'NeuroML2_stochastic_input', 'hpoissonFiringSyn': None, 'type': 'poissonFiringSyn'}], tags: {'cellLabel': 0, 'xnorm': 0.966453535692, 'pop': 'RS_pop', 'label': ['RS'], 'cellType': 'RS', 'znorm': 0.007491470058589999, 'propList': [], 'y': 44.0732599175, 'x': 96.6453535692, 'cellModel': 'RS', 'z': 0.749147005859, 'ynorm': 0.440732599175}, secLists: {}, secs: {'soma': {'hSec': None, 'geom': {'diam': 10, 'L': 10, 'cm': 31.830988618379067}, 'pointps': {'RS': {'hPointp': None, 'mod': 'RS'}}}}, gid: 0, conns: []},
              {stims: [{'NeuroML2_stochastic_input_rand': '---Removed_NeuroML_obj---'}, {'NeuroML2_stochastic_input_rand': '---Removed_NeuroML_obj---'}, {'loc': 0.5, 'label': 'Stim0_2_RS_pop_2_soma_0_5', 'source': 'Stim0_2_RS_pop_2_soma_0_5', 'sec': 'soma', 'stim_count': 2, 'originalFormat': 'NeuroML2_stochastic_input', 'hpoissonFiringSyn': None, 'type': 'poissonFiringSyn'}], tags: {'cellLabel': 1, 'xnorm': 0.910975962449, 'pop': 'RS_pop', 'label': ['RS'], 'cellType': 'RS', 'znorm': 0.582227573059, 'propList': [], 'y': 93.9268997364, 'x': 91.0975962449, 'cellModel': 'RS', 'z': 58.2227573059, 'ynorm': 0.939268997364}, secLists: {}, secs: {'soma': {'hSec': None, 'geom': {'diam': 10, 'L': 10, 'cm': 31.830988618379067}, 'pointps': {'RS': {'hPointp': None, 'mod': 'RS'}}}}, gid: 2, conns: []}],
 'netPopsCellGids': {'RS_pop': [0, 2]},
 'simData': {spkt: <hoc.HocObject object at 0x7fa24490ca98>, stims: {cell_2: {}, cell_0: {}}, Volts_file__RS_pop_RS_pop_2_soma_v: {}, spkid: <hoc.HocObject object at 0x7fa24490c7c8>, Volts_file__RS_pop_RS_pop_0_soma_v: {cell_0: <hoc.HocObject object at 0x7fa24490cc90>}, t: <hoc.HocObject object at 0x7fa24490c348>, Volts_file__RS_pop_RS_pop_1_soma_v: {cell_2: <hoc.HocObject object at 0x7fa24490c198>}}}
>>>>>>>> End data on host 0
  Done; gather time = 0.04 s.

Analyzing...
  Cells: 3
  Connections: 0 (0.00 per cell)
  Spikes: 7 (4.67 Hz)
  Simulated time: 0.5 s; 2 workers
  Run time: 0.02 s
Finished simulation
  Done; saving time = 0.00 s.
  Done; plotting time = 0.00 s

Total time = 0.06 s
Finished simulation
Saving to file: Sim_SimpleNet.RS_pop.v.dat (ref: Volts_file__RS_pop)
Traceback (most recent call last):
  File "LEMS_SimpleNet_netpyne.py", line 121, in <module>
    col_Volts_file__RS_pop_v_RS_pop_1_RS_v = sim.allSimData['Volts_file__RS_pop_RS_pop_1_soma_v']['cell_%s'%gids['RS_pop'][1]]
KeyError: 'cell_1'

It looks like the cells aren't being properly recorded: cell_1 (in pop1) is going into Volts_file__RS_pop_RS_pop_0_soma_v for pop0, even though the following conditions are specified:

simConfig.recordTraces['Volts_file__RS_pop_RS_pop_0_soma_v'] = {'sec':'soma','loc':0.5,'var':'v','conds':{'pop':'RS_pop','cellLabel':0}}
simConfig.recordTraces['Volts_file__RS_pop_RS_pop_1_soma_v'] = {'sec':'soma','loc':0.5,'var':'v','conds':{'pop':'RS_pop','cellLabel':1}}
simConfig.recordTraces['Volts_file__RS_pop_RS_pop_2_soma_v'] = {'sec':'soma','loc':0.5,'var':'v','conds':{'pop':'RS_pop','cellLabel':2}}
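
The link between the log and the KeyError can be sketched in plain Python. This is not NetPyNE's implementation: `matches_conds` is a hypothetical helper, and the tags are just the `pop` and `cellLabel` values copied from the gathered data above, where gid 1 on host 1 carries `cellLabel: 0`.

```python
# Tags as they appear in the gathered data above (only the fields used in conds).
cell_tags = {
    0: {'pop': 'RS_pop', 'cellLabel': 0},  # host 0
    1: {'pop': 'RS_pop', 'cellLabel': 0},  # host 1: tagged with label 0
    2: {'pop': 'RS_pop', 'cellLabel': 1},  # host 0
}

def matches_conds(tags, conds):
    """Hypothetical helper: True if every condition equals the cell's tag."""
    return all(tags.get(key) == value for key, value in conds.items())

# The 'conds' part of the three recordTraces entries.
record_conds = {
    'Volts_file__RS_pop_RS_pop_0_soma_v': {'pop': 'RS_pop', 'cellLabel': 0},
    'Volts_file__RS_pop_RS_pop_1_soma_v': {'pop': 'RS_pop', 'cellLabel': 1},
    'Volts_file__RS_pop_RS_pop_2_soma_v': {'pop': 'RS_pop', 'cellLabel': 2},
}

# Which cells each trace would pick up under this simple matching rule.
recorded = {
    trace: ['cell_%d' % gid
            for gid, tags in sorted(cell_tags.items())
            if matches_conds(tags, conds)]
    for trace, conds in record_conds.items()
}

for trace, cells in recorded.items():
    print(trace, cells)
```

Under these assumptions the `..._0_soma_v` trace picks up both cell_0 and cell_1, the `..._1_soma_v` trace picks up cell_2, and the `..._2_soma_v` trace stays empty, which is the pattern in the simData dump and the reason the lookup of 'cell_1' fails.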

salvadord commented 7 years ago

@pgleeson - OK, I fixed a small bug in how the cells of a cellList are distributed, and it works now. Let me know, and thanks for finding the bug.

pgleeson commented 7 years ago

@salvadord, great, thanks. That works for me locally too.

Now another minor problem. Simulations in NML2 using just chemical synapses produced identical behaviour in serial and parallel mode when I first updated for the Random123 usage; now there are small differences.

Code is here https://github.com/OpenSourceBrain/NetPyNEShowcase/blob/master/NetPyNE/test/LEMS_SpikingNet_netpyne.py.

This is the first trace from the Sim_SpikingNet.pop_post.v.dat file that's produced (cell 0 in the postsynaptic population), run with 1, 2 and 4 processors:

[image: selection_296]

close up:

[image: selection_294]

Presynaptic cells seem to be identical
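
Differences like these can also be quantified rather than judged by eye. The sketch below assumes the .dat files are whitespace-separated numeric columns; `parse_columns` and `max_abs_diff` are illustrative helpers, and the inline sample values are made up, not taken from the actual runs.

```python
def parse_columns(text):
    """Parse whitespace-separated numeric rows into a list of float rows."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]

def max_abs_diff(rows_a, rows_b):
    """Largest absolute difference between two tables with the same number of rows."""
    if len(rows_a) != len(rows_b):
        raise ValueError('traces have different lengths')
    return max(abs(a - b)
               for row_a, row_b in zip(rows_a, rows_b)
               for a, b in zip(row_a, row_b))

# Made-up sample data standing in for the files from an np=1 and an np=2 run.
serial = parse_columns("0.0 -65.0\n0.025 -64.9\n0.05 -64.7\n")
parallel = parse_columns("0.0 -65.0\n0.025 -64.9\n0.05 -64.6\n")

print(max_abs_diff(serial, serial))    # identical runs give 0.0
print(max_abs_diff(serial, parallel))  # non-zero when the traces diverge
```

With real output, the text would come from something like `open('Sim_SpikingNet.pop_post.v.dat').read()` for each run.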

salvadord commented 7 years ago

@pgleeson - ok, took a while but figured out what the issue is, although I'm not sure exactly how to solve it:

With the above changes the output was identical when using different numbers of MPI nodes. However, I didn't commit since I wasn't sure exactly how you wanted to handle the 'NeuroML2_stochastic_input' in neuromlFuncs.py

Let me know if this makes sense and if you need me to make any changes. thx

pgleeson commented 7 years ago

Thanks for looking into that @salvadord. I've made some updates based on your suggestions in https://github.com/Neurosim-lab/netpyne/commit/c9d00f954c22c150f962ff4616f6f57d2a585631, but I think this wasn't really the source of the error, I'm still seeing the small differences myself...

It shouldn't have fixed the problem anyway: the Randoms were already being set up and initialised correctly, since the presynaptic population (receiving the spiking inputs) was identical on 1, 2 and 4 hosts. The slight difference was only in the postsynaptic pop (which just receives syn input from the pre pop).

I'll look into it more and see if I can narrow down where the issue is...

salvadord commented 7 years ago

Ah sorry thought we were still looking at SimpleNet, will check SpikingNet.

However, I think the above still does apply to SimpleNet: I pulled your latest changes from neuroml_export and the h.Random for the syn were still not being created, so I got different output for 1 vs 2 cores -- does it work ok for you?

salvadord commented 7 years ago

Just checked LEMS_SpikingNet and the presyn pops in 1 vs 2 hosts are not identical for me

salvadord commented 7 years ago

@pgleeson, FYI I was able to reproduce the SpikingNet issue with just 2 stims, 2 presyn cells and 1 postsyn cell; I'm implementing the net directly in netpyne (without neuroml export) to check what is causing the discrepancy.

pgleeson commented 7 years ago

@salvadord thanks! I'll try to get more time to look into this more from my side too this week.

salvadord commented 7 years ago

@pgleeson - I've been able to reproduce the issue with a minimal example in netpyne, without any neuroml, and using standard hh cells and exp2syn. But after many hours still can't figure out what is causing it. My next step is implementing the same thing directly in Neuron.

pgleeson commented 7 years ago

Thanks for spending some time to look into this @salvadord. I'm reasonably sure I was able to run some networks identically in serial and parallel mode before the Random123 refactor, but didn't test it very extensively.

I'm seeing differences with this example https://github.com/OpenSourceBrain/NetPyNEShowcase/blob/master/NeuroML2/times/LEMS_SpikingNet_netpyne.py between serial and np = 2 (pre pop has 3 cells, post has 2, 2 conns: pre0->post0, pre1-> post1)

[images: selection_307, selection_308]

But when I remove the 3rd cell in pre pop the difference vanishes...

[image: selection_306]

salvadord commented 7 years ago

@pgleeson - I tested a bunch of the tutorials and they produce identical output with 1 and 2 cores, even with the new Random123. And yeah, I started from the LEMS_SpikingNet_netpyne.py example, rewrote it in netpyne with no NeuroML involved, and simplified it step by step until I was able to reproduce the issue with a minimal model (just 2 pre cells and 1 post cell) that used standard HH neurons and Exp2Syn. I've pushed the code to the sandbox: https://github.com/Neurosim-lab/netpyne/blob/development/examples/sandbox/sandbox.py Will look into it more this week.

salvadord commented 7 years ago

@pgleeson - Finally figured it out!!

So it turns out that gid_connect modifies the threshold of the source cell, but only if the cell is on the same node. This leads to different outputs when running on 1 vs 2 cores.

You can see a minimal example using just Neuron in the sandbox: https://github.com/Neurosim-lab/netpyne/blob/development/examples/sandbox/sandbox.py -- notice the threshold of cell with gid=1 is different if you run on 1 vs 2 cores.
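
For readers without NEURON at hand, the effect can be illustrated with a toy model in plain Python. It makes no NEURON calls and only encodes the behaviour described above; the round-robin placement, the function name and the threshold values are assumptions chosen for illustration.

```python
def source_threshold(n_ranks, cell_threshold=10.0, conn_threshold=0.0):
    """Toy model in plain Python (no NEURON calls).

    Two cells, with a connection gid 1 -> gid 0, dealt round-robin across
    ranks. Returns the spike threshold the source cell (gid 1) ends up with.
    """
    rank_of = {gid: gid % n_ranks for gid in (0, 1)}
    thresholds = {0: cell_threshold, 1: cell_threshold}
    source, target = 1, 0
    # The connection is created on the rank that owns the target cell.
    # Per the finding above, setting its threshold only reaches the source
    # cell when that cell lives on the same rank.
    if rank_of[source] == rank_of[target]:
        thresholds[source] = conn_threshold
    return thresholds[source]

print(source_threshold(n_ranks=1))  # source shares a rank with the target
print(source_threshold(n_ranks=2))  # source is on a different rank
```

In this toy setup the source cell ends up with the connection's threshold on 1 rank but keeps its own on 2 ranks, so spike times (and everything downstream) can differ between the two runs.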

A related issue was discussed in this forum post: https://www.neuron.yale.edu/phpBB/viewtopic.php?f=31&t=2355. Seems like a possible solution would be to never use the threshold of the postsyn cell netcon (via pc.gid_connect) and instead use the one of the presyn cell netcon (via pc.cell). I discussed with Robert, and emailed Mike Hines about it.

For now I changed netpyne so that the threshold in conns is not used; instead you can provide a 'threshold' param in cellParams, e.g. 'secs': {'soma': {'threshold': 5.0}}.
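
As a rough sketch of where that param sits, here is a plain dict with the same nesting. The `conds` entry and the `geom` values are borrowed from the RS cell in the log earlier in this thread and are only placeholders; in a NetPyNE script a dict like this would be an entry of netParams.cellParams.

```python
# Plain dict mirroring the nesting quoted above; the rule contents other than
# 'threshold' are placeholders for illustration.
cell_rule = {
    'conds': {'cellType': 'RS'},
    'secs': {
        'soma': {
            'geom': {'diam': 10, 'L': 10},
            'threshold': 5.0,  # spike detection threshold (mV) for this section
        },
    },
}

print(cell_rule['secs']['soma']['threshold'])
```

Keeping the threshold on the cell's own section means it does not depend on which node ends up creating the connection.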

I also made the previously mentioned minor change in neuromlFuncs.py so that format = 'NeuroML2_stochastic_input' is enforced and an h.Random is associated with the stim.

After these 2 changes the LEMS_SpikingNet example produces identical output with different numbers of cores (finally!).

I'll wait to see if you want to introduce any changes in neuromlFuncs.py and then will release version Saturday night or Sunday morning.

pgleeson commented 7 years ago

Great news! Have made some updates to use threshold on the sections, and opened a PR. Please include in the release :-)

I've also put back the check on 'NeuroML2_stochastic_input'; there are two types of NML 'cells', one is a spike source and one is a cell with v and they have to be treated slightly differently. All my tests are passing now. This change shouldn't affect anything outside the NML world...