Hjorthmedh opened this issue 4 months ago
@adamjhn or @ramcdougal: do you have a suggestion for this issue?
Looking into this...
First two insights:

- `nodes` is a slight red herring here, as there's nothing special about the 0th species... it's just that it's the first one that happens to get called. Every call after the first one should be proportional to the number of nodes in that cell (not in the model as a whole)... these run in about 0.16 ms on my machine (with c91662 in the below slightly modified version). The problem is that the initial call sets up the data structures (this is the job of `_update_node_data`) for the entire portion of the simulation in the current process (has to happen sometime), and that's the part that's O(number of nodes in process)... Edited to give more details.
- `finitialize` runs in about 12 seconds on my machine, which isn't great but isn't overly egregious. The problem is the massive number of calls to the destructor that are removing one section at a time and reallocating memory following the end of the function. Addendum edit: Note part of the issue is that our simulation is wrapped in the function `minimal_example`, so there's no way for NEURON to know we're done and aren't deleting things bit by bit. If the simulation was at the top level, the `atexit` routines would be called, which provide a much faster shutdown.

Quite frankly, this is probably the first time anyone has tried to run this with 2000 species.
Here's a version reproducing the problem that doesn't require bluepyopt:
```python
from neuron import h, rxd
import time

# I'm using c91662 from
# https://raw.githubusercontent.com/NeuroBox3D/NeuGen/master/NeuGen/cellData/CA1/amaral/c91662.CNG.swc
h.load_file("import3d.hoc")


class Cell:
    def __init__(self):
        cell = h.Import3d_SWC_read()
        cell.input("c91662.swc")
        i3d = h.Import3d_GUI(cell, False)
        i3d.instantiate(self)


def minimal_example(NUM_MORPHS=10, SPECIES_PER_CELL=200):
    species_list = []
    region_list = []
    cell_list = [Cell() for _ in range(NUM_MORPHS)]
    for cell in cell_list:
        print("Creating regions", flush=True)
        region = rxd.Region(cell.dend, nrn_region="i")
        region_list.append(region)
        print("Creating species", flush=True)
        for idx in range(SPECIES_PER_CELL):
            species_name = f"species{idx}"
            spec = rxd.Species(region,
                               d=0,
                               initial=1,
                               charge=0,
                               name=species_name)
            species_list.append(spec)

    duration = []
    for idx, spec in enumerate(species_list):
        # This step is slow
        if idx == 0:
            print(f"Calling nodes on {spec} -- This is slow!", flush=True)
        else:
            print(f"Calling nodes on {spec}", flush=True)
        start_time = time.perf_counter()
        spec.nodes
        end_time = time.perf_counter()
        dur = end_time - start_time
        duration.append(dur)
        print(f"nodes call done {dur}")

    print(f"Max duration: {max(duration)} for {NUM_MORPHS} neurons")
    print("Init")
    start_time = time.perf_counter()
    h.finitialize(-65)
    end_time = time.perf_counter()
    print(f"Initialization time: {end_time - start_time} seconds")


if __name__ == "__main__":
    minimal_example()
```
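Following up on the `atexit` point: a minimal sketch of a hypothetical restructuring (it assumes `minimal_example` is changed to end with `return cell_list, region_list, species_list`) that keeps the rxd objects alive at module scope, so NEURON's registered `atexit` handlers can do one fast bulk teardown at interpreter exit instead of the slow section-by-section destruction when the function returns:

```python
# Hypothetical variant of the driver above: binding the returned objects to a
# module-level name keeps them alive until interpreter exit, where NEURON's
# atexit routines provide the much faster shutdown described earlier.
if __name__ == "__main__":
    state = minimal_example()
```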
Hi! Do you have any suggestions for how to solve this?
A couple quick tips for now: (as noted above, the slow part is not `finitialize` itself; it's what happens after the function ends.)

Short-term solutions we can help with:
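One example in that direction, sketched below (an inference from the analysis above rather than an official recommendation; it assumes the `species_list` from the MWE and front-loads the one-time cost rather than removing it):

```python
# Sketch: the first .nodes access pays the one-time _update_node_data setup
# cost for the whole model. Triggering it once, deliberately, makes every
# subsequent per-species access cheap (~0.16 ms in the measurements above).
_ = species_list[0].nodes                           # one-time O(total nodes) setup
all_nodes = [spec.nodes for spec in species_list]   # now proportional to each cell
```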
Longer-term fixes we should do:
Thanks! We made some changes to retrieve nodes after all species were defined, but this still left the first call to `.nodes` too slow to be used at scale. It also seems to leave the time complexity for a single call proportional to the global number of nodes instead of only the number present in the individual cell. How complicated would it be to fix this at a lower level? (It could be useful to have a tree representation or similar of the nodes at the C++ level instead of continuously fetching Python lists.)

As for the deallocation, it seems not to happen when all species are involved in rxd.Reactions, so that has not been a problem in practice.
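For completeness, a minimal sketch of that setup (assuming two `rxd.Species` objects, `spec_a` and `spec_b`, on the same region; the rate constant is arbitrary):

```python
from neuron import rxd

# Sketch: once a species participates in a Reaction, the rxd machinery keeps
# a reference to it, consistent with the piecemeal deallocation not being
# observed in this case.
kf = 1e-3  # arbitrary forward rate, for illustration only
reaction = rxd.Reaction(spec_a, spec_b, kf)  # mass action: spec_a -> spec_b
```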
Hi,
I noticed another issue. Segment geometry is slow to compute and is computed multiple times per segment.
```
   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   396000   17.895    0.000   32.157    0.000  geometry.py:68(result)  <- This is "surfacearea1d"
   396000   11.753    0.000   26.164    0.000  geometry.py:36(_volumes1d)
   396000    1.869    0.000   14.023    0.000  geometry.py:117(_neighbor_areas1d)
```
These geometry functions are called once per `RxD.Section1D`, and there is one of those for every species present on a section. The RxD module does not reuse the geometry calculations when there are multiple species; instead it recomputes the geometry. I think the best way to fix this issue would be to compute the segment volume and surface area in C++ inside NEURON. The new SoA data structures can easily accommodate this. The geometry could be optional and lazily computed.
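To illustrate the reuse idea, here is a minimal sketch (not NEURON's actual implementation; it uses NEURON's built-in `seg.volume()` and `seg.area()` as stand-ins for the more detailed `geometry.py` computations). Caching per section means every additional species on that section skips the recomputation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # computed once per Section, shared by all species
def section_geometry(sec):
    # `sec` is a NEURON h.Section. Stand-in for the profiled geometry.py
    # work: per-segment volumes and membrane surface areas. The suggested
    # real fix is to compute (or lazily cache) these in C++ inside NEURON.
    volumes = [seg.volume() for seg in sec]
    areas = [seg.area() for seg in sec]
    return volumes, areas
```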
Context

The call `myspecies.nodes` to get all compartments that have `myspecies` is very slow. The call `h.finitialize()` is also very slow.

Overview of the issue
In our minimal example it takes 11 seconds for a single call with one neuron, and 110 seconds for a single call in a network of 10 neurons. This appears to scale linearly with the number of neurons, even though we are only requesting a list of compartments from a single neuron. `h.finitialize()` is also incredibly slow.

We expected the function call to be much faster and to be independent of the number of neurons. This is especially important since we want to run large-scale networks of neurons (10,000+).
NEURON setup
Minimal working example - MWE
This example uses the morphologies in https://github.com/Hjorthmedh/BasalGangliaData/tree/main/data/neurons/striatum/dspn (the code uses glob to extract SWC files from `*/morphology/*.swc`).
Logs
The init call takes AGES. Included below is the output from the Python profiler.
This function call is done excessively, taking up 86.7% of the total run time! (Mostly during initialize?)
The `update_node_data` function is also run excessively, especially during the `myspecies.nodes` call.
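For reference, a profile like the ones quoted in this thread can be generated with the standard-library profiler (a generic sketch; it assumes the `minimal_example` function from the MWE above is defined in the running script):

```python
import cProfile
import pstats

# Profile the reproduction and show the 20 functions with the most self-time;
# calls into rxd's geometry.py and the node-data update should dominate.
cProfile.run("minimal_example()", "rxd_profile.out")
pstats.Stats("rxd_profile.out").sort_stats("tottime").print_stats(20)
```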