Closed: ikitayama closed this issue 3 years ago
If it'd be difficult now to support pthreads in the NEURON simulator framework, I'd try to use multiple nodes with MPI.
IMO, pthreads should work as before. But about the specific error above (NetCon and NetCon source with no gid are not in the same thread), Michael might remember more precisely, as this check was added a few months ago.
Using the default value it works.
That the NetCon and its NetStim source are not in the same thread is not a problem for NEURON, and I don't believe it is a problem for direct-mode interaction between NEURON and CoreNEURON. The problem arises with file transfer mode, where the NetCon and the NetStim will appear in different files and so, in principle, could end up on different MPI nodes. In the different-MPI-node case, without a gid associated with the NetStim, it is not possible to make the connection on the CoreNEURON side. One workaround is to associate the NetStim with a gid. At the implementation level, it would be necessary for the file transfer code to invent a unique gid for this case.
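To make the gid-invention idea concrete, here is a minimal sketch (plain Python with hypothetical names, not the actual nrncore_write code): pick fresh gids strictly above the highest user-assigned gid, so the invented ones can never collide with existing ones.

```python
def invent_gids(user_gids, n_sources_without_gid):
    """Invent one unique gid per gid-less source (hypothetical sketch).

    user_gids: set of gids the user has already assigned.
    Returns a list of fresh gids, one per gid-less source, none of
    which collide with user_gids.
    """
    # Start just above the highest existing gid (or at 0 if there is none).
    next_gid = max(user_gids, default=-1) + 1
    return list(range(next_gid, next_gid + n_sources_without_gid))
```

For example, with user gids {0, 1, 7} and two gid-less NetStims, the invented gids would be [8, 9].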
For many-core systems like JURECA-DC (128 AMD cores per node), which I am currently using, is it better to use pthreads over the flat MPI scheme?
@pramodk is going to have to weigh in on this. Assuming we are talking about CoreNEURON, in some ways I have lost the conceptual thread with regard to performance of the OpenMP threads and would have to be reminded whether pthreads are even being used. On CPUs it is hard to reach the performance of MPI using pthreads, but I think we came very close with NEURON (during a simulation run), with the benefit of threads being lower memory use and full GUI interactivity. One cost of threads is the loss of interpreter parallelism during model setup. I guess, for me, the answer would be that performance comparisons are an experimental question.
At the implementation level, it would be necessary for the file transfer code to invent a unique gid for this case.
I believe the typical case is a NetStim connecting to a single cell. Perhaps it would not be too difficult to modify the nrncore_write implementation, for file transfer, to force non-gid sources with one connection onto the thread containing the NetCon.
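That suggestion could be sketched roughly as follows (plain Python with hypothetical names, not the actual nrncore_write implementation): a gid-less source feeding exactly one NetCon adopts that NetCon's thread, while a source feeding several NetCons is left unresolved.

```python
def colocate_sources(netcon_threads, source_of_netcon):
    """Pin single-use gid-less sources to their NetCon's thread (sketch).

    netcon_threads:   {netcon_id: thread_id} for every NetCon.
    source_of_netcon: {netcon_id: source_id} for NetCons whose source has no gid.
    Returns {source_id: thread_id or None}: a source used by exactly one
    NetCon gets that NetCon's thread; a shared source stays unresolved.
    """
    # Invert the mapping: which NetCons use each gid-less source?
    uses = {}
    for nc, src in source_of_netcon.items():
        uses.setdefault(src, []).append(nc)
    assignment = {}
    for src, ncs in uses.items():
        assignment[src] = netcon_threads[ncs[0]] if len(ncs) == 1 else None
    return assignment
```

In the typical single-cell case every source resolves; only a NetStim shared by several NetCons would still need an invented gid.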
For many-core systems like JURECA-DC (128 AMD cores per node), which I am currently using, is it better to use pthreads over the flat MPI scheme?
@ikitayama : using threads makes more sense, especially with high-core-count systems at scale. The above error seems more like an artifact and could be avoided. We will take a look at this.
By the way, this ringtest tutorial uses pc.nrnbbcore_write to dump the model to files and then run CoreNEURON. Note that there is an in-memory transfer mode as well that should avoid this error, but with a slightly higher memory footprint. You can see an example here: https://github.com/neuronsimulator/nrn/blob/3f1f7316ae4c23c622b18f6b5fa88cebb99f534a/test/coreneuron/test_spikes.py#L53
We have to update this tutorial for this new option; another item for the todo list.
cc: @iomaganaris : could you add this to our todo list, so that we have a threading-based test?
@pramodk Thanks for the pointer! While the code imports mpi4py, it does not seem to distribute neurons across the nodes. Do you have other real-life examples?
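For distribution, the usual NEURON idiom is round-robin gid assignment by rank. Here is a minimal sketch of the index arithmetic; the ParallelContext calls in the comment show how it would typically be wired into a NEURON script and are an assumption, not code from this repository:

```python
def gids_for_rank(rank, nhost, ncell):
    """Round-robin cell distribution: rank r owns gids r, r + nhost, r + 2*nhost, ...

    Every gid in range(ncell) is owned by exactly one rank.
    """
    return list(range(rank, ncell, nhost))

# In a NEURON script this would typically be used as (hypothetical sketch):
#   from neuron import h
#   pc = h.ParallelContext()
#   for gid in gids_for_rank(int(pc.id()), int(pc.nhost()), ncell):
#       pc.set_gid2node(gid, pc.id())
#       # ... create the cell for this gid and pc.cell(gid, netcon) here
```

With 4 ranks and 10 cells, rank 0 owns [0, 4, 8] and rank 3 owns [3, 7]; across all ranks every gid appears exactly once.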
I've been working on my script extended from ringtest.py with MPI support, but I keep getting the library error upon init.
This was due to my additions to the original Python script; now I am able to execute it on multiple compute nodes. As soon as the pthreads "fix" is available, I'll rebuild NEURON and test it. Let us know!
Can pthreads in the NEURON C code be replaced with OpenMP pragmas?
If I'm not mistaken, profiling pthreads code is a more difficult task than profiling OpenMP code.
Can pthreads in the NEURON C code be replaced with OpenMP pragmas?
I would guess that either could be supported without much change, because all compute threads are activated in the style
nrn_multithread_job(pointer_to_function_taking_thread_id);
and there is no communication or access to the same memory between them (there is an exception to this if the user arranges callbacks to the interpreter, but those are serialized with mutex locks).
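As a toy illustration of that dispatch style (plain Python, not NEURON code): each worker receives only its own thread id and returns its own result, so there is no shared mutable state between compute threads, which is what makes the backend (pthreads or OpenMP) interchangeable.

```python
from concurrent.futures import ThreadPoolExecutor

def multithread_job(job, nthread):
    """Run job(thread_id) once per thread id, mirroring the
    nrn_multithread_job(pointer_to_function_taking_thread_id) pattern.

    Each worker gets only its own id and produces its own result,
    so workers never communicate or touch shared mutable state.
    """
    with ThreadPoolExecutor(max_workers=nthread) as pool:
        # map preserves order, so results come back indexed by thread id.
        return list(pool.map(job, range(nthread)))

# Each "compute thread" works on its own slice of the problem:
results = multithread_job(lambda tid: tid * tid, 4)
```

Because the only interface is "a function taking a thread id", the same dispatcher could be backed by pthreads, OpenMP, or (as here) a thread pool without changing the callers.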
Alright, thanks @nrnhines. Once the pthreads issue is fixed in HEAD, will I be able to use a hybrid pthreads+MPI execution mode?
@ikitayama : the goal with the current design is that NEURON will use pthreads, but when the model is executed by CoreNEURON, the same number of OpenMP threads will be used on the CoreNEURON side. So I think it won't be necessary to replace pthreads on the NEURON side.
On the profiling side: as the main compute happens on the CoreNEURON side, it's sufficient to profile just the OpenMP part.
But we need some time to test and document these details.
I take it back; Score-P can handle Pthreads.
When I run python ringtest.py -nt=2 on the JURECA-DC login node, I get this message: NEURON: NetCon and NetCon source with no gid are not in the same thread
@ikitayama : I rewrote the README.md file and updated the code. With the updated code, NEURON directly launches CoreNEURON (we call this in-memory mode). Using this, you won't see this error.
Also I added some explanation about using OpenMP threads. Let me know if that is helpful.
I believe this is fixed now. Please reopen in case of error.
When I run this
python ringtest.py -nt=2
on JURECA-DC login, I get this message:
NEURON: NetCon and NetCon source with no gid are not in the same thread
The code is basically at d3b40f9cd9