wmglab-duke / cajal

Construct models of peripheral nerve stimulation in NEURON and perform optimization on MPI-enabled frameworks.

> mpirun -n 4 python parallel_thresholds.py get random results #3

Open zeyunZhao opened 1 week ago

zeyunZhao commented 1 week ago

I followed your guidance and installed exactly the same environment. However, when I run parallel_thresholds.py on 32 CPUs repeatedly, I get inconsistent results (about 2 out of 10 runs are abnormal). Is this expected?

Any reply will be appreciated.


minhajh commented 2 days ago

This looks like an MPI error. How many physical cores do you have? Does this only occur when you oversubscribe (i.e. mpirun -n 'x' where x > number of physical cores)?

zeyunZhao commented 2 days ago

Thanks for your reply. I have 32 CPUs. This happens when I run parallel_thresholds.py with 4, 8, 16, and 32 cores. I am now trying to run it on 1 CPU.

zeyunZhao commented 2 days ago

It seems that I get the same results all 10 times when I run this code on 1 CPU.

minhajh commented 1 day ago

Ah, I see. I assumed you were running the file parallel_thresholds.py multiple times, not executing your own code in a for loop. If you check parallel_thresholds.py, you'll notice that the plotting code comes after a check statement: if MPI.MASTER(): followed by the plotting code. This is deliberate. The thresholds are only reassembled on the 'master' rank; on the other ranks, env.thresholds is an uninitialized array. If you run your code so that only the master rank generates the plot, i.e.

for k in range(10):
    N.tstop = 5 * ms
    N.dt = 0.005 * ms
    mrg = MRG etc...
    if MPI.MASTER():
        plt.plot(...)
        plt.savefig(...)

then it should work as expected.
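To illustrate why the guard matters, here is a minimal pure-Python sketch of "gather to root" semantics. It does not use cajal or any MPI library: the names ROOT, compute_local, and simulated_gather are hypothetical, the values are stand-ins rather than real thresholds, and None stands in for the uninitialized array that the non-master ranks would hold.

```python
# Simulation only: no MPI is involved, and none of these names come from cajal.
ROOT = 0  # hypothetical id of the 'master' rank


def compute_local(rank, n_ranks, n_items):
    """Each simulated rank handles every n_ranks-th item (stand-in values)."""
    return {i: 10.0 * i for i in range(rank, n_items, n_ranks)}


def simulated_gather(per_rank_results, rank):
    """Only the root sees the reassembled results; other ranks get None."""
    if rank != ROOT:
        return None
    merged = {}
    for chunk in per_rank_results:
        merged.update(chunk)
    return [merged[i] for i in sorted(merged)]


n_ranks, n_items = 4, 8
per_rank = [compute_local(r, n_ranks, n_items) for r in range(n_ranks)]
views = [simulated_gather(per_rank, r) for r in range(n_ranks)]

for rank, view in enumerate(views):
    if rank == ROOT:
        print(f"rank {rank}: complete data, safe to plot -> {view}")
    else:
        print(f"rank {rank}: no assembled data, skip plotting")
```

In a real MPI run every rank executes the whole script, so an unguarded plt.savefig(...) would be called once per rank, and whichever rank writes the file last determines what you see. That would be consistent with results that look random across runs and stable on a single core.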

zeyunZhao commented 1 day ago

Yes, I have tried this ten times and got the same results each time. Does that mean cajal is running correctly? If so, why can't I generate data properly in axonml? The data generated in axonml always contains NaN values. Could you help me with that?