sccn / amica

Code for AMICA: Adaptive Mixture ICA with shared components
BSD 2-Clause "Simplified" License

correct way to call mpirun with AMICA on multiple machines #35

Closed: vlawhern closed this issue 2 years ago

vlawhern commented 2 years ago

Hello,

I'm trying to run AMICA on a distributed cluster from the command line, outside of MATLAB. There is no job scheduler; the machines simply connect to each other via SSH. For testing, I'm using the sample data (Memorize.fdt, amicadefs.param).

Suppose I have a hostfile with two hosts defined as

host1 slots=1
host2 slots=1

when I run this command

mpirun -np 2 --cpus-per-proc 10 -machinefile hostfile amica15ub amicadefs.param

it successfully spins up two processes, one on each host. However, it appears to be running two independent AMICA jobs rather than a single run that shares information across the processes. For example, in the following output every iteration is reported twice, with slightly different LL values:

 iter     1 lrate =  0.1000000000 LL =  -2.4660826212 nd =  0.0396225707, D =   0.21150E-01  0.21150E-01  (  1.15 s,   0.6 h)
 iter     1 lrate =  0.1000000000 LL =  -2.4647405397 nd =  0.0393720804, D =   0.20560E-01  0.20560E-01  (  1.00 s,   0.6 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4083273442 nd =  0.0139898858, D =   0.15982E-01  0.15982E-01  (  0.98 s,   0.5 h)
 iter     2 lrate =  0.1000000000 LL =  -2.4085044656 nd =  0.0135491076, D =   0.15504E-01  0.15504E-01  (  1.17 s,   0.6 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3987433791 nd =  0.0127810935, D =   0.12500E-01  0.12500E-01  (  1.20 s,   0.7 h)
 iter     3 lrate =  0.1000000000 LL =  -2.3988822199 nd =  0.0124826162, D =   0.12684E-01  0.12684E-01  (  0.96 s,   0.5 h)
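One way to check this from the log alone: in a single coordinated run each iteration should presumably be reported once, so an iteration number that appears twice with different LL values points to independent runs. The Python sketch below assumes that behavior and the progress-line layout shown above. The function names are hypothetical, and the sample lines are trimmed copies of the output in this issue.

```python
import re
from collections import defaultdict

# Matches the iteration number and log-likelihood at the start of an AMICA
# progress line, e.g. " iter     1 lrate =  0.1000000000 LL =  -2.4660826212 ..."
ITER_RE = re.compile(r"^\s*iter\s+(\d+)\s+lrate\s*=\s*\S+\s+LL\s*=\s*(-?\d+\.\d+)")


def ll_by_iteration(log_text):
    """Group the LL values reported for each iteration number."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = ITER_RE.match(line)
        if match:
            groups[int(match.group(1))].append(float(match.group(2)))
    return dict(groups)


def looks_like_independent_runs(log_text):
    """Heuristic: True if any iteration is reported more than once with differing LL."""
    for values in ll_by_iteration(log_text).values():
        if len(values) > 1 and len(set(values)) > 1:
            return True
    return False


# Trimmed lines from the output above (columns after "nd" removed).
sample = """\
 iter     1 lrate =  0.1000000000 LL =  -2.4660826212 nd =  0.0396225707
 iter     1 lrate =  0.1000000000 LL =  -2.4647405397 nd =  0.0393720804
 iter     2 lrate =  0.1000000000 LL =  -2.4083273442 nd =  0.0139898858
 iter     2 lrate =  0.1000000000 LL =  -2.4085044656 nd =  0.0135491076
"""

print(ll_by_iteration(sample))
print(looks_like_independent_runs(sample))
```

The check only looks at duplicated iteration numbers, so it is a quick sanity test rather than proof of how the processes are communicating.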

Is there something I'm missing in order to do distributed AMICA runs correctly?

Thanks in advance
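The hostfile above uses OpenMPI's `host slots=N` syntax. MPICH's Hydra launcher instead reads `host:N` lines, so switching launchers also means switching hostfile format. A minimal Python sketch of the conversion, assuming only the `slots=` field is used (the helper name is hypothetical):

```python
import re

# OpenMPI hostfile line, e.g. "host1 slots=1"; the slots= field is optional.
OPENMPI_LINE = re.compile(r"^\s*(\S+)(?:\s+slots\s*=\s*(\d+))?\s*$")


def openmpi_to_hydra(hostfile_text):
    """Convert OpenMPI-style hostfile lines to MPICH Hydra 'host:count' lines.

    Blank lines and '#' comments are dropped. Lines carrying other OpenMPI
    options (e.g. max_slots=) are rejected rather than silently mangled.
    """
    out = []
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = OPENMPI_LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported hostfile line: {raw!r}")
        host, slots = match.group(1), match.group(2)
        # A bare hostname is also valid in a Hydra host file.
        out.append(f"{host}:{slots}" if slots else host)
    return "\n".join(out) + "\n"


print(openmpi_to_hydra("host1 slots=1\nhost2 slots=1\n"), end="")
```

The converted file would then be passed to Hydra with something like `mpiexec -f hosts.hydra -n 2 amica15ub amicadefs.param`, where `hosts.hydra` is a hypothetical file name. OpenMPI-specific options such as `--cpus-per-proc` are unlikely to be accepted by Hydra and would need to be dropped or replaced.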

vlawhern commented 2 years ago

OK, after further testing this turned out to be an OpenMPI vs. MPICH issue, so I'm closing the ticket. Thanks.
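The thread doesn't say how the mismatch showed up, but a common cause of exactly this symptom (N processes that each behave like a one-process job) is launching a binary built against one MPI implementation with a different implementation's `mpirun`. As a sketch, assuming OpenMPI's launcher mentions "Open MPI" and MPICH's Hydra launcher mentions "HYDRA" in their `--version` output, the launcher on the PATH can be identified like this (function names are hypothetical):

```python
import shutil
import subprocess


def classify_mpi_launcher(version_text):
    """Guess the MPI family from a launcher's --version output (heuristic)."""
    text = version_text.lower()
    if "open mpi" in text:
        return "openmpi"
    # Intel MPI is also Hydra-based, so check for it before the generic Hydra case.
    if "intel(r) mpi" in text:
        return "intel-mpi"
    if "hydra" in text:
        return "mpich-family"
    return "unknown"


def launcher_version(launcher="mpirun"):
    """Run '<launcher> --version' and return its combined output, or None if missing."""
    path = shutil.which(launcher)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr


if __name__ == "__main__":
    for name in ("mpirun", "mpiexec"):
        text = launcher_version(name)
        if text is None:
            print(f"{name}: not found on PATH")
        else:
            print(f"{name}: {classify_mpi_launcher(text)}")
```

The launcher's family should match the MPI library `amica15ub` was built against; for a dynamically linked binary, `ldd amica15ub` shows which MPI shared library it loads.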