tomMoral / dicodile

Experiments for "Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals"
https://tommoral.github.io/dicodile/
BSD 3-Clause "New" or "Revised" License

Unit tests fail with mpich #19

Open hndgzkn opened 3 years ago

hndgzkn commented 3 years ago

The Mandrill example runs without problems; however, the unit tests fail with mpich.

When tests are run with:

$ pytest

Output is:

dicodile/tests/test_dicodile.py::test_dicodile [mpiexec@hande] match_arg (utils/args/args.c:160): unrecognized argument pmi_args
[mpiexec@hande] HYDU_parse_array (utils/args/args.c:175): argument matching returned error
[mpiexec@hande] parse_args (ui/mpich/utils.c:1603): error parsing input array
[mpiexec@hande] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1655): unable to parse user arguments
[mpiexec@hande] main (ui/mpich/mpiexec.c:128): error parsing parameters

This might be due to the missing singleton feature in mpich.

When tests are run with:

$ mpirun -np 1 pytest

With the openmpi implementation, all tests run without problems, but the run hangs after the last test, leaving all the processes spawned by that test alive.

The problem with mpich seems to affect only the tests; for example, examples/plot_mandrill.py runs without problems with mpich.
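
For anyone debugging this, one quick way to see whether a process was started by a launcher or as a singleton is to look at the environment variables the launcher exports. The sketch below is only an illustration. The helper name detect_launcher is hypothetical, and the variable names (OMPI_COMM_WORLD_SIZE for openmpi, PMI_SIZE and PMI_RANK for mpich's Hydra launcher) are assumptions based on what these launchers commonly set, so they may differ between versions.

```python
import os

# Environment variables that MPI launchers commonly export to the processes
# they start. These names are assumptions based on typical openmpi and
# mpich (Hydra) behaviour; they are not taken from dicodile itself.
LAUNCHER_ENV_VARS = {
    "openmpi": ("OMPI_COMM_WORLD_SIZE",),
    "mpich": ("PMI_SIZE", "PMI_RANK"),
}


def detect_launcher(environ=None):
    """Guess which launcher started this process.

    Returns "openmpi" or "mpich" if one of the known variables is set,
    and None if the process looks like a singleton (plain python/pytest).
    """
    environ = os.environ if environ is None else environ
    for name, variables in LAUNCHER_ENV_VARS.items():
        if any(var in environ for var in variables):
            return name
    return None


if __name__ == "__main__":
    launcher = detect_launcher()
    if launcher is None:
        print("No MPI launcher detected: running as a singleton.")
    else:
        print(f"Started by an MPI launcher ({launcher}).")
```

Running this once with plain python and once under mpirun -np 1 should show which of the two cases a given test invocation falls into.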

chmendoza commented 2 years ago

I am getting this when running examples/plot_gait.py:

[DEBUG:DICODILE] Lambda_max = 1.7133212673372127
[mpiexec@mendoza-PC] match_arg (utils/args/args.c:160): unrecognized argument pmi_args
[mpiexec@mendoza-PC] HYDU_parse_array (utils/args/args.c:175): argument matching returned error
[mpiexec@mendoza-PC] parse_args (ui/mpich/utils.c:1603): error parsing input array
[mpiexec@mendoza-PC] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1655): unable to parse user arguments
[mpiexec@mendoza-PC] main (ui/mpich/mpiexec.c:128): error parsing parameters

with mpich-3.4.2, mpi4py-3.1.1, and mpi-1.0-mpich.

hndgzkn commented 2 years ago

Hi @chmendoza

What is the command that you are using to run the example?

chmendoza commented 2 years ago

I don't have my laptop with me right now to recreate this and give more detail, but it was python plot_gait.py. I could post more details about the conda environment here if you think it is needed.

hndgzkn commented 2 years ago

Please try with the command:

mpirun -np 1 --host localhost:16 python -m mpi4py examples/plot_gait.py

My guess is that you are running the example without the mpirun -np 1 ... part of the command. That is possible with openmpi but not with mpich. I hope this helps.
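
If it is easier to launch the examples from a script, the command above can be assembled programmatically. This is a minimal sketch: the helper name build_mpirun_command and its parameters are hypothetical and not part of dicodile, and the defaults simply mirror the command in this comment (one launched process, 16 slots on localhost).

```python
import shlex


def build_mpirun_command(script, n_slots=16, host="localhost"):
    """Build the argument list for running `script` through mpirun.

    The layout mirrors the command suggested in this thread:
    mpirun -np 1 --host <host>:<n_slots> python -m mpi4py <script>
    """
    return [
        "mpirun",
        "-np", "1",                      # launch a single process
        "--host", f"{host}:{n_slots}",   # slots available on the host
        "python", "-m", "mpi4py",        # run the script through mpi4py
        script,
    ]


if __name__ == "__main__":
    cmd = build_mpirun_command("examples/plot_gait.py")
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues.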

chmendoza commented 2 years ago

That worked @hndgzkn, thanks! Although I am using this package in my conda environment:

openmpi 4.1.1 hbfc84c5_0

so, I am not using mpich

Finally, I ran the tests with $ pytest . and 35 tests passed, with 10 warnings.