hpc / Spindle

Scalable dynamic library and python loading in HPC environments

Missing mpi.h in testsuite and request for clarification on MPI variant use with Spindle #41

Closed alansill closed 3 years ago

alansill commented 3 years ago

A plain Spack installation, or a manual installation, without an active MPI variant defined fails in the testsuite build with a missing mpi.h, as shown below:

     352    make[3]: Entering directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-av65uymhbjk5xlot4r7o7zrdplcrathu/spack-src/testsuite'
     353      CC     test_driver-test_driver.o
     354      CC     test_driver_libs-test_driver.o
  >> 355    test_driver.c:17:10: fatal error: mpi.h: No such file or directory
     356     #include <mpi.h>
     357              ^~~~~~~
     358    compilation terminated.
  >> 359    test_driver.c:17:10: fatal error: mpi.h: No such file or directory
     360     #include <mpi.h>
     361              ^~~~~~~
     362    compilation terminated.
     363    make[3]: *** [Makefile:340: test_driver-test_driver.o] Error 1
     364    make[3]: *** Waiting for unfinished jobs....
     365    make[3]: *** [Makefile:356: test_driver_libs-test_driver.o] Error 1

Does Spindle require a specific MPI package to be set up to resolve the missing mpi.h? If so, is a separate Spindle instance required for each MPI variant to be used? We have many MPI variants in use, so the latter would definitely be a hassle, but I suspect I am missing something obvious here.

mplegendre commented 3 years ago

Spindle only requires MPI for its testsuite, which tries to launch MPI jobs and ensure they run correctly in Spindle. If you add 'mpicc' to your PATH when you configure then Spindle should find that and use it in the testsuite's build. Alternatively, you could disable the testsuite in your configure line with --enable-testsuite=no.

I had thought that a not-found MPI at configure time would disable the testsuite build, but that doesn't appear to be happening. I'll leave this issue open to look into that.
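Until that is fixed, the two options above can be combined in a small wrapper. The following is only a sketch of the workaround, not something Spindle ships: `spindle_testsuite_args` is a hypothetical helper name, and the only Spindle-specific flag it uses is the `--enable-testsuite=no` option mentioned above. It prints the configure line instead of running it.

```shell
#!/bin/sh
# Sketch of the workaround: skip the testsuite build when no MPI
# compiler wrapper is visible at configure time.
# NOTE: spindle_testsuite_args is a hypothetical helper, not part of Spindle.

spindle_testsuite_args() {
    if command -v mpicc >/dev/null 2>&1; then
        # mpicc is on PATH: pass nothing extra and let configure pick it up.
        :
    else
        # No mpicc found: disable the testsuite so the build does not
        # fail on the missing mpi.h.
        printf '%s\n' "--enable-testsuite=no"
    fi
}

# Print the resulting configure line rather than executing it.
echo "./configure $(spindle_testsuite_args)"
```

With an MPI module loaded (so that `mpicc` is on PATH) this prints a plain `./configure` line; without one it appends `--enable-testsuite=no`. Any other configure options for the site would still need to be added by hand.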

vsoch commented 3 years ago

To pick up on this issue, I've compiled spindle with tests and a template:

./configure --with-munge-dir=/etc/munge --enable-sec-munge --with-slurm-dir=/etc/slurm --with-testrm=slurm
make
make install

I've tried that with both slurm and openmpi as the "testrm". Then I build and run the tests:

cd testsuite
make
./runTests

but no matter what I do (using the slurm or openmpi template, both of which I have) I see this error:

Running: ./run_driver --partial --session
ERROR: Spindle could not connect to session tn2VYQ

I saw this same error in trying to just use spindle so I've gone back to the tests to debug.

mplegendre commented 3 years ago

@vsoch -- Did you mean to post this comment to the container demo PR?

vsoch commented 3 years ago

It was related more to #39, but that issue was closed in favor of this one, so I posted here.

mplegendre commented 3 years ago

I think this is a different issue. I'd suggest opening a new issue.

This looks like an internal spindle error. The code around this error is trying to set up a named pipe in /tmp for communicating between two processes. Could the system be missing a /tmp area?
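A quick way to rule that out is to check whether a named pipe can be created in the directory at all. The snippet below is a generic diagnostic sketch, not Spindle's own code and not a reproduction of how Spindle sets up its session; `check_fifo_dir` and the temporary file name are hypothetical.

```shell
#!/bin/sh
# Diagnostic sketch: can a named pipe (FIFO) be created in a directory?
# NOTE: check_fifo_dir is a hypothetical helper, not part of Spindle.

check_fifo_dir() {
    dir=${1:-/tmp}

    # The directory must exist and be writable by the current user.
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "missing-or-unwritable"
        return 1
    fi

    # Try to create a FIFO, then clean it up again.
    fifo="$dir/fifo_check.$$"
    if mkfifo "$fifo" 2>/dev/null; then
        rm -f "$fifo"
        echo "ok"
    else
        echo "mkfifo-failed"
        return 1
    fi
}
```

Calling `check_fifo_dir /tmp` in each environment where a Spindle process runs prints `ok` when a pipe can be created there. It only tests the local view of the directory, so it cannot tell whether two containers actually see the same /tmp.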

vsoch commented 3 years ago

Sure will do!

And that could be it - I do have a /tmp area, but I didn't create a shared /tmp area for my containers and I think that might be needed. I'll try that, and if the issue is still there will open a new one.

mplegendre commented 3 years ago

Fixed the original testsuite build issue in the 'devel' branch.