open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Something odd about cuda.h configury checking #2984

Open jsquyres opened 7 years ago

jsquyres commented 7 years ago

@sjeaugey Per @siegmargross's post on the users mailing list, there's some kind of oddity with him not being able to find cuda.h / Cuda support in configury.

Siegmar's post is not currently listed in the mail archive (!) -- I think it's because he attached his config.log files (as we asked him to!), and mail-archive might have dropped the mail as an anti-virus/anti-spam measure (I've emailed their tech support asking about it -- see https://www.mail-archive.com/gossip@mail-archive.com/msg01555.html).

Here's the relevant output from Siegmar's email showing the cuda issue:

loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_gcc 155 grep cuda.h log.configure.Linux.x86_64.64_gcc
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking cuda.h usability... no
checking cuda.h presence... no
checking for cuda.h... no
checking if MPI Extension cuda has C bindings... yes (required)
checking if MPI Extension cuda has mpif.h bindings... no
checking if MPI Extension cuda has "use mpi" bindings... no
checking if MPI Extension cuda has "use mpi_f08" bindings... no
loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_gcc 156

I'm filing this issue because Siegmar's config.log shows that it checks /usr/local/cuda/include for cuda.h (in the --with-cuda test), but then -I/usr/local/cuda/include is not added to CPPFLAGS for the "checking cuda.h usability / presence" tests. Hence, they fail.
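To illustrate why a header check can fail even though the file exists on disk, here is a minimal Python sketch. It is not what autoconf does (autoconf compiles and preprocesses a small conftest.c); it only models the part relevant here, namely that a header is visible to a check only if its directory appears as a -I entry in CPPFLAGS. The function names `include_dirs` and `header_visible` are hypothetical, and the compiler's built-in system include directories are deliberately ignored.

```python
import os
import shlex


def include_dirs(cppflags: str) -> list[str]:
    """Return the directories named by -I options in a CPPFLAGS-style string."""
    dirs = []
    tokens = shlex.split(cppflags)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Form: "-I /some/dir" (directory is the next token)
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Form: "-I/some/dir"
            dirs.append(tok[2:])
        i += 1
    return dirs


def header_visible(header: str, cppflags: str) -> bool:
    """Simplified model: True if `header` exists in any -I directory.

    Assumption: default system include paths are not searched.
    """
    return any(
        os.path.isfile(os.path.join(d, header)) for d in include_dirs(cppflags)
    )
```

Under this model, `header_visible("cuda.h", "")` is false on a machine where cuda.h lives only under /usr/local/cuda/include, while `header_visible("cuda.h", "-I/usr/local/cuda/include")` is true, which matches the pattern in the configure output above.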

I don't quite see how that can happen, but I don't have cuda-enabled machines to test this.

Here's Siegmar's config.log (since you can't currently get it from mail-archive).

@sjeaugey Can you have a look?

jsquyres commented 7 years ago

As noted in https://www.mail-archive.com/users@lists.open-mpi.org/msg30642.html, @siegmargross stated that when adding the relevant -I and -L flags to CPPFLAGS and LDFLAGS, respectively, cuda support was added properly.

Does --with-cuda not automatically add the Right flags to the relevant CPPFLAGS / LDFLAGS? That would be a little weird, and not what configure does for other --with-FOO options.
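For reference, the workaround described above amounts to passing the include and library directories to configure explicitly. The sketch below only builds the argument list; it runs nothing. The helper name `configure_args` is hypothetical, and the library subdirectory (`lib64`) is an assumption, since the thread only shows the include path /usr/local/cuda/include.

```python
import posixpath


def configure_args(cuda_root: str, libdir: str = "lib64") -> list[str]:
    """Build a configure command line that sets CPPFLAGS/LDFLAGS by hand.

    Assumption: headers live in <cuda_root>/include and libraries in
    <cuda_root>/<libdir>; adjust `libdir` for the actual installation.
    """
    inc = posixpath.join(cuda_root, "include")
    lib = posixpath.join(cuda_root, libdir)
    return [
        "./configure",
        f"--with-cuda={cuda_root}",
        f"CPPFLAGS=-I{inc}",  # lets header checks find cuda.h
        f"LDFLAGS=-L{lib}",   # lets link checks find the CUDA libraries
    ]
```

Calling `configure_args("/usr/local/cuda")` yields the flags Siegmar reported adding, modulo the assumed library directory.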

sjeaugey commented 7 years ago

Without looking in depth, are you sure this part of the log is not from the hwloc configure, where CUDA is indeed not detected (without affecting the CUDA support)?

sjeaugey commented 7 years ago

I confirm this is just hwloc (again). CUDA support is enabled.

1635:configure:12021: checking if --with-cuda is set
1636:configure:12075: result: found (/usr/local/cuda/include/cuda.h)
1651:configure:12191: checking if have cuda support
1652:configure:12194: result: yes (-I/usr/local/cuda/include)
...
81025:configure:77920: checking cuda.h usability
81027:conftest.c:670:18: fatal error: cuda.h: No such file or directory
81704:configure:77920: checking cuda.h presence
81706:conftest.c:637:18: fatal error: cuda.h: No such file or directory
82350:configure:77920: checking for cuda.h
82352:configure:78003: checking cuda_runtime_api.h usability
82354:conftest.c:670:30: fatal error: cuda_runtime_api.h: No such file or directory
83031:configure:78003: checking cuda_runtime_api.h presence
83033:conftest.c:637:30: fatal error: cuda_runtime_api.h: No such file or directory
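A small Python sketch for reading an excerpt like the one above: it pairs each "checking ..." line with the result or compiler error that follows it and keeps the config.log line number, so checks near the top of the log can be told apart from ones tens of thousands of lines later. It assumes the input is `grep -n`-style output in the same shape as the excerpt; the names `LINE_RE` and `summarize` are hypothetical, and the script does not itself determine which sub-configure a check belongs to.

```python
import re

# "<log line>:<source>:<source line>:[<column>:] <message>"
LINE_RE = re.compile(
    r"^(?P<log_line>\d+):(?P<src>[\w.]+):(?P<src_line>\d+):(?:\d+:)?\s*(?P<msg>.*)$"
)


def summarize(grep_lines):
    """Pair each configure check with the first result or error after it."""
    checks = []
    for raw in grep_lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip elisions such as "..."
        msg = m["msg"]
        if m["src"] == "configure" and msg.startswith("checking "):
            checks.append(
                {
                    "log_line": int(m["log_line"]),
                    "check": msg[len("checking "):],
                    "outcome": None,
                }
            )
        elif checks and checks[-1]["outcome"] is None:
            # First following line: a "result: ..." or a compiler error
            checks[-1]["outcome"] = msg
    return checks


sample = [
    "1635:configure:12021: checking if --with-cuda is set",
    "1636:configure:12075: result: found (/usr/local/cuda/include/cuda.h)",
    "1651:configure:12191: checking if have cuda support",
    "1652:configure:12194: result: yes (-I/usr/local/cuda/include)",
    "...",
    "81025:configure:77920: checking cuda.h usability",
    "81027:conftest.c:670:18: fatal error: cuda.h: No such file or directory",
]

for entry in summarize(sample):
    print(entry["log_line"], entry["check"], "->", entry["outcome"])
```

On the sample, the two successful CUDA checks sit around log line 1635, while the failing cuda.h check sits at log line 81025, which is the gap that points to a different part of the configure run.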

jsquyres commented 7 years ago

Oops. Sorry for the noise.

To prevent having this conversation yet again someday in the future, perhaps we should make hwloc's embedded configure also understand where cuda support lives...? I'm guessing it would not be hard at all. And could probably be added to 2.0.x and 2.1.x.

That being said, there's at least some possibility of us no longer embedding hwloc (perhaps as early as the 3.0.0 timeframe).

sjeaugey commented 7 years ago

I'm all for it; it could be useful for the CUDA support too in the future (e.g. when picking an IB card, some could be closer to the GPU and get a higher score).