Closed andre-merzky closed 6 years ago
I do not think pbs-config is mandatory in order to build tm support (e.g. PBS/torque).
Could you please compress and upload your config.log?
Indeed, I am running configure ... --with-tm, and pbs-config is pulled in due to that option. But I may be missing something here, of course: without that option, OpenMPI seems not to be able to learn about the available compute nodes. Or at least I saw errors in job placement on mpirun. When using that option (previously, with torque 4.x which had pbs-config), that problem disappeared.
I attached the config.log, but will also ping back later to verify that leaving out --with-tm results in that placement failure.
I will check the log later, thanks.
I reviewed config/orte_check_tm.m4 and unless I misunderstood that part:
- tm support is tried by default
- --with-tm simply tells configure to abort if tm support cannot be built
- pbs-config is tried, but there is a fallback path if it is not found
Bottom line, if configure --with-tm succeeds, it means tm support was built successfully.
I checked the log, tm.h was not found, and though --with-tm was specified, configure did not abort (!)
Where is tm.h on your system?
If it is in /usr/include/torque/tm.h then you have to configure --with-tm CPPFLAGS=-I/usr/include/torque
I will investigate why configure did not abort.
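The CPPFLAGS suggestion above can be sketched as a small shell helper. This is only an illustration: the function name tm_configure_args is made up here and is not part of Open MPI or Torque. It derives the header directory from the full path to tm.h and prints the configure arguments mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): given the full path to tm.h,
# print the configure arguments suggested above for a header that lives
# in a non-standard directory.
tm_configure_args() {
    hdr_dir=$(dirname "$1")
    printf '%s\n' "--with-tm CPPFLAGS=-I$hdr_dir"
}

# Example with the path mentioned in this thread.
tm_configure_args /usr/include/torque/tm.h
```

The printed arguments would then be appended to the configure command line by hand, next to any other options already in use.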
My bad, configure did abort as expected.
Since there is no pbs-config, you have two options:
- tm.h is in DIR/include and libtorque.* is in DIR/lib[64], then configure --with-tm=DIR
- tm.h is in a "non standard" directory, then use CPPFLAGS=... as explained earlier
I should put a hold on this ticket: configurations which resulted in a usable installation on Titan don't work anymore, presumably because of more system changes. Just in case any of you is currently working on Titan or any other Cray (Blue Waters is also on my TODO list), I would appreciate any advice. Otherwise I'll ping back once I have figured out how to build again.
Thanks, Andre.
Well, I just downloaded and built Torque master (which is somewhere on the 6.1 path), and then built OMPI master by simply pointing --with-tm=DIR where DIR is the install location I used for Torque. OMPI correctly picked up the Torque support and built all tm components.
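Before pointing --with-tm=DIR at a private Torque install, one could check that DIR has the layout described earlier in the thread (tm.h in DIR/include and libtorque.* in DIR/lib or DIR/lib64). The sketch below is only an illustration; the function name is_tm_prefix and the TORQUE_DIR variable with its /opt/torque default are placeholders, not anything provided by Open MPI or Torque.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR looks usable as --with-tm=DIR, i.e.
# DIR/include/tm.h exists and some libtorque.* is in DIR/lib or DIR/lib64.
is_tm_prefix() {
    dir=$1
    [ -f "$dir/include/tm.h" ] || return 1
    for lib in "$dir"/lib/libtorque.* "$dir"/lib64/libtorque.*; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -e "$lib" ]; then
            return 0
        fi
    done
    return 1
}

# TORQUE_DIR is a placeholder for your own install location.
TORQUE_DIR=${TORQUE_DIR:-/opt/torque}
if is_tm_prefix "$TORQUE_DIR"; then
    echo "configure with: --with-tm=$TORQUE_DIR"
else
    echo "$TORQUE_DIR does not have the DIR/include + DIR/lib[64] layout"
fi
```

If the check fails because the header sits somewhere else, the CPPFLAGS route mentioned earlier is the alternative.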
I can't try executing it, of course, but things seem to build just fine. It is quite possible that someone has changed the format of the PBS_NODEFILE, though I'd be surprised as that would break a ton of code out there.
@rhc54 if torque is built with the default RPM .spec file, then the header is in /usr/include/torque.
So unless pbs-config can be used to set CPPFLAGS, or we update our configury in order to search this non standard directory (just like we do for pmi), Open MPI is currently unable to find the header file and is hence unable to build tm support out of the box.
FWIW, I built the RPMs from the latest 6.1.2 tarball, and pbs-config is part of the torque-devel package along with the /usr/include/torque/tm.h header file, so maybe this package is not installed or there could be another site-specific issue.
@andre-merzky once upon a time, a related issue was reported (tm support is built, but does not use the allocated nodes). We ended up concluding the torque install was busted on that site, and pbsdsh was used to demonstrate this. So you might want to try the following script first; it should print (at least) two different hostnames.
#!/bin/sh
#PBS -l nodes=2
pbsdsh hostname
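To make the result of that check easier to read, the output of pbsdsh could be reduced to a count of distinct hostnames. The following is a minimal sketch, assuming pbsdsh prints one hostname per line; the function name count_distinct_hosts and the sample hostnames are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct hostnames read from standard input.
count_distinct_hosts() {
    # tr strips the padding that some wc implementations add.
    sort -u | wc -l | tr -d ' '
}

# Inside a job one would pipe the real output into it:
#   pbsdsh hostname | count_distinct_hosts
# Here a canned sample (made-up hostnames) stands in for the pbsdsh output.
printf 'nid00012\nnid00013\nnid00012\n' | count_distinct_hosts
```

With nodes=2, a count below 2 would point to the same kind of broken Torque setup described above.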
@andre-merzky Ok, thanks. I'll close this issue -- let us know if we should re-open it.
Thanks for the feedback @jsquyres , @rhc54 ! Titan support has not yet installed the torque devel package, but in the meantime I followed Ralph's advice and use a private torque installation. Compilation is smooth that way.
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
OpenMPI master branch
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
We install from a git clone, on Titan, a Cray XK7. The system recently switched from torque 4.x to 6.x. The 4.x modules are gone now, and so is pbs-config. OMPI's configure script however relies on that being available.
Please describe the system on which you are running
Details of the problem
As stated above: OMPI's configure script needs pbs-config, which is no longer installed by default on Crays after the switch to Torque v6.x. The default MPI on Cray is MPICH; our custom OMPI installation is what complains.