open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org
Other
2.16k stars 860 forks source link

Passing CFLAGS down to 3rd-party/ broken due to *_CFLAGS_BEFORE_PICKY #12895

Open dalcinl opened 1 week ago

dalcinl commented 1 week ago

I'm trying to build relocatable installations able to install and run in Python environments. As an additional feature, I'm trying to make the builds reproducible, which eventually requires removing hardwired paths to the source or build directory. This can be (partially) accomplished with compiler flags to handle things like __FILE__. But then I discovered that not all sources in 3rd-party/ are compiled with the flags I'm seeing via CFLAGS.

I guess the best way to notice the problem is doing a out-of-source configure with internal libraries. For this test I'm using the sources from the 5.0.5 tarball.

mkdir BUILD && cd BUILD
../configure --with-pmix=internal --with-prrte=internal

and after configure is done, within the same BUILD directory, run

grep CFLAGS_BEFORE_PICKY -r 3rd-party/
3rd-party/openpmix/src/util/keyval/Makefile:CFLAGS = $(PMIX_CFLAGS_BEFORE_PICKY)
3rd-party/openpmix/test/python/Makefile:CFLAGS = $(PMIX_CFLAGS_BEFORE_PICKY)
3rd-party/openpmix/test/sshot/Makefile:CFLAGS = $(PMIX_CFLAGS_BEFORE_PICKY)
3rd-party/openpmix/bindings/python/Makefile:CFLAGS = $(PMIX_CFLAGS_BEFORE_PICKY)
3rd-party/prrte/src/mca/rmaps/rank_file/Makefile:CFLAGS = $(PRTE_CFLAGS_BEFORE_PICKY)
3rd-party/prrte/src/util/hostfile/Makefile:CFLAGS = $(PMIX_CFLAGS_BEFORE_PICKY)

BTW, note that under prrte, both PRTE_CFLAGS_BEFORE_PICKY and PMIX_CFLAGS_BEFORE_PICKY are used. I suspect the the PMIX_CFLAGS_BEFORE_PICKY one is wrong, the other should be used.

As a workaround, for the v5.0.x tarball, I'm "hotfixing" the unpacked sources the following way:

        makefiles=(
            3rd-party/openpmix/src/util/keyval/Makefile.in
            3rd-party/prrte/src/mca/rmaps/rank_file/Makefile.in
            3rd-party/prrte/src/util/hostfile/Makefile.in
        )
        variables=(
            PRTE_CFLAGS_BEFORE_PICKY
            PMIX_CFLAGS_BEFORE_PICKY
        )
        for makefile in "${makefiles[@]}"; do
            for variable in "${variables[@]}"; do
                echo "$variable = @CFLAGS@" >> "$makefile"
            done
        done

After this patching, everything works as expected. This is of course not the proper fix. My only point is to prove that the generated Makefile files are missing the assignments (PMIX|PRTE)_CFLAGS_BEFORE_PICKY = ....

~I'm not an autotools expert. Maybe the various Makefile.am with lines like:~

CFLAGS = $(PRTE_CFLAGS_BEFORE_PICKY)

~are missing the following line~

PRTE_CFLAGS_BEFORE_PICKY = @PRTE_CFLAGS_BEFORE_PICKY@

~such that configure can do its magic?~ NO, this does not seem to make it :disappointed: .

cc @rhc54

rhc54 commented 1 week ago

I did some tests and the flags appear to be working as intended. Not clear that I still need to use them in all those places,but there are a couple that definitely need it. The PRRTE one was clearly a typo, so I fixed that one. However, any CFLAGS you provide should flow through - we just don't include all the picky compiler flags we add ourselves.

dalcinl commented 1 week ago

I'm using the release tarball 5.0.5. I have a reproducer to make my point and prove that CFLAGS are not being passed down.

#!/bin/bash
PREFIX=/tmp/install
BUILDDIR=/tmp/my_private_directory
mkdir -p $BUILDDIR && cd $BUILDDIR

curl -O https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.5.tar.gz
tar xf openmpi-5.0.5.tar.gz
cd openmpi-5.0.5

export CFLAGS=-ffile-prefix-map=$BUILDDIR=OPENMPI

./configure \
    --prefix=$PREFIX \
    --with-pmix=internal \
    --with-prrte=internal \
    --with-libevent=internal \
    --with-hwloc=internal \
    --without-ofi \
    --without-ucx \
    --without-psm2 \
    --without-cuda \
    --without-rocm \
    --disable-dlopen \
    --disable-oshmem \
    --disable-static \
    --disable-opencl \
    --disable-libxml2 \
    --disable-libompitrace \
    --enable-mpi-fortran=mpifh \
    --disable-dependency-tracking

cflags="s|\s*${CFLAGS}\s*| |g"
builddir="s|$BUILDDIR|OPENMPI|g"
makefiles+=(ompi/tools/ompi_info/Makefile)
makefiles+=(oshmem/tools/oshmem_info/Makefile)
makefiles+=(3rd-party/openpmix/src/tools/*/Makefile)
makefiles+=(3rd-party/prrte/src/tools/*/Makefile)
for filename in ${makefiles[@]}; do
    sed -i.orig "/-D.*_BUILD_CFLAGS=/$cflags" $filename
    sed -i.orig "/-D.*_BUILD_CPPFLAGS=/$builddir" $filename
    sed -i.orig "/-D.*_BUILD_LIBS=/$builddir" $filename
done
headers+=(opal/include/opal_config.h)
headers+=(3rd-party/openpmix/src/include/pmix_config.h)
headers+=(3rd-party/prrte/src/include/prte_config.h)
for filename in ${headers[@]}; do
    sed -i.orig "$cflags" $filename
    sed -i.orig "$builddir" $filename
done

make -j $(nproc)

grep "$PWD" $(find . -name '*.o')

This script is setting CFLAGS=-ffile-prefix-map=$BUILDDIR=OPENMPI such that the compiler will expand things like FILE replacing the build directory for the "OPENMPI" string. After configuring, I'm doing some additional editing of the makefiles and headers to remove references to $BUILDDIR that are being passed down via preprocessor macros. At the end of the script I run grep to check that $BUILDDIR has not leaked to object files. Therefore there should be no output after the make -j $(nproc) line. However, I am getting some output:

grep: ./3rd-party/prrte/src/mca/rmaps/rank_file/.libs/rmaps_rank_file.o: binary file matches
grep: ./3rd-party/prrte/src/util/hostfile/.libs/hostfile.o: binary file matches

and that simply means that these object files were compiled without using the CFLAGS I passed.

@rhc54 Maybe the whole problem is caused by the typo you fixed? Could you please point me to the commit with the fix?

dalcinl commented 1 week ago

@rhc54 Maybe the whole problem is caused by the typo you fixed?

No, that's not enough.

~After a fresh restart, I can confirm that 3rd-party/openpmix is all OK.~ No, it is not. However, 3rd-party/prrte still generates two object files without the right CFLAGS. I cannot figure out what's different in openmpix vs. prrte, there should be some place where the PRTE_CFLAGS_BEFORE_PICKY is not being set or passed down.

dalcinl commented 1 week ago

After running the script above, I performed the following two tests

cd 3rd-party/openpmix/src/util
touch pmix_argv.c && make -n V=1
/bin/sh ../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../src/include -I../../include   -iquote/tmp/my_private_directory/openmpi-5.0.5/3rd-party/openpmix -iquote/tmp/my_private_directory/openmpi-5.0.5/3rd-party/openpmix/src -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/openpmix/include -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi/include -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/hwloc-2.7.1/include -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/hwloc-2.7.1/include   -O3 -DNDEBUG  -ffile-prefix-map=/tmp/my_private_directory=OPENMPI -finline-functions -mcx16  -c -o pmix_argv.lo pmix_argv.c
/bin/sh ../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG  -ffile-prefix-map=/tmp/my_private_directory=OPENMPI -finline-functions -mcx16    -o libpmix_util.la   pmix_alfg.lo pmix_argv.lo pmix_cmd_line.lo pmix_error.lo pmix_printf.lo pmix_output.lo pmix_environ.lo pmix_fd.lo pmix_timings.lo pmix_os_dirpath.lo pmix_os_path.lo pmix_basename.lo pmix_keyval_parse.lo pmix_show_help.lo pmix_path.lo pmix_getid.lo pmix_shmem.lo pmix_vmem.lo pmix_hash.lo pmix_name_fns.lo pmix_net.lo pmix_if.lo pmix_parse_options.lo pmix_context_fns.lo pmix_pty.lo pmix_few.lo pmix_string_copy.lo pmix_getcwd.lo keyval/libpmixutilkeyval.la -lm   /tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi/libevent_core.la /tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi/libevent_pthreads.la /tmp/my_private_directory/openmpi-5.0.5/3rd-party/hwloc-2.7.1/hwloc/libhwloc.la

Note the presence of -ffile-prefix-map=/tmp/my_private_directory=OPENMPI

Next, try the following:

$ touch keyval_lex.c && make -n V=1
/bin/sh ../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../src/include -I../../../include   -iquote/tmp/my_private_directory/openmpi-5.0.5/3rd-party/openpmix -iquote/tmp/my_private_directory/openmpi-5.0.5/3rd-party/openpmix/src -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/openpmix/include -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi/include -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/hwloc-2.7.1/include -I/tmp/my_private_directory/openmpi-5.0.5/3rd-party/hwloc-2.7.1/include    -c -o keyval_lex.lo keyval_lex.c
/bin/sh ../../../libtool  --tag=CC   --mode=link gcc     -o libpmixutilkeyval.la  keyval_lex.lo  -lm   /tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi/libevent_core.la /tmp/my_private_directory/openmpi-5.0.5/3rd-party/libevent-2.1.12-stable-ompi/libevent_pthreads.la /tmp/my_private_directory/openmpi-5.0.5/3rd-party/hwloc-2.7.1/hwloc/libhwloc.la

Note the absence of -ffile-prefix-map=/tmp/my_private_directory=OPENMPI.

Therefore, my original complaint stands, at least for the code in the latest release tarball. CFLAGS passed to the top-level configure are not being passed down to every source in 3rd-party. I believe something is broken in the way these *_CFLAGS_BEFORE_PICKY variables are handled in autotools. Unfortunately, I'm not savvy enough with autotools to figure out the root of the problem. Maybe a missing AC_SUBST and use '@PMIX|PRTE_CFLAGS_BEFORE_PICKY@' in Makefile.am ?

rhc54 commented 1 week ago

I think the first thing to check is whether or not the individual projects (i.e., standing on their own) are handling things correctly so folks can see if it is the OMPI-project integration that is the problem here. OMPI does, after all, process its own configure (including handling "picky" compiler flags) prior to passing it all down to us.

dalcinl commented 1 week ago

@rhc54 I think this is a problem in the individual projects, and it is simply a missing AC_SUBST in each. See the following two patches. I'm not sure whether the place to inject the AC_SUBST macro is the right one, but after that, all generated makefiles have a definition for _CFLAGS_BEFORE_PICKY, and then the lines `CFLAGS = $(_CFLAGS_BEFORE_PICKY)work as expected (rather than ending up with an emptyCFLAGS` variable).


EDIT: the ompi repo is also missing an `AC_SUBST` for `OPAL_CFLAGS_BEFORE_PICKY`. However, the `OPAL_CFLAGS_BEFORE_PICKY` is never referenced in makefiles, so cannot currently trigger the missing CFLAGS issue. Until someone inadvertently add `CFLAGS = $(OPAL_CFLAGS_BEFORE_PICKY)`  injecting the bug...
```diff
diff --git a/config/opal_setup_cc.m4 b/config/opal_setup_cc.m4
index 70b8c6f133..0ca32d8688 100644
--- a/config/opal_setup_cc.m4
+++ b/config/opal_setup_cc.m4
@@ -412,6 +412,7 @@ AC_DEFUN([OPAL_SETUP_CC],[

     OPAL_ENSURE_CONTAINS_OPTFLAGS("$OPAL_CFLAGS_BEFORE_PICKY")
     OPAL_CFLAGS_BEFORE_PICKY="$co_result"
+    AC_SUBST([OPAL_CFLAGS_BEFORE_PICKY])

     AC_MSG_CHECKING([for C optimization flags])
     OPAL_ENSURE_CONTAINS_OPTFLAGS(["$CFLAGS"])
rhc54 commented 1 week ago

I'll take a look, but am somewhat suspicious - as I noted earlier, I verified that the Makefile is doing the right thing (in terms of substituting the correct "before picky" cflags) as it stands today. Not sure when I'll get to it - need to finish the current punch list first.

dalcinl commented 1 week ago

@rhc54 If it is quick to answer, how are you doing your testing? Without AC_SUBST, how the PMIX_CFLAGS_BEFORE_PICKY variable would get defined in the makefiles? In any case, this is not urgent, I managed to workaround things for the time being, and things would get hopefully fixed for next release.

@wenduwan Could you please add this issue to the list of things to address before the next release?

rhc54 commented 1 week ago

I simply remove the "before picky" entry from the Makefile and verify that compilation fails with picky enabled. Put that line back and it all works again. So it clearly is being set. Note, however, that it only gets set if you either specifically ask for devel-check, or you are in a Git checkout and have enabled debug. Otherwise, it is a noop.

Note that wenduwan is no longer with the project. I somewhat doubt anything will be done for the next release as release candidates for PMIx and PRRTE are already out, there is a short time fuse, and this isn't a critical issue impacting users.

dalcinl commented 1 week ago

I simply remove the "before picky" entry from the Makefile and verify that compilation fails with picky enabled. Put that line back and it all works again.

That is of course expected. When you put the line back, it is equivalent to doing CFLAGS = # empty in the makefile, that's the reason it work. In other words, the root of the problem is that within the makefiles the *_CFLAGS_BEFORE_PICKY are empty, simply because configure never generated the assignment due to the missing AC_SUBST I'm proposing in the patches above.

this isn't a critical issue impacting users.

I'm personally in no hurry and under no pressure for this fix. However, in a more general consideration, missing compiler flags could potentially ease undisclosed exploitable security issues.

rhc54 commented 1 week ago

I remain suspicious because AC_SUBST doesn't work that way - it requires a very specific pattern that isn't present in the Makefile. The config code involved hasn't changed in many years, with no report of issues until this one. So I somehow suspect there is something else going on - as you say, you aren't an autoconf expert (and neither am I). It'll take more effort to really decipher what, if anything, is the correct thing to do.

flags could potentially ease undisclosed exploitable security issues.

Ah yes - the usual boogey-man warning 😄 Appropriate for Halloween over here!

Seriously, I'll get to it when time permits. A low priority compared to other things.