open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Issues with Flex despite being installed on multiple systems. #9169

Closed: Yiltan closed this issue 3 years ago.

Yiltan commented 3 years ago

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Master branch. Using the commit 49c8b3323d24d61fd30ba4f1320681b3cda89de3 from a few days ago.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

I am installing it via git clone. This is my installation script:

  if [ ! -d ompi ]; then
    # This is the SHA for the master branch on 19/07/21
    SHA=49c8b3323d24d61fd30ba4f1320681b3cda89de3
    git clone https://github.com/open-mpi/ompi.git ompi
    cd ompi
    # TODO: can this be simplified
    git reset --hard $SHA
    git submodule init
    git submodule sync
    git submodule update --init --recursive --remote
    cd ..
  fi

  cd ompi
  perl ./autogen.pl --no-oshmem && \
  ./configure --prefix=$BUILD_DIR \
              --disable-io-ompio \
              --disable-oshmem \
              --with-pmix=internal \
              --with-cuda=$CUDA_HOME \
              --with-ucx=$BUILD_DIR && \
  make -j 32 all && \
  make install

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

+05852a7e12d256b7f6b4356b6a61cd362cf2fe68 3rd-party/openpmix (v1.1.3-3049-g05852a7e)
+05d1fb18a957d62d8d853dadb8eadc30f3242120 3rd-party/prrte (dev-31299-g05d1fb18a9)

Please describe the system on which you are running

I am also getting the same error on an A100 DGX system.


Details of the problem

I have Flex 2.6.4 installed, but I get the following error when building from master:

checking for flex... flex
checking for lex output file root... lex.yy
checking for lex library... none needed
checking for library containing yywrap... no
configure: WARNING: yywrap not found; giving up on flex
checking if want dlopen support... yes
checking if embedded mode is enabled... no
configure: WARNING: PMIx requires Flex to build from non-tarball sources,
configure: WARNING: but Flex was not found. Please install Flex into
configure: WARNING: your path and try again
configure: error: Cannot continue
configure: ===== done with 3rd-party/openpmix configure =====
configure: error: Could not find viable pmix build.

I have the config.log files for both OMPI and OpenPMIx.

rhc54 commented 3 years ago

This has been fixed - update to today's head of master branch.

Yiltan commented 3 years ago

I updated to today's head and still had the issue.

I used git bisect to find where the issue was introduced. The bug first appears at #8580, and every commit after it fails. I am on an IBM system, so I wonder whether it is related to your comment there.

Would you have any thoughts?

rhc54 commented 3 years ago

I'm afraid you'll have to get help from the IBM folks here; I have no ideas. The current version of PMIx in OMPI isn't looking for yywrap, so either this output is stale or something else is going on.

Yiltan commented 3 years ago

No worries, thank you for your help.

jsquyres commented 3 years ago

I think you have a broken Flex installation. This is from the PMIx config.log:

configure:7421: checking for library containing yywrap
configure:7451: gcc -o conftest   -I/scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable -I/scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable/include  conftest.c  >&5
/tmp/ccL8z1Qv.o: In function `main':
conftest.c:(.text+0x1c): undefined reference to `yywrap'
collect2: error: ld returned 1 exit status
configure:7451: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "pmix"
| #define PACKAGE_TARNAME "pmix"
| #define PACKAGE_VERSION "5.0.0a1"
| #define PACKAGE_STRING "pmix 5.0.0a1"
| #define PACKAGE_BUGREPORT "https://github.com/openpmix/openpmix/issues"
| #define PACKAGE_URL ""
| #define PMIX_CONFIGURE_CLI " \'--disable-option-checking\' \'--prefix=/scratch/q/queenspp/temuciny/summer_school/build\' \'--without-tests-examples\' \'--disable-pmix-binaries\' \'--disable-pmix-backward-compatibility\' \'--disable-visibility\' \'--with-libevent=cobuild\' \'--with-prte-extra-lib= /scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable/libevent_core.la /scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable/libevent_pthreads.la\' \'--disable-io-ompio\' \'--disable-oshmem\' \'--with-cuda=/scinet/mist/rhel8/software/2021a/opt/base/cuda/11.0.3\' \'--with-ucx=/scratch/q/queenspp/temuciny/summer_school/build\' \'CC=gcc\' \'CPPFLAGS= -I/scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable -I/scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable/include\' \'CXX=g++\' \'FC=gfortran\' \'--cache-file=/dev/null\' \'--srcdir=.\'"
| #define PMIX_CONFIGURE_USER "temuciny"
| #define PMIX_CONFIGURE_HOST "mist-login01.scinet.local"
| #define PMIX_CONFIGURE_DATE "Wed Jul 21 14:19:19 UTC 2021"
| #define HAVE_STDIO_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_UNISTD_H 1
| #define HAVE_WCHAR_H 1
| #define STDC_HEADERS 1
| #define _ALL_SOURCE 1
| #define _DARWIN_C_SOURCE 1
| #define _GNU_SOURCE 1
| #define _HPUX_ALT_XOPEN_SOCKET_API 1
| #define _NETBSD_SOURCE 1
| #define _OPENBSD_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define __STDC_WANT_IEC_60559_ATTRIBS_EXT__ 1
| #define __STDC_WANT_IEC_60559_BFP_EXT__ 1
| #define __STDC_WANT_IEC_60559_DFP_EXT__ 1
| #define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1
| #define __STDC_WANT_IEC_60559_TYPES_EXT__ 1
| #define __STDC_WANT_LIB_EXT2__ 1
| #define __STDC_WANT_MATH_SPEC_FUNCS__ 1
| #define _TANDEM_SOURCE 1
| #define __EXTENSIONS__ 1
| /* end confdefs.h.  */
| 
| /* Override any GCC internal prototype to avoid an error.
|    Use char because int might match the return type of a GCC
|    builtin and then its argument prototype would still apply.  */
| char yywrap ();
| int
| main (void)
| {
| return yywrap ();
|   ;
|   return 0;
| }
configure:7451: gcc -o conftest   -I/scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable -I/scratch/q/queenspp/temuciny/summer_school/ompi/3rd-party/libevent-2.1.12-stable/include  conftest.c -lfl   >&5
/scinet/mist/rhel8/software/2021a/opt/base/flex/2.6.4/lib/libfl.so: undefined reference to `yylex'
collect2: error: ld returned 1 exit status

albandil commented 2 years ago

I have the same problem (undefined reference to yylex) when I check out the current v5.0.0rc7 tag from the repository. It turns out that this may indeed be an operating system issue. I use openSUSE Tumbleweed, which ships only the shared version of the libfl library and no static library. The version of Flex is 2.6.4.

I know close to nothing about Flex, but it seems to me that the library is meant to be linked with code generated by Flex, and that generated code is what defines yylex. The configure test in Open MPI (or PMIx), however, links against the library without defining yylex. When I insert

int yylex () {}

into the above-mentioned configure test, configuration proceeds without failure. When the system provides the static libfl library, yylex is likely not needed, because the linker extracts only the archive members that resolve an undefined symbol (here, the one defining yywrap) rather than the whole library.
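To make the linking behaviour concrete, here is an approximate paraphrase (not a verbatim copy) of the two pieces that flex's libfl is built from, libyywrap.c and libmain.c, together with a stub yylex like the one above. In the real library the driver is called main; it is renamed to the hypothetical libfl_default_main here only so the sketch compiles on its own. The assumption is that libfl.so packages both pieces into one shared object, so the reference to yylex inside its main must be resolved when the conftest executable is linked, even though the test program only calls yywrap. That matches the "libfl.so: undefined reference to `yylex'" line in the log.

```c
/* Paraphrase of flex's libyywrap.c: report that there is no more input
 * once the scanner reaches end of file. */
int yywrap(void)
{
    return 1;
}

/* Stub standing in for a flex-generated scanner (the workaround above).
 * It immediately returns 0, which a flex scanner uses to signal end of input. */
int yylex(void)
{
    return 0;
}

/* Paraphrase of flex's libmain.c. The real function is main(); it is renamed
 * (hypothetical name) so this sketch can be compiled without clashing with
 * another main. It runs the scanner until yylex() returns 0. In libfl.so this
 * function is what drags in the yylex dependency. */
int libfl_default_main(void)
{
    while (yylex() != 0)
        ;
    return 0;
}
```

With a static libfl.a, the conftest's call to yywrap pulls in only the yywrap member, so the main/yylex pair is never linked and no stub is required.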

One might dismiss this as a distribution-specific quirk or a "broken flex installation", but some other distributions seem to be affected as well, so modifying the configure test in Open MPI / PMIx might be useful anyway.