openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org
Other

ucx 1.6.1 fails to configure/build on RHEL 7.6 #4152

Open ca-taylor opened 5 years ago

ca-taylor commented 5 years ago

I'm trying to build UCX 1.6.1 on RHEL 7.6. However, the configure script has an explicit requirement for libibcm.so. Trying to install libibcm on 7.6 yields:

Package libibcm-15-7.el7_5.x86_64 is obsoleted by rdma-core-17.2-3.el7.x86_64 which is already installed

I patched the configure script to omit the libibcm check, but ended up with:

Processing files: ucx-ib-cm-1.6.1-1.el76.x86_64 error: File not found by glob: /home/chasman/rpmbuild/BUILDROOT/ucx-1.6.1-1.el76.x86_64/usr/lib64/ucx/libuct_ib_cm.so.*

RPM build errors: File not found by glob: /home/chasman/rpmbuild/BUILDROOT/ucx-1.6.1-1.el76.x86_64/usr/lib64/ucx/libuct_ib_cm.so.*

I could modify the .spec file and get rid of that as well but I don't really want to end up with an incomplete build.

What is the correct way to build ucx-1.6.1 RPMs on RHEL 7.6?

ca-taylor commented 5 years ago

FWIW, I googled this issue a good bit and came up dry.

ca-taylor commented 5 years ago

Also, it builds without issue on RHEL 7.5.

yosefe commented 5 years ago

@ca-taylor, please upload the full command line and the output of rpmbuild. This can happen if you configured UCX on RHEL 7.5 but are trying to run ./buildrpm.sh on RHEL 7.6 from the same directory.
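The stale-tree scenario described here can be checked by comparing the kernel release that autoconf recorded in config.log with the running kernel. This is a minimal sketch, not part of UCX: `configured_kernel` and `check_tree_matches_host` are hypothetical helper names. It assumes the standard autoconf config.log line of the form `uname -r = <release>`, and that 7.5 and 7.6 hosts run different kernels. A differing kernel is only a hint that the tree was configured elsewhere.

```bash
#!/bin/bash
# Hypothetical helpers: detect a build tree configured on a different host.
# Assumption: config.log was written by autoconf, which records the output of
# "uname -r" in its Platform section as "uname -r = <release>".

# Print the kernel release recorded in the given config.log (first match only).
configured_kernel() {
    sed -n 's/^uname -r = //p' "$1" | head -n 1
}

# Return 1 (and explain why) if the tree was configured on another kernel.
check_tree_matches_host() {
    local log="${1:-config.log}" recorded current
    if [ ! -f "$log" ]; then
        echo "no $log found: tree has not been configured yet"
        return 0
    fi
    recorded=$(configured_kernel "$log")
    current=$(uname -r)
    if [ "$recorded" != "$current" ]; then
        echo "configured on kernel '$recorded' but running '$current'; reconfigure from a clean tree"
        return 1
    fi
    echo "tree was configured on this kernel ($current)"
}
```

Run it from the top of the UCX source tree, for example `check_tree_matches_host config.log`, before invoking buildrpm.sh.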

ca-taylor commented 5 years ago

No, I do not run "./buildrpm.sh". I use the provided .spec file and build the RPMs with:

```bash
export VERSION=1.5.2
export VERSION=1.4.0
export VERSION=1.6.1
export DIST="el75"
export DIST="el76"

CFG_OPTS=""
CFG_OPTS="$CFG_OPTS --with-avx "
CFG_OPTS="$CFG_OPTS --with-verbs "
CFG_OPTS="$CFG_OPTS --with-rc "
CFG_OPTS="$CFG_OPTS --with-ud "
CFG_OPTS="$CFG_OPTS --with-dc "
CFG_OPTS="$CFG_OPTS --with-rdmacm "
CFG_OPTS="$CFG_OPTS --with-mlx5-dv"
CFG_OPTS="$CFG_OPTS --with-ib-hw-tm"
CFG_OPTS="$CFG_OPTS --with-dm"
if [ $DIST == "el75" ]; then
    CFG_OPTS="$CFG_OPTS --with-cm"
else
    CFG_OPTS="$CFG_OPTS --without-cm"
fi

rpmbuild --ba \
    --define "_version ${VERSION}" \
    --define "dist $DIST" \
    --define '_enable_debug --with-debug' \
    --define '_prefix /usr' \
    --define '_defaultdocdir %{_prefix}/doc' \
    --define '_mandir %{_prefix}/share/man' \
    --define 'mflags -j 1' \
    --define "configure_options $CFG_OPTS " \
    ucx-${VERSION}.spec
```

Using --with-cm on an EL7.6 host fails first because the configure script wants libibcm to exist and then (assuming you modify the configure script to get around that), because the build wants libibcm to exist.

ca-taylor commented 5 years ago

In other words, the current ucx-1.6.1 spec file and configure script do not appear to work on EL7.6.

yosefe commented 5 years ago

@ca-taylor, can you please upload the full command line and the output of this script when it fails?

ca-taylor commented 5 years ago

> @ca-taylor can you pls upload full command line and output of this script when it fails?

Sure, but let me set it up "clean" again so you don't have any of my changes to worry about.

ca-taylor commented 5 years ago

Here you go...

[chasman@c1a-s1 SPECS]$ uname -a
Linux c1a-s1.ufhpc 3.10.0-957.27.2.el7.x86_64 #1 SMP Tue Jul 9 16:53:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[chasman@c1a-s1 SPECS]$ cat /etc/redhat-release

openUcx-4152.tar.gz

To summarize:

```bash
#!/bin/bash
#

export VERSION=1.4.0
export VERSION=1.5.2
export VERSION=1.6.1
export DIST="el75"
export DIST="el76"

CFG_OPTS=""
CFG_OPTS="$CFG_OPTS --with-avx "
CFG_OPTS="$CFG_OPTS --with-verbs "
CFG_OPTS="$CFG_OPTS --with-rc "
CFG_OPTS="$CFG_OPTS --with-ud "
CFG_OPTS="$CFG_OPTS --with-dc "
CFG_OPTS="$CFG_OPTS --with-rdmacm "
CFG_OPTS="$CFG_OPTS --with-mlx5-dv"
CFG_OPTS="$CFG_OPTS --with-ib-hw-tm"
CFG_OPTS="$CFG_OPTS --with-dm"
CFG_OPTS="$CFG_OPTS --with-cm"

if [ $DIST == "el75" ]; then
    CFG_OPTS="$CFG_OPTS --with-cm"
else
    CFG_OPTS="$CFG_OPTS --without-cm"
fi

rpmbuild --ba \
    --define "_version ${VERSION}" \
    --define "dist $DIST" \
    --define '_enable_debug --with-debug' \
    --define '_prefix /usr' \
    --define '_defaultdocdir %{_prefix}/doc' \
    --define '_mandir %{_prefix}/share/man' \
    --define 'mflags -j 1' \
    --define "configure_options $CFG_OPTS " \
    ucx-${VERSION}.spec
```

Here is the resulting "configure" command used by rpmbuild:

./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-optimizations --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-cma --without-cuda --without-gdrcopy --with-verbs --without-cm --without-knem --with-rdmacm --without-rocm --without-xpmem --without-ugni --with-avx --with-verbs --with-rc --with-ud --with-dc --with-rdmacm --with-mlx5-dv --with-ib-hw-tm --with-dm --with-cm

Here is where it stops:

```
checking for ib_cm_send_req in -libcm... no
configure: error: CM requested but lib ibcm not found
error: Bad exit status from /var/tmp/rpm-tmp.mLE03s (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.mLE03s (%build)
```

ca-taylor commented 5 years ago

In the meantime, I'll try the "buildrpm.sh" script and see what happens.

ca-taylor commented 5 years ago

Hmmm. The buildrpm.sh in the contrib directory doesn't appear to work. Different problem, I guess.

hiroyuki-sato commented 5 years ago

Hello, @ca-taylor

Could you try `yum install librdmacm-devel`?

Additionally, try `yum groupinstall 'Infiniband Support'` (you may not need to run this command).

After that, run buildrpm.sh.

Best regards.

ca-taylor commented 5 years ago

> Could you try yum install librdmacm-devel

Thank you for the suggestion. I tried that already. On EL7.6 you get,

Package librdmacm-devel-1.1.0-2.el7.x86_64 is obsoleted by rdma-core-devel-17.2-3.el7.x86_64 which is already installed
Nothing to do
Uploading Enabled Repositories Report
Loaded plugins: langpacks, product-id, subscription-manager

ca-taylor commented 5 years ago

The routine that the configure script checks for is ib_cm_send_req, which I can't find in any library under RHEL 7.6, and the configure script specifically references libibcm, which does not exist under 7.6. So I don't see how you would ever get a ucx-ib-cm-1.6.1 package on EL7.6.
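One way to double-check that nothing on RHEL 7.6 provides ib_cm_send_req is to scan the installed shared libraries for that dynamic symbol. This is a minimal sketch. `find_exporters` is a hypothetical helper, and it assumes GNU binutils' `nm` is installed and that the RHEL 7 x86_64 libraries live in /usr/lib64.

```bash
#!/bin/bash
# Hypothetical helper: list the shared libraries in a directory whose dynamic
# symbol table defines a given symbol. Requires GNU binutils' nm.
# Usage: find_exporters DIR SYMBOL
find_exporters() {
    local dir="$1" sym="$2" lib
    for lib in "$dir"/*.so*; do
        # Skip the literal glob pattern when nothing matches, and non-regular files.
        [ -f "$lib" ] || continue
        # -D: dynamic symbols only; --defined-only: drop undefined (imported) ones.
        # Non-ELF files make nm fail quietly, so they never match.
        if nm -D --defined-only "$lib" 2>/dev/null | grep -qw -- "$sym"; then
            echo "$lib"
        fi
    done
}
```

On a host that has libibcm installed, `find_exporters /usr/lib64 ib_cm_send_req` should list libibcm.so*. On EL7.6 with only rdma-core, an empty result would be consistent with the `checking for ib_cm_send_req in -libcm... no` line from configure.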

Additionally, there is this in the distributed .spec file.

%if 0%{?fedora} >= 30 || 0%{?rhel} >= 7
%bcond_with ib_cm
%else
%bcond_without ib_cm

My RPM foo is pretty weak, but I interpret that to mean that the default for EL 7 and above is "--with-ib-cm".
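For reference, RPM's bcond macros work the opposite way from what their names suggest. `%bcond_with ib_cm` declares a conditional that is disabled unless `rpmbuild` is run with `--with ib_cm`. `%bcond_without ib_cm` declares one that is enabled unless `rpmbuild` is run with `--without ib_cm`. Under that reading, the quoted conditional turns ib-cm *off* by default on EL 7 and Fedora 30+.

The function below is a hypothetical shell mirror of that logic, for illustration only. It is not code from the UCX spec, and `ib_cm_default` is an invented name.

```bash
#!/bin/bash
# Illustration only: mirrors the spec conditional quoted above to show which
# default each branch selects. Per RPM's bcond macros:
#   %bcond_with ib_cm    -> ib_cm is OFF by default; enable with "rpmbuild --with ib_cm"
#   %bcond_without ib_cm -> ib_cm is ON by default; disable with "rpmbuild --without ib_cm"
# Arguments: numeric values of %{?fedora} and %{?rhel} (0 when undefined).
ib_cm_default() {
    local fedora="${1:-0}" rhel="${2:-0}"
    if [ "$fedora" -ge 30 ] || [ "$rhel" -ge 7 ]; then
        echo "off"   # %bcond_with ib_cm branch
    else
        echo "on"    # %bcond_without ib_cm branch
    fi
}

echo "RHEL 6:    ib_cm $(ib_cm_default 0 6)"
echo "RHEL 7:    ib_cm $(ib_cm_default 0 7)"
echo "Fedora 30: ib_cm $(ib_cm_default 30 0)"
```

If ib-cm were actually wanted on EL 7, this reading implies it would have to be requested explicitly, for example with `rpmbuild --ba --with ib_cm ...`. On RHEL 7.6 that would still require libibcm to exist.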

hiroyuki-sato commented 5 years ago

I reproduced it with the following command.

./contrib/configure-devel --disable-numa --with-cm
...
configure: error: CM requested but lib ibcm not found

It may be related to https://patchwork.kernel.org/patch/10132535/. It seems that libibcm has already been removed in RHEL 7.6 (CentOS 7.6).

@ca-taylor BTW, do you want to use --with-cm instead of --with-ib?

I was able to build the RPMs on CentOS 7.6 with the following commands:

./contrib/buildrpm.sh -t
./contrib/buildrpm.sh -b

I used commit ee3d37f9c8debe42aa86b6064895d5dd801298a5.

yosefe commented 5 years ago

@ca-taylor, there should not be an ib-cm package for RHEL 7.6; it was obsoleted by rdma-core. Your RPM build script is invalid because it sets CFG_OPTS="$CFG_OPTS --with-cm", so it requires ib-cm to be present. Instead of forcing with/without configure options, you need to let configure auto-detect.
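This is a minimal sketch of the reporter's build script rewritten along the lines of this advice. It passes no explicit `--with-cm` or `--without-cm`, so the CM decision is left to the spec file's defaults and UCX's configure.

It assumes the same spec file and `--define` macros as the original script. The `CFG_ARGS` array and the `build_rpms` function are hypothetical names introduced for illustration.

```bash
#!/bin/bash
# Sketch: the original build script minus the CM flags, so the caller no
# longer forces --with-cm / --without-cm on configure.
VERSION=1.6.1
DIST="el76"

# Transport/feature options from the original script, without any CM option.
CFG_ARGS=(--with-avx --with-verbs --with-rc --with-ud --with-dc
          --with-rdmacm --with-mlx5-dv --with-ib-hw-tm --with-dm)
CFG_OPTS="${CFG_ARGS[*]}"   # join with spaces for the configure_options macro

# Hypothetical wrapper around the original rpmbuild invocation.
build_rpms() {
    rpmbuild --ba \
        --define "_version ${VERSION}" \
        --define "dist $DIST" \
        --define '_enable_debug --with-debug' \
        --define '_prefix /usr' \
        --define '_defaultdocdir %{_prefix}/doc' \
        --define '_mandir %{_prefix}/share/man' \
        --define 'mflags -j 1' \
        --define "configure_options $CFG_OPTS" \
        "ucx-${VERSION}.spec"
}
```

Call `build_rpms` from the directory that holds `ucx-1.6.1.spec`. Keeping the options in an array makes it easy to confirm that no CM flag slipped back in before the string is handed to rpmbuild.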