open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

openmpi 2.1.0 fails to build on s390x #3443

Closed opoplawski closed 7 years ago

opoplawski commented 7 years ago


Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

2.1.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Building Fedora openmpi package

Please describe the system on which you are running


Details of the problem

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc1112/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc1112/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -I../../../.. -I../../../../orte/include -I/builddir/build/BUILD/openmpi-2.1.0/opal/mca/event/libevent2022/libevent -I/builddir/build/BUILD/openmpi-2.1.0/opal/mca/event/libevent2022/libevent/include -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -march=zEC12 -mtune=z13 -finline-functions -fno-strict-aliasing -pthread -MT mca_btl_sm_la-btl_sm_component.lo -MD -MP -MF .deps/mca_btl_sm_la-btl_sm_component.Tpo -c btl_sm_component.c  -fPIC -DPIC -o .libs/mca_btl_sm_la-btl_sm_component.o
btl_sm_component.c: In function 'create_rndv_file':
btl_sm_component.c:631:5: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result [-Wunused-result]
     asprintf(&tmpfname, "%s.tmp", fname);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from btl_sm.c:45:0:
../../../../opal/include/opal/sys/cma.h:89:2: error: #error "Unsupported architecture for process_vm_readv and process_vm_writev syscalls"
 #error "Unsupported architecture for process_vm_readv and process_vm_writev syscalls"
  ^~~~~
../../../../opal/include/opal/sys/cma.h: In function 'process_vm_readv':
../../../../opal/include/opal/sys/cma.h:101:18: error: '__NR_process_vm_readv' undeclared (first use in this function); did you mean 'process_vm_readv'?
   return syscall(__NR_process_vm_readv, pid, lvec, liovcnt, rvec, riovcnt, flags);
                  ^~~~~~~~~~~~~~~~~~~~~
                  process_vm_readv
../../../../opal/include/opal/sys/cma.h:101:18: note: each undeclared identifier is reported only once for each function it appears in
../../../../opal/include/opal/sys/cma.h: In function 'process_vm_writev':
../../../../opal/include/opal/sys/cma.h:112:18: error: '__NR_process_vm_writev' undeclared (first use in this function); did you mean 'process_vm_writev'?
   return syscall(__NR_process_vm_writev, pid, lvec, liovcnt, rvec, riovcnt, flags);
                  ^~~~~~~~~~~~~~~~~~~~~~
                  process_vm_writev
/usr/include/bits/uio.h: In function 'process_vm_readv':
../../../../opal/include/opal/sys/cma.h:102:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
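The `#error` above suggests that `cma.h` keeps its own per-architecture table of `__NR_process_vm_readv`/`__NR_process_vm_writev` numbers, and s390x is not in that table. One way to avoid maintaining such a table is to take the syscall number from the C library's `<sys/syscall.h>`. glibc 2.15 and later also provide a `process_vm_readv()` wrapper in `<sys/uio.h>`. The sketch below is hypothetical and is not the fix that was merged for this issue. `cma_read()` is an illustrative name. Where the C library exposes no syscall number, the sketch fails with `ENOSYS` at run time instead of stopping the build.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() in <unistd.h> */
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes starting at remote_addr in process
 * pid into local_buf, using Linux Cross-Memory Attach (process_vm_readv). */
static ssize_t cma_read(pid_t pid, void *local_buf, const void *remote_addr,
                        size_t len)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *) remote_addr, .iov_len = len };

#if defined(SYS_process_vm_readv)
    /* The syscall number comes from the C library's headers for the target
     * architecture (x86_64, s390x, ...), so no hand-written table is needed. */
    return syscall(SYS_process_vm_readv, (long) pid, &local, 1UL,
                   &remote, 1UL, 0UL);
#else
    /* No syscall number known: report "not implemented" instead of #error. */
    (void) pid;
    (void) local;
    (void) remote;
    errno = ENOSYS;
    return -1;
#endif
}
```

With this approach, an unsupported platform surfaces as `ENOSYS` from the call. It is then up to the caller to decide whether to use a different transport.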

2.0.2 builds fine. With 2.0.2 I see:

checking if user requested CMA build... no

With 2.1.0 I see:

Transports
-----------------------
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
Intel SCIF: no
Intel TrueScale (PSM): no
Mellanox MXM: no
Open UCX: no
OpenFabrics Libfabric: no
OpenFabrics Verbs: no
Portals4: no
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes

I seem to be unable to disable CMA either with --without-cma or --with-cma=no.

amckinstry commented 7 years ago

Confirm that this is also the case with the Debian openmpi package. The configuration can be seen here: http://sources.debian.net/src/openmpi/2.1.0rc2-1/debian/rules/

(Ignore the 2.1.0rc2 version tag; it is 2.1.0)

jsquyres commented 7 years ago

Does the same thing happen with 2.1.1rc1? https://www.open-mpi.org/software/ompi/v2.1/

When I use --without-cma with 2.1.1rc1, it seems to disable CMA properly for me.

jsquyres commented 7 years ago

Actually, I'd also like to know whether OMPI 2.1.1rc1 properly disables CMA on the platform on which you're building. E.g., if there are platforms where CMA simply does not work, configure should auto-disable building CMA on those platforms.

We're likely just running into this now because I seem to recall that CMA was not built by default in the v2.0.x series, but we now try to build it by default in the v2.1.x series.
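The auto-disable behaviour described above could be driven by a compile-only probe. This is a sketch of the kind of test program a configure script runs through Autoconf's `AC_COMPILE_IFELSE`. If the C library does not expose syscall numbers for both calls, CMA would be left out of the build. The function name `cma_syscalls_known()` and the `CMA_SYSCALLS_KNOWN` macro are hypothetical and are not taken from Open MPI's configure logic.

```c
#include <sys/syscall.h>

/* Hypothetical configure-style probe. It is decided entirely by the
 * preprocessor, so it also works when cross-compiling, where a test
 * program cannot be executed on the build host. */
#if defined(SYS_process_vm_readv) && defined(SYS_process_vm_writev)
#define CMA_SYSCALLS_KNOWN 1
#else
#define CMA_SYSCALLS_KNOWN 0
#endif

/* Returns 1 if both CMA syscall numbers are available, 0 otherwise. */
int cma_syscalls_known(void)
{
    return CMA_SYSCALLS_KNOWN;
}
```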

amckinstry commented 7 years ago

I hadn't noticed the 2.1.1rc1 release; thanks, I'll test it.

jsquyres commented 7 years ago

I was actually all set to release v2.1.1 yesterday; this issue and #3442 gave me pause. So if you could let me know ASAP, that would be great. Even if we have to have you temporarily use --without-cma to build v2.1.1 on problematic architectures, I'm comfortable adding a fix for auto-disabling CMA on unsupported platforms after v2.1.1.

opoplawski commented 7 years ago

I see the same compile problem with 2.1.1rc1, but --without-cma does appear to disable it now.

jsquyres commented 7 years ago

Ok, good news, at least, that --without-cma works around the issue. I'll mark this as a 2.1.2 issue for now.

Can you send a link to a full build log and/or a config.log file so that we can look at why it's not automatically disabling itself on platforms that don't support CMA?

opoplawski commented 7 years ago

Full build log: https://kojipkgs.fedoraproject.org//work/tasks/8839/19398839/build.log (it will be around for a week or so). The build prints all of the config.log files, which is perhaps overkill.

jsquyres commented 7 years ago

Closing since this is now merged on master, v2.x, and v3.x.