open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Heterogeneous data movement appears broken #168

Closed: ompiteam closed this issue 7 years ago

ompiteam commented 9 years ago

Data transfer between machines of different endianness appears to have been broken since at least the 1.6 release. Here is the error report from the user:

The problem occurs in openmpi-1.6.x, openmpi-1.7, and openmpi-1.9. I have now implemented a small program that only scatters the columns of an integer matrix, so that it is easier to see what goes wrong. I configured for a heterogeneous environment. Adding "-hetero-nodes" and/or "-hetero-apps" on the command line doesn't change much, as you can see at the end of this email. Everything works fine if I use only little-endian or only big-endian machines. Is it possible to fix the problem? If not, do you know which file(s) I should look in to find it, or which debug switches would provide more information?

I used the following command to configure the package on my "Solaris 10 Sparc" system (the commands for my other systems are similar).

../openmpi-1.9a1r27668/configure --prefix=/usr/local/openmpi-1.9_64_cc \
 --libdir=/usr/local/openmpi-1.9_64_cc/lib64 \
 --with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
 --with-jdk-headers=/usr/local/jdk1.7.0_07/include \
 JAVA_HOME=/usr/local/jdk1.7.0_07 \
 LDFLAGS="-m64" \
 CC="cc" CXX="CC" FC="f95" \
 CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
 CPP="cpp" CXXCPP="cpp" \
 CPPFLAGS="" CXXCPPFLAGS="" \
 C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
 OBJC_INCLUDE_PATH="" OPENMPI_HOME="" \
 --enable-cxx-exceptions \
 --enable-mpi-java \
 --enable-heterogeneous \
 --enable-opal-multi-threads \
 --enable-mpi-thread-multiple \
 --with-threads=posix \
 --with-hwloc=internal \
 --without-verbs \
 --without-udapl \
 --with-wrapper-cflags=-m64 \
 --enable-debug \
 |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
tyr small_prog 501 ompi_info | grep -e Ident -e Hetero -e "Built on"
           Ident string: 1.9a1r27668
               Built on: Wed Dec 12 09:00:13 CET 2012
  Heterogeneous support: yes
tyr small_prog 502 
tyr small_prog 488 mpiexec -np 6 -host sunpc0,rs0 column_int

matrix:

0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  
0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  
0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  
0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  0x12345678  

Column of process 1:
0x12345678  0x12345678  0x12345678  0x12345678  

Column of process 2:
0x12345678  0x12345678  0x12345678  0x12345678  

Column of process 3:
0x56780000  0x12340000  0x5678ffff  0x1234ce71  

Column of process 4:
0x56780000  0x12340000  0x5678ffff  0x1234ce71  

Column of process 0:
0x12345678  0x12345678  0x12345678  0x12345678  

Column of process 5:
0x56780000  0x12340000  0x5678ffff  0x1234ce71  
tyr small_prog 489 
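
One observation on the output above: if the receiving side had merely skipped or doubled a byte swap, each corrupted entry would read 0x78563412, the byte-reversed form of 0x12345678. The values shown for processes 3, 4, and 5 do not match that pattern. The following is a minimal C sketch of that check. It is not taken from the user's program or from Open MPI, and the helper names `host_is_little_endian`, `swap32`, and `is_plain_or_swapped` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 on a
 * big-endian host, by inspecting the first byte of a 16-bit value. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Hypothetical check: returns 1 if `got` equals `expected` either as-is
 * or with its bytes reversed, and 0 otherwise. A result of 0 means the
 * value is not explained by a single missing or extra byte swap. */
int is_plain_or_swapped(uint32_t got, uint32_t expected)
{
    return got == expected || got == swap32(expected);
}
```

For example, `swap32(0x12345678u)` yields 0x78563412, and `is_plain_or_swapped(0x56780000u, 0x12345678u)` returns 0 for the first value reported by process 3.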

Additional detail available on the user's posting:

[http://www.open-mpi.org/community/lists/users/2012/12/20948.php]

ompiteam commented 9 years ago

Imported from Trac issue 3430. Created by rhc on 2012-12-14T10:42:17; last modified on 2012-12-15T07:55:44.

jsquyres commented 7 years ago

This appears to be an older dup of #639, and is now moot due to #2802.