pmodels / mpich

Official MPICH Repository
http://www.mpich.org

"datatype/large_type_sendrec" test fails on bgq #2076

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by blocksom on 2014-05-01 09:34:44 -0500


The datatype/large_type_sendrec test fails on bgq with the following error:

runjob -n 2 --ranks-per-node=1 --block R00-M0-N10-64 \
       --timeout 240 --envs BG_MAPCOMMONHEAP=1 :     \
       ./large_type_sendrec
errors = 4294967296 

This fails with 1 and 8 ranks-per-node and with the following configuration options:

  1. "global" lock
../../mpich/configure --host=powerpc64-bgq-linux     \
                      --with-device=pamid            \
                      --with-file-system=gpfs:BGQ 
  2. "per object" lock
../../mpich/configure --host=powerpc64-bgq-linux     \
                      --with-device=pamid            \
                      --with-file-system=gpfs:BGQ    \
                      --enable-thread-cs=per-object  \
                      --with-atomic-primitives       \
                      --enable-handle-allocation=tls \
                      --enable-refcount=lock-free    \
                      --disable-predefined-refcount
mpichbot commented 7 years ago

Originally by jnysal on 2014-05-21 08:47:29 -0500


There are issues with PAMI sending data >= 4GB. I also found a few places in pamid where we are using "unsigned int" instead of size_t for the buffer size. So I'll change the milestone to 3.1.2 for now as we need a fix in PAMI too.

mpichbot commented 7 years ago

Originally by jnysal on 2014-05-22 08:31:09 -0500


Fixes for the pamid device are ready and allow the test to pass on BGQ, so I am changing the target back to 3.1.1. For the Power architecture, additional fixes are needed in the PAMI library.

mpichbot commented 7 years ago

Originally by Michael Blocksome <blocksom@us.ibm.com> on 2014-05-22 09:31:43 -0500


In 98b5e585a61a8eccbd0224b64c66c505fa5ddf0e: pamid: Allow message sizes greater than 4GB

Allow message sizes >= 4GB

Fixes #2076

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>