pmodels / mpich

Official MPICH Repository
http://www.mpich.org

bug: Pack/Unpack_external with MPI_LONG and MPI_DOUBLE #967

Closed mpichbot closed 3 years ago

mpichbot commented 8 years ago

Originally by goodell on 2009-12-07 14:28:47 -0600


Originally reported on mpich2-dev@ by Lisandro Dalcín

Sorry if this is a known issue. Also, sorry for pasting Python code,
but I have no time right now to write a proper test case in C.

For a long time now, I have been having issues in the mpi4py test suite
when using Pack/Unpack_external. The following script should be enough
to exhibit all the issues I see on 32-bit and 64-bit Linux.

This is the Python code I'm testing. Basically, for various MPI
datatypes, I do a pack/unpack of an array with 5 items and print them
at the end, expecting input/output arrays to be the same:

from mpi4py import MPI
import numpy

for typecode, datatype in [('i', MPI.INT),
                          ('l', MPI.LONG),
                          ('q', MPI.LONG_LONG),
                          ('f', MPI.FLOAT),
                          ('d', MPI.DOUBLE),
                          ]:
   # temp array for packing
   nbytes = datatype.Pack_external_size('external32', 5)
   tmpbuf = numpy.empty(nbytes, dtype='B') # unsigned char

   # pack input array
   iarray = numpy.arange(1,6, dtype=typecode)
   datatype.Pack_external('external32', iarray, tmpbuf, 0)

   # unpack output array
   oarray = numpy.zeros(5, dtype=typecode)
   datatype.Unpack_external('external32', tmpbuf, 0, oarray)

   # report the datatype, both arrays, and the raw packed bytes
   print(datatype.Get_name(), datatype.Get_size())
   print('input: ', iarray)
   print('output:', oarray)
   print('buffer:', tmpbuf, len(tmpbuf))
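
For comparison with the `buffer:` line printed by the script, here is a minimal sketch of what the packed bytes would be expected to look like. It assumes the external32 rules from the MPI standard (big-endian, with MPI_INT and MPI_LONG at 4 bytes, MPI_LONG_LONG at 8, MPI_FLOAT at 4, MPI_DOUBLE at 8). The names `EXTERNAL32_FORMAT` and `external32_reference` are hypothetical helpers, not part of mpi4py or MPICH, and the sketch uses only the standard `struct` module rather than MPI itself.

```python
import struct

# Assumed external32 layout: big-endian, fixed sizes independent of the
# native platform. Keys are the numpy typecodes used in the script above.
EXTERNAL32_FORMAT = {
    'i': '>i',  # MPI_INT: 4 bytes
    'l': '>i',  # MPI_LONG: 4 bytes, even where the native long is 8
    'q': '>q',  # MPI_LONG_LONG: 8 bytes
    'f': '>f',  # MPI_FLOAT: 4 bytes, IEEE single
    'd': '>d',  # MPI_DOUBLE: 8 bytes, IEEE double
}


def external32_reference(typecode, values):
    """Hypothetical helper: encode values the way external32 should."""
    fmt = EXTERNAL32_FORMAT[typecode]
    convert = float if typecode in 'fd' else int
    return b''.join(struct.pack(fmt, convert(v)) for v in values)


if __name__ == '__main__':
    for typecode in 'ilqfd':
        ref = external32_reference(typecode, [1, 2, 3, 4, 5])
        print(typecode, len(ref), list(ref))
```

A correct Pack_external would be expected to produce a `tmpbuf` equal to the reference bytes for the same typecode, so a mismatch points at the pack step rather than the unpack step.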

Here are the issues I see when running the code above on 32-bit and 64-bit Linux:

1) Linux 32:

All but the MPI_DOUBLE iteration work. For the MPI_DOUBLE case, the
pack/unpack seems to do the job, but memory is getting corrupted. A
run under valgrind shows this a few times (both invalid reads and
invalid writes):

==10399== Invalid read of size 4
==10399==    at 0x48736AA: external32_float_convert (mpid_ext32_segment.h:167)
==10399==    by 0x487435D: MPID_Segment_contig_pack_external32_to_buf (mpid_ext32_segment.c:208)
==10399==    by 0x48E13ED: MPID_Segment_manipulate (segment.c:528)
==10399==    by 0x487326E: MPID_Segment_pack_external32 (mpid_ext32_segment.c:302)
==10399==    by 0x48B3088: PMPI_Pack_external (pack_external.c:140)
==10399==    by 0x469B16F: __pyx_pf_6mpi4py_3MPI_8Datatype_Pack_external (mpi4py.MPI.c:34203)
==10399==    by 0x5533077: PyCFunction_Call (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x558F072: PyEval_EvalFrameEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590E49: PyEval_EvalCodeEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590FB3: PyEval_EvalCode (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC25B: ??? (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC322: PyRun_FileExFlags (in /usr/lib/libpython2.6.so.1.0)
==10399==  Address 0x42e4b48 is 0 bytes after a block of size 40 alloc'd
==10399==    at 0x4005BDC: malloc (vg_replace_malloc.c:195)
==10399==    by 0x5222979: ??? (in /usr/lib/python2.6/site-packages/numpy/core/multiarray.so)
==10399==    by 0x52365C0: ??? (in /usr/lib/python2.6/site-packages/numpy/core/multiarray.so)
==10399==    by 0x5236CAE: ??? (in /usr/lib/python2.6/site-packages/numpy/core/multiarray.so)
==10399==    by 0x5533077: PyCFunction_Call (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x54F280C: PyObject_Call (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x558ED4F: PyEval_EvalFrameEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590E49: PyEval_EvalCodeEx (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x5590FB3: PyEval_EvalCode (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC25B: ??? (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AC322: PyRun_FileExFlags (in /usr/lib/libpython2.6.so.1.0)
==10399==    by 0x55AD8C0: PyRun_SimpleFileExFlags (in /usr/lib/libpython2.6.so.

2) Linux 64:

a) I have the same issue as before for MPI_DOUBLE.

b) Additionally, MPI_LONG does not seem to do the job. I get this output:

MPI_LONG 8
input:  [1 2 3 4 5]
output: [0 0 0 0 0]
buffer: [ 16  70 115   0   0   0   0   0 192 238 240  79   6  43   0
0 255 255 255 255] 20
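
To make the MPI_LONG failure concrete, the sketch below compares the reported buffer with the bytes one would expect, assuming external32 encodes MPI_LONG as a 4-byte big-endian integer (per the MPI standard). It uses only the standard `struct` module. The variable names are illustrative.

```python
import struct

# Buffer printed by the failing MPI_LONG iteration on 64-bit Linux.
reported = bytes([16, 70, 115, 0, 0, 0, 0, 0, 192, 238, 240, 79,
                  6, 43, 0, 0, 255, 255, 255, 255])

# Assumed external32 encoding of [1 2 3 4 5] as MPI_LONG:
# five 4-byte big-endian integers.
expected = struct.pack('>5i', 1, 2, 3, 4, 5)

size_matches = len(reported) == len(expected)
content_matches = reported == expected

if __name__ == '__main__':
    print('size matches:   ', size_matches)
    print('content matches:', content_matches)
```

The length (20) agrees with the external32 size, but the contents do not. Since `tmpbuf` was created with `numpy.empty`, this would be consistent with Pack_external leaving the buffer unwritten, although the report itself does not confirm that.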

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
mpichbot commented 8 years ago

Originally by goodell on 2009-12-07 14:30:59 -0600


AFAIK, external32 is not fully implemented and it is not surprising that it does not work. We do not have any near-term plans to implement it.

We will update this ticket as the status of external32 support in MPICH2 changes.

mpichbot commented 8 years ago

Originally by robl on 2012-09-21 15:47:31 -0500


Intel contributed external32 support for ROMIO, but it depends on bug-free pack/unpack_external32. I committed a version of it in [fec9f28d0b67a8e6912b1a309b2739a2551b83d2].

hzhou commented 3 years ago

Fixed in #5163