Error when running tests with MPI

maurorigo commented 1 year ago

Hello, I'm trying to run the library's tests with MPI on a cluster. When I run python runtests.py --single --pdb --capture=no (the same happens with multiple ranks), the nbkit test fails with this error:

build/testenv/lib/python3.8/site-packages/fastpm/tests/test_nbkit.py F
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> captured log >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DEBUG    h5py._conv:__init__.py:47 Creating converter from 7 to 5
DEBUG    h5py._conv:__init__.py:47 Creating converter from 5 to 7
DEBUG    h5py._conv:__init__.py:47 Creating converter from 7 to 5
DEBUG    h5py._conv:__init__.py:47 Creating converter from 5 to 7
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

size = 1, args = (), comm = <mpi4py.MPI.Intracomm object at 0x7f55aa7f8350>
color = 0

    @pytest.mark.parametrize("size", sizes)
    def wrapped(size, *args):
        func_names = MPI.COMM_WORLD.allgather(func.__name__)
        if not all(func_names[0] == i for i in func_names):
            raise RuntimeError("function calls mismatched", func_names)

        try:
            comm, color = create_comm(size)
        except WorldTooSmall:
            return pytest.skip("Test skipped because world is too small. Include the test with mpirun -n %d" % (size))

        try:
            if color == 0:
>               rt = func(*args, comm=comm)

/home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/runtests/mpi/tester.py:139:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/mrigo/.local/lib/python3.8/site-packages/numpy/testing/_private/decorators.py:171: in skipper_func
    return f(*args, **kwargs)
/home/mrigo/LDLproject/fastpm-python/build/testenv/lib/python3.8/site-packages/fastpm/tests/test_nbkit.py:24: in test_nbkit
    sim = FastPMCatalogSource(linear, boost=2, Nsteps=5, cosmo=cosmo)
/home/mrigo/LDLproject/fastpm-python/build/testenv/lib/python3.8/site-packages/fastpm/nbkit.py:35: in __init__
    dlin = self.linear.to_field(mode='complex')
/home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/nbodykit/base/mesh.py:233: in to_field
    complex = self.to_complex_field()
/home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/nbodykit/source/mesh/linear.py:91: in to_complex_field
    complex.apply(filter)
/home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/pmesh/pm.py:1070: in apply
    return Field.apply(self, func, kind, out)
/home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/pmesh/pm.py:642: in apply
    oslab[...] = func(k, islab)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

k = [array([[0.]], dtype=float32), array([[ 0.        ],
       [ 0.02454369],
       [ 0.04908739],
       [ 0.07363108],... ,  0.638136  ,  0.66267973,  0.68722343,  0.7117671 ,
         0.7363108 ,  0.7608545 , -0.7853982 ]], dtype=float32)]
v = slab([[ 0.        +0.0000000e+00j, -0.00179628+2.8299162e-02j,
        0.00607932-2.7621517e-02j, ...,  0.00118724+1.5......,  0.00075459-1.3397790e-03j,
       -0.00127713+5.8607967e-04j, -0.00046806+6.1795174e-04j]],
     dtype=complex64)

    def filter(k, v):
>       mask = numpy.bitwise_and.reduce([ki == 0 for ki in k])
E       ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

/home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/nbodykit/source/mesh/linear.py:87: ValueError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/mrigo/miniconda/envs/LDLenv/lib/python3.8/site-packages/nbodykit/source/mesh/linear.py(87)filter()
-> mask = numpy.bitwise_and.reduce([ki == 0 for ki in k])
(Pdb)

Does anybody have an idea why this happens?

maurorigo commented 1 year ago

Fixed: downgrading NumPy from 1.24 to 1.21 resolved the issue (it should also work with any version up to at least 1.23.5).

rainwoodman commented 1 year ago

Thank you! It sounds like you've tracked down the release notes item related to this. Do you know the recommended way to fix this error for 1.24? I feel we should fix it before the knowledge and tooling are forgotten.

maurorigo commented 1 year ago

In previous versions (at least since 1.21.0; I didn't check earlier ones), this code only triggered a VisibleDeprecationWarning: "Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray." Apparently that deprecation expired in 1.24.0 (https://numpy.org/doc/stable/release/1.24.0-notes.html#expired-deprecations), so the same call now raises a ValueError. It seems that replacing mask = numpy.bitwise_and.reduce([ki == 0 for ki in k]) with mask = numpy.bitwise_and.reduce(numpy.array([ki == 0 for ki in k], dtype=object)) does the trick, at least from 1.21.0 up to the most recent version. Here's some code that replicates what happens during the nbkit test:

import numpy as np

k = [
    np.array([[0.]], dtype=np.float32),
    np.array([[ 0.        ], [ 0.02454369], [ 0.04908739], [ 0.07363108],
              [ 0.09817477], [ 0.12271847], [ 0.14726216], [ 0.17180586],
              [ 0.19634955], [ 0.22089323], [ 0.24543694], [ 0.26998064],
              [ 0.2945243 ], [ 0.319068  ], [ 0.34361172], [ 0.3681554 ],
              [ 0.3926991 ], [ 0.4172428 ], [ 0.44178647], [ 0.46633017],
              [ 0.49087387], [ 0.5154176 ], [ 0.5399613 ], [ 0.5645049 ],
              [ 0.5890486 ], [ 0.6135923 ], [ 0.638136  ], [ 0.66267973],
              [ 0.68722343], [ 0.7117671 ], [ 0.7363108 ], [ 0.7608545 ],
              [-0.7853982 ], [-0.7608545 ], [-0.7363108 ], [-0.7117671 ],
              [-0.68722343], [-0.66267973], [-0.638136  ], [-0.6135923 ],
              [-0.5890486 ], [-0.5645049 ], [-0.5399613 ], [-0.5154176 ],
              [-0.49087387], [-0.46633017], [-0.44178647], [-0.4172428 ],
              [-0.3926991 ], [-0.3681554 ], [-0.34361172], [-0.319068  ],
              [-0.2945243 ], [-0.26998064], [-0.24543694], [-0.22089323],
              [-0.19634955], [-0.17180586], [-0.14726216], [-0.12271847],
              [-0.09817477], [-0.07363108], [-0.04908739], [-0.02454369]],
             dtype=np.float32),
    np.array([[ 0.        ,  0.02454369,  0.04908739,  0.07363108,  0.09817477,
                0.12271847,  0.14726216,  0.17180586,  0.19634955,  0.22089323,
                0.24543694,  0.26998064,  0.2945243 ,  0.319068  ,  0.34361172,
                0.3681554 ,  0.3926991 ,  0.4172428 ,  0.44178647,  0.46633017,
                0.49087387,  0.5154176 ,  0.5399613 ,  0.5645049 ,  0.5890486 ,
                0.6135923 ,  0.638136  ,  0.66267973,  0.68722343,  0.7117671 ,
                0.7363108 ,  0.7608545 , -0.7853982 ]], dtype=np.float32),
]

print(k)

print([np.shape(ki) for ki in k])
try:
    # Implicit ragged list: VisibleDeprecationWarning up to 1.23.5, ValueError from 1.24.0
    mask = np.bitwise_and.reduce([ki == 0 for ki in k])
except ValueError as err:
    print("ragged input rejected:", err)
# Explicit object array: works at least from 1.21.0
mask = np.bitwise_and.reduce(np.array([ki == 0 for ki in k], dtype=object))
print(mask, mask.shape)
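
A possible alternative that avoids object arrays altogether (only a sketch, not taken from nbodykit) is to fold the per-axis masks together pairwise with functools.reduce and let broadcasting combine the shapes. The small k below and the helper name zero_mode_mask are made up for illustration; the real k comes from pmesh.

```python
from functools import reduce

import numpy as np

# Hypothetical stand-ins for the k arrays in the traceback: each entry has a
# different shape, but the shapes broadcast against each other.
k = [
    np.array([[0.0]], dtype=np.float32),                          # shape (1, 1)
    np.array([[0.0], [0.5], [-1.0], [-0.5]], dtype=np.float32),   # shape (4, 1)
    np.array([[0.0, 0.5, -1.0]], dtype=np.float32),               # shape (1, 3)
]


def zero_mode_mask(k):
    """Return a boolean array that is True where every component of k is zero.

    The masks are combined two at a time, so NumPy never has to build a
    single array out of differently shaped pieces.
    """
    return reduce(np.logical_and, [ki == 0 for ki in k])


mask = zero_mode_mask(k)
print(mask.shape, mask.dtype)
```

Because the masks are combined two at a time, the result is an ordinary boolean array with the broadcast shape rather than an object array, and it should not depend on how a given NumPy version handles ragged input.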

rainwoodman / fastpm-python

Error when running tests with MPI #18