hiddenSymmetries / simsopt

Simons Stellarator Optimizer Code
https://simsopt.readthedocs.io
MIT License

NERSC perlmutter shifter container: importing mpi4py in 'hiddensymmetries/simsopt:latest' results in fatal errors. #455

Closed: daringli closed this issue 15 hours ago

daringli commented 1 day ago

Possibly related to issue #395.

On the 'latest' image, importing mpi4py fails with the following fatal error:

Traceback (most recent call last):
  File "/global/u2/s/sbuller/test_MPI_singularity/test.py", line 1, in <module>
    from mpi4py import MPI
ImportError: /venv/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so: undefined symbol: MPI_Neighbor_alltoallv_c

This occurs in optimization scripts, but also in a minimal working example consisting solely of the line 'from mpi4py import MPI', when called from a job script:

#!/bin/bash -l
#SBATCH --time=00:01:00
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --constraint=cpu
#SBATCH --image=hiddensymmetries/simsopt:latest
#SBATCH --account=m4529
#SBATCH -o out.%j
#SBATCH -e err.%j

module list
echo $LD_LIBRARY_PATH
time srun shifter /venv/bin/python test.py

where 'test.py' contains only the line 'from mpi4py import MPI'. I've attached the needed files in a zip (you would need to edit the 'account' field in the job script to run it).
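
In case it helps anyone reproducing this, here is a slightly more verbose variant of 'test.py'. It is only a sketch and not the attached file: the helper name try_import is made up for illustration. It catches the ImportError so that each rank prints a one-line summary instead of a full traceback.

```python
import importlib
import sys


def try_import(module_name):
    """Try to import module_name; return (ok, detail) instead of raising.

    try_import is a hypothetical helper for this sketch, not part of
    mpi4py or simsopt.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # An undefined symbol in a compiled extension module (like the
        # MPI_Neighbor_alltoallv_c failure above) surfaces as ImportError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "import succeeded"


if __name__ == "__main__":
    ok, detail = try_import("mpi4py.MPI")
    print(f"python executable: {sys.executable}")
    print(f"mpi4py.MPI: {detail}")
```

It would be launched the same way as the original, with 'srun shifter /venv/bin/python' followed by whatever file name the sketch is saved under.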

The error does not occur with image=hiddensymmetries/simsopt:v1.6.4.

Zip with minimal example: test_MPI_shifter.zip

landreman commented 1 day ago

I just hit exactly this issue in a separate project, so I can comment. mpi4py recently released version 4.0, and the problem started then. Pinning mpi4py<4 when pip-installing mpi4py in the Dockerfile resolves the problem. I have notified NERSC of the issue.
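
To check which mpi4py version a given image actually ships, something like the sketch below may be useful. It reads the installed package metadata rather than importing mpi4py.MPI, so it should still run in an image where the import itself fails. The function names are made up for illustration, and comparing only the major version is an approximation of the 'mpi4py<4' requirement, not pip's full specifier logic.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        # Reads the distribution metadata; does not import the package.
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Leading integer of a version string such as '3.1.6' or '4.0.0'."""
    return int(version.split(".")[0])


def satisfies_pin(version, max_major=4):
    """Approximate 'mpi4py<4' by comparing major versions only (assumption)."""
    return major_version(version) < max_major


if __name__ == "__main__":
    found = installed_version("mpi4py")
    if found is None:
        print("mpi4py is not installed")
    else:
        print(f"mpi4py {found}: pin satisfied = {satisfies_pin(found)}")
```

Running it under both the 'latest' and 'v1.6.4' images would show whether the two differ in mpi4py major version.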

landreman commented 15 hours ago

Resolved by #456