precice / python-bindings

Python language bindings for preCICE
https://precice.org
GNU Lesser General Public License v3.0

How do we want to deal with the mpi4py dependency? #8

Open · BenjaminRodenberg opened this issue 4 years ago

BenjaminRodenberg commented 4 years ago

Continue closed PR https://github.com/precice/precice/pull/312 here.

BenjaminRodenberg commented 4 years ago

I did some history research:

In https://github.com/precice/precice/pull/299 we observed that mpi4py was missing in the python solverdummy. In https://github.com/precice/precice/pull/316, we decided to move it directly into the bindings and remove it from the solverdummy.

uekerman commented 4 years ago

There is also https://github.com/precice/precice/issues/311

BenjaminRodenberg commented 4 years ago

I looked a bit more into the history and found the original reason for adding the statement from mpi4py import MPI here.

I would suggest first reproducing this kind of behaviour in a test, to make sure that it is worth all the trouble; a rough sketch of such a test is below.
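
A rough sketch of what such a test could look like (the file name, the pytest style, and the config path are my assumptions; it spawns a fresh interpreter so that an mpi4py import from elsewhere cannot mask the problem):

# Hypothetical regression test (test_mpi4py_dependency.py); the participant name
# and config path below come from the solverdummy and are assumptions.
import subprocess
import sys

# Constructing the Interface already triggers preCICE's MPI initialization,
# so no second participant is needed to provoke the failure.
SNIPPET = (
    "import precice; "
    "precice.Interface('SolverOne', 'solverdummy/precice-config.xml', 0, 1)"
)

def test_interface_construction_without_explicit_mpi4py_import():
    # Use a fresh interpreter so that mpi4py imported by other tests cannot
    # accidentally initialize MPI and hide the problem.
    result = subprocess.run([sys.executable, "-c", SNIPPET],
                            capture_output=True, text=True)
    # Should pass once the bindings take care of MPI themselves; with an
    # MPI-enabled preCICE and no mpi4py import it currently aborts in MPI_INIT.
    assert result.returncode == 0, result.stdout + result.stderr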

BenjaminRodenberg commented 4 years ago

Edit: Updated to the new location of the solverdummy and to preCICE v2.0.0.

In the following I provide a description of how to provoke the "mpi4py" error mentioned in https://github.com/precice/precice/pull/299#issuecomment-469421697

Preparations

Install preCICE

Use ~preCICE revision https://github.com/precice/precice/commit/9f778290416416255fc73a495e962def301648b0~ preCICE v2.0.0

Build and install preCICE via

mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=ON -DPRECICE_PETScMapping=OFF -DPRECICE_MPICommunication=<ON|OFF> ..
make -j4
sudo make install

Install python bindings

Use ~python bindings revision https://github.com/precice/python-bindings/commit/7ddf2894644bb596e3ddbf772059ed98ab61b5ed~ python-bindings revision https://github.com/precice/python-bindings/pull/36/commits/3ad6d0ee0eb29d29991095f9d3fa85bdefa671f0 and remove this line

Install the bindings via pip3 install --user .

Run solverdummy

Navigate to ~cd precice/tools/solverdummies~ cd python-bindings/solverdummy

Run ~python3 python/solverdummy.py precice-config.xml SolverOne MeshOne and python3 python/solverdummy.py precice-config.xml SolverTwo MeshTwo~ python3 solverdummy/solverdummy.py solverdummy/precice-config.xml SolverOne MeshOne and python3 solverdummy/solverdummy.py solverdummy/precice-config.xml SolverTwo MeshTwo

Outcome

  1. If -DPRECICE_MPICommunication=OFF, everything works as expected
  2. If -DPRECICE_MPICommunication=ON, we get the following error:
~/precice/tools/solverdummies$ python3 python/solverdummy.py precice-config.xml SolverOne MeshOne
[2020-01-16 21:18:51.319269] [0x00007f78ae21db80] [trace]   Entering operator()
[2020-01-16 21:18:51.319303] [0x00007f78ae21db80] [debug]   Initialize MPI
[benjamin-ThinkPad-X1-Yoga-2nd:12451] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[benjamin-ThinkPad-X1-Yoga-2nd:12451] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[benjamin-ThinkPad-X1-Yoga-2nd:12451] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[benjamin-ThinkPad-X1-Yoga-2nd:12451] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[benjamin-ThinkPad-X1-Yoga-2nd:12451] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
  3. If -DPRECICE_MPICommunication=ON, but we add the line from mpi4py import MPI before or after import precice in solverdummy.py (see the snippet below), everything works.
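
For reference, the workaround from item 3 is a single extra import at the top of solverdummy.py; the rest of the script stays unchanged:

from mpi4py import MPI  # noqa: F401  (importing mpi4py calls MPI_Init, so MPI is ready before preCICE needs it)
import precice
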
BenjaminRodenberg commented 4 years ago

I checked the error described in https://github.com/precice/python-bindings/issues/8#issuecomment-575329609 for the code provided in #36. The error still persists.

BenjaminRodenberg commented 3 years ago

Another idea that might help us close this issue: preCICE allows checking whether it was compiled with MPI through SolverInterface::getVersionInformation. This might be a good way to determine whether MPI (and hence mpi4py) is needed. We could then drop mpi4py as a mandatory dependency and, depending on what SolverInterface::getVersionInformation returns, raise a warning (or an error) if mpi4py is needed but cannot be found. A rough sketch of this idea is below.
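
A rough sketch of how that check could look, assuming the bindings expose the call as precice.get_version_information() and that the returned string contains an "mpi" token when preCICE was built with MPI communication (both assumptions would need to be verified):

import warnings

import precice

def _precice_needs_mpi():
    # getVersionInformation reports the build configuration of the linked
    # libprecice; we assume an "mpi" token appears iff MPI communication is on.
    info = precice.get_version_information()
    if isinstance(info, bytes):  # depending on the bindings this may be bytes
        info = info.decode()
    return "mpi" in info.lower()

if _precice_needs_mpi():
    try:
        from mpi4py import MPI  # noqa: F401  (importing mpi4py initializes MPI)
    except ImportError:
        warnings.warn(
            "preCICE was built with MPI, but mpi4py could not be imported. "
            "Install mpi4py or make sure MPI is initialized before creating "
            "an Interface.")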