su2code / SU2

SU2: An Open-Source Suite for Multiphysics Simulation and Design
https://su2code.github.io

Install SU2 by python wrapper build in singularity container #739

Closed thw1021 closed 5 years ago

thw1021 commented 5 years ago

Dear developers, I ran into a strange problem when running SU2 in parallel in a Docker container. Could you give me some suggestions, please?

When I run mpirun --allow-run-as-root -n 24 SU2_CFD inv_NACA0012.cfg in the container, no flow.dat file is produced. I found a workaround in #268, but the resulting output looks very strange. See SU2_docker_container.log

I have also run SU2 in parallel on the host machine with mpirun -n 24 SU2_CFD inv_NACA0012.cfg. There everything seems fine, and I get the flow.dat file without any extra steps. The output is SU2_host_machine.log

The outputs of the two cases are very different. Maybe #738 can help a little.

Best.

thw1021 commented 5 years ago

@talbring Could you give me some suggestions, please? (I don't quite understand MPI and Docker, so please forgive me for troubling you.)

talbring commented 5 years ago

@thw1021 As I already said in my comment on the other issue (https://github.com/su2code/SU2/issues/738#issuecomment-513870126): no one can give you support for running OpenMPI inside a Docker container, since that is not officially supported. The only suggestion I have is to use Singularity. If you want to test it, install it and download the su2.sif I created here: https://drive.google.com/open?id=1SaZDloevjj8rFDG2x3Lh05nhTuKHakDK

thw1021 commented 5 years ago

OK. Thank you very much.

thw1021 commented 5 years ago

@talbring Really sorry for troubling you again.

I followed your suggestion, installed Singularity (3.3.0), and used the su2.sif you shared with me. Running mpirun -n 24 su2.sif SU2_CFD inv_NACA0012.cfg failed. See the log file: su2.sif.log

The reason is probably the OpenMPI version. However, I have also installed openmpi-4.0.1 and added

export PATH=$PATH:$HOME/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/openmpi/lib

to my .bashrc file. Yet when I run mpirun --version, the output is

mpirun (Open MPI) 1.10.2

Report bugs to http://www.open-mpi.org/community/help/

The OS on my computer is Ubuntu 16.04.

Could you give me some suggestions to solve this problem? I googled it but failed to find a good way.

Best.

talbring commented 5 years ago

PATH is searched in order, so put your new installation first:

export PATH=$HOME/openmpi/bin:$PATH
export LD_LIBRARY_PATH=$HOME/openmpi/lib:$LD_LIBRARY_PATH
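The lookup order can be demonstrated with a small sketch (the paths and version strings below are dummies, purely illustrative):

```shell
#!/bin/sh
# Illustrative sketch: PATH is searched left to right, so whichever
# directory provides an mpirun first wins.
tmp=$(mktemp -d)
mkdir -p "$tmp/old/bin" "$tmp/new/bin"
printf '#!/bin/sh\necho 1.10.2\n' > "$tmp/old/bin/mpirun"
printf '#!/bin/sh\necho 4.0.1\n'  > "$tmp/new/bin/mpirun"
chmod +x "$tmp/old/bin/mpirun" "$tmp/new/bin/mpirun"

# Appending (as in the original .bashrc) still resolves the old binary;
# prepending resolves the new one.
appended=$(PATH="$tmp/old/bin:$tmp/new/bin"; mpirun)
prepended=$(PATH="$tmp/new/bin:$tmp/old/bin"; mpirun)
echo "appended:  $appended"
echo "prepended: $prepended"
```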
thw1021 commented 5 years ago

@talbring Yes, it worked. Thank you.

However, there is no flow.dat file when it finishes, and it seems that I cannot run SU2_SOL to generate one.

I have tried three ways, and all of them failed. Could you give me some suggestions, please?

hongwei@hongwei-Workstation:~/SU2_RUN/QuickStart$ mpirun -n 24 su2.sif SU2_SOL inv_NACA0012.cfg 

-------------------------------------------------------------------------
|    ___ _   _ ___                                                      |
|   / __| | | |_  )   Release 6.2.0  "Falcon"                           |
|   \__ \ |_| |/ /                                                      |
|   |___/\___//___|   Suite (Solution Exporting Code)                   |
|                                                                       |
-------------------------------------------------------------------------
| The current SU2 release has been coordinated by the                   |
| SU2 International Developers Society <www.su2devsociety.org>          |
| with selected contributions from the open-source community.           |
-------------------------------------------------------------------------
| The main research teams contributing to the current release are:      |
| - Prof. Juan J. Alonso's group at Stanford University.                |
| - Prof. Piero Colonna's group at Delft University of Technology.      |
| - Prof. Nicolas R. Gauger's group at Kaiserslautern U. of Technology. |
| - Prof. Alberto Guardone's group at Polytechnic University of Milan.  |
| - Prof. Rafael Palacios' group at Imperial College London.            |
| - Prof. Vincent Terrapon's group at the University of Liege.          |
| - Prof. Edwin van der Weide's group at the University of Twente.      |
| - Lab. of New Concepts in Aeronautics at Tech. Inst. of Aeronautics.  |
-------------------------------------------------------------------------
| Copyright 2012-2019, Francisco D. Palacios, Thomas D. Economon,       |
|                      Tim Albring, and the SU2 contributors.           |
|                                                                       |
| SU2 is free software; you can redistribute it and/or                  |
| modify it under the terms of the GNU Lesser General Public            |
| License as published by the Free Software Foundation; either          |
| version 2.1 of the License, or (at your option) any later version.    |
|                                                                       |
| SU2 is distributed in the hope that it will be useful,                |
| but WITHOUT ANY WARRANTY; without even the implied warranty of        |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU      |
| Lesser General Public License for more details.                       |
|                                                                       |
| You should have received a copy of the GNU Lesser General Public      |
| License along with SU2. If not, see <http://www.gnu.org/licenses/>.   |
-------------------------------------------------------------------------

------------------------ Physical Case Definition -----------------------
Input mesh file name: mesh_NACA0012_inv.su2

-------------------------- Output Information ---------------------------
The output file format is Tecplot ASCII (.dat).
Flow variables file name: flow.

------------------- Config File Boundary Information --------------------
+-----------------------------------------+
|         Marker Type|         Marker Name|
+-----------------------------------------+
|          Euler wall|             airfoil|
+-----------------------------------------+
|           Far-field|            farfield|
+-----------------------------------------+

---------------------- Read Grid File Information -----------------------
Two dimensional problem.
5233 points before parallel partitioning.
Performing linear partitioning of the grid nodes.
10216 interior elements before parallel partitioning.
Executing the partitioning functions.
Building the graph adjacency structure.
Distributing elements across all ranks.
2 surface markers.
+------------------------------------+
| Index|        Marker|      Elements|
+------------------------------------+
|     0|       airfoil|           200|
|     1|      farfield|            50|
+------------------------------------+
Calling ParMETIS... graph partitioning complete (1114 edge cuts).
Distributing ParMETIS coloring.
Rebalancing vertices.
Rebalancing volume element connectivity.
Rebalancing markers and surface elements.
6403 vertices including ghost points. 
11338 interior elements including halo cells. 
11338 triangles.
Establishing MPI communication patterns.
Identify vertices.
Storing a mapping from global to local point index.

------------------------- Solution Postprocessing -----------------------

Error in "void CBaselineSolver::SetOutputVariables(CGeometry*, CConfig*)": 
-------------------------------------------------------------------------
Unable to open SU2 restart file solution_flow.dat
------------------------------ Error Exit -------------------------------

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 17 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[hongwei-Workstation:07803] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[hongwei-Workstation:07803] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[hongwei-Workstation:07803] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[hongwei-Workstation:07803] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[hongwei-Workstation:07803] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[hongwei-Workstation:07803] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2079
[hongwei-Workstation:07803] 23 more processes have sent help message help-mpi-api.txt / mpi-abort
[hongwei-Workstation:07803] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
hongwei@hongwei-Workstation:~/SU2_RUN/QuickStart$ singularity exec su2.sif SU2_SOL inv_NACA0012.cfg 
/.singularity.d/actions/exec: 9: exec: SU2_SOL: not found
hongwei@hongwei-Workstation:~/SU2_RUN/QuickStart$ singularity shell su2.sif 
Singularity su2.sif:~/SU2_RUN/QuickStart> SU2_SOL inv_NACA0012.cfg 
bash: SU2_SOL: command not found
Singularity su2.sif:~/SU2_RUN/QuickStart>
thw1021 commented 5 years ago

I found your previous comments :

%runscript
    exec /SU2/bin/$1 $2

So I ran singularity exec su2.sif /SU2/bin/SU2_SOL inv_NACA0012.cfg, but it still failed, even though SU2_CFD runs successfully this way. Why?

hongwei@hongwei-Workstation:~/SU2_RUN/QuickStart$ singularity exec su2.sif /SU2/bin/SU2_SOL inv_NACA0012.cfg 

-------------------------------------------------------------------------
|    ___ _   _ ___                                                      |
|   / __| | | |_  )   Release 6.2.0  "Falcon"                           |
|   \__ \ |_| |/ /                                                      |
|   |___/\___//___|   Suite (Solution Exporting Code)                   |
|                                                                       |
-------------------------------------------------------------------------
| The current SU2 release has been coordinated by the                   |
| SU2 International Developers Society <www.su2devsociety.org>          |
| with selected contributions from the open-source community.           |
-------------------------------------------------------------------------
| The main research teams contributing to the current release are:      |
| - Prof. Juan J. Alonso's group at Stanford University.                |
| - Prof. Piero Colonna's group at Delft University of Technology.      |
| - Prof. Nicolas R. Gauger's group at Kaiserslautern U. of Technology. |
| - Prof. Alberto Guardone's group at Polytechnic University of Milan.  |
| - Prof. Rafael Palacios' group at Imperial College London.            |
| - Prof. Vincent Terrapon's group at the University of Liege.          |
| - Prof. Edwin van der Weide's group at the University of Twente.      |
| - Lab. of New Concepts in Aeronautics at Tech. Inst. of Aeronautics.  |
-------------------------------------------------------------------------
| Copyright 2012-2019, Francisco D. Palacios, Thomas D. Economon,       |
|                      Tim Albring, and the SU2 contributors.           |
|                                                                       |
| SU2 is free software; you can redistribute it and/or                  |
| modify it under the terms of the GNU Lesser General Public            |
| License as published by the Free Software Foundation; either          |
| version 2.1 of the License, or (at your option) any later version.    |
|                                                                       |
| SU2 is distributed in the hope that it will be useful,                |
| but WITHOUT ANY WARRANTY; without even the implied warranty of        |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU      |
| Lesser General Public License for more details.                       |
|                                                                       |
| You should have received a copy of the GNU Lesser General Public      |
| License along with SU2. If not, see <http://www.gnu.org/licenses/>.   |
-------------------------------------------------------------------------

------------------------ Physical Case Definition -----------------------
Input mesh file name: mesh_NACA0012_inv.su2

-------------------------- Output Information ---------------------------
The output file format is Tecplot ASCII (.dat).
Flow variables file name: flow.

------------------- Config File Boundary Information --------------------
+-----------------------------------------+
|         Marker Type|         Marker Name|
+-----------------------------------------+
|          Euler wall|             airfoil|
+-----------------------------------------+
|           Far-field|            farfield|
+-----------------------------------------+

---------------------- Read Grid File Information -----------------------
Two dimensional problem.
5233 points.
2 surface markers.
+------------------------------------+
| Index|        Marker|      Elements|
+------------------------------------+
|     0|       airfoil|           200|
|     1|      farfield|            50|
+------------------------------------+
10216 triangles.
Identify vertices.
Storing a mapping from global to local point index.

------------------------- Solution Postprocessing -----------------------

Error in "void CBaselineSolver::SetOutputVariables(CGeometry*, CConfig*)": 
-------------------------------------------------------------------------
Unable to open SU2 restart file solution_flow.dat
------------------------------ Error Exit -------------------------------

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
thw1021 commented 5 years ago

I have to apologize for some mistakes I made when running the commands. It actually works. Thank you @talbring.

thw1021 commented 5 years ago

@talbring Thanks for your help. I want to install SU2 with the Python wrapper enabled, so I wrote a definition file based on yours. However, some errors occurred; the cause seems to be the Python environment.

Sorry for troubling you. Could you give me some suggestions, please?

Best.

Here is my definition file.

Bootstrap: docker
From: ubuntu:18.04

%post
    apt-get -y update
    apt-get -y upgrade
    apt-get -y install python3 python3-pip git build-essential autoconf openmpi-bin openmpi-common libopenmpi-dev m4 gfortran swig vim
    pip3 install mpi4py numpy scipy matplotlib
    git clone --depth=1 https://github.com/su2code/SU2
    cd SU2
    mkdir SU2_Install
    autoreconf -i
    ./bootstrap
    export CXXFLAGS="-O3 -Wall"
    python3 preconfigure.py --enable-autodiff --enable-mpi --enable-PY_WRAPPER --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --prefix=$PWD/SU2_Install
    make -j 4 install
    make clean
    cd ..
    pip3 install tensorforce[tf]
    git clone https://github.com/tensorforce/tensorforce.git
    cd tensorforce/
    git checkout major-revision-final
    pip3 install -e .

%runscript
    exec /SU2/bin/$1 $2    

The error is:

make[3]: Entering directory '/SU2/SU2_BASE/SU2_PY/pySU2'
/bin/bash: python: command not found
swig -DHAVE_MPI  -Wall -I/usr/include/python3.6m -I/usr/include/python3.6m -I/root/.local/lib/python2.7/site-packages/mpi4py/include -I/mpi4py/include -I/Library/Python/2.7/site-packages/mpi4py/include -outdir ./ -o SU2_APIPYTHON_wrap.cxx -c++ -python /SU2/SU2_BASE/../SU2_PY/pySU2/pySU2.i 
/SU2/SU2_BASE/../SU2_PY/pySU2/pySU2.i:64: Error: Unable to find 'mpi4py/mpi4py.i'
Makefile:532: recipe for target 'SU2_APIPYTHON_wrap.cxx' failed
make[3]: *** [SU2_APIPYTHON_wrap.cxx] Error 1
make[3]: Leaving directory '/SU2/SU2_BASE/SU2_PY/pySU2'
Makefile:525: recipe for target 'all' failed
make[2]: *** [all] Error 2
make[2]: Leaving directory '/SU2/SU2_BASE/SU2_PY/pySU2'
Makefile:441: recipe for target 'install-recursive' failed
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory '/SU2/SU2_BASE'
Makefile:13: recipe for target 'install-SU2_BASE' failed
make: *** [install-SU2_BASE] Error 2
FATAL:   failed to execute %post proc: exit status 2
FATAL:   While performing build: while running engine: exit status 255
talbring commented 5 years ago

/SU2/SU2_BASE/../SU2_PY/pySU2/pySU2.i:64: Error: Unable to find 'mpi4py/mpi4py.i'

Use pip to install mpi4py.

PS: just saw you already did that, sorry.
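A common cause of this error is that mpi4py was installed for a different interpreter than the one driving the build. One way to check, as a sketch (not an SU2 command; assumes python3 is on PATH), is to ask that interpreter where it would find mpi4py:

```shell
#!/bin/sh
# Sketch: the failing SWIG step needs the interface file mpi4py.i under
# <mpi4py package dir>/include/mpi4py/. Ask the same python3 that drives
# the build whether and where mpi4py is installed for it.
out=$(python3 - <<'EOF'
import importlib.util, os
spec = importlib.util.find_spec("mpi4py")
if spec is None or not spec.submodule_search_locations:
    print("mpi4py is NOT installed for this interpreter")
else:
    root = list(spec.submodule_search_locations)[0]
    print("mpi4py include dir:", os.path.join(root, "include", "mpi4py"))
EOF
)
echo "$out"
```

If this reports that mpi4py is missing, the pip that installed it belongs to a different Python than the python3 running the build.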

thw1021 commented 5 years ago

Thank you. But if I use pip to install mpi4py, will there be any negative effects when I use Python 3 together with SU2 for further research?

thw1021 commented 5 years ago

I built the image using the following definition:

Bootstrap: docker
From: ubuntu:18.04

%post
    apt-get -y update
    apt-get -y upgrade
    apt-get -y install python3 python3-pip python-dev python-pip git build-essential autoconf openmpi-bin openmpi-common libopenmpi-dev m4 gfortran swig vim
    pip3 install mpi4py numpy scipy matplotlib
    pip install mpi4py numpy scipy matplotlib
    git clone --depth=1 https://github.com/su2code/SU2
    cd SU2
    mkdir SU2_Install
    autoreconf -i
    ./bootstrap
    export CXXFLAGS="-O3 -Wall"
    python3 preconfigure.py --enable-autodiff --enable-mpi --enable-PY_WRAPPER --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --prefix=$PWD/SU2_Install
    make -j 4 install
    make clean
    cd ..
    pip3 install tensorforce[tf]
    git clone https://github.com/tensorforce/tensorforce.git
    cd tensorforce/
    git checkout major-revision-final
    pip3 install -e .

%runscript
    exec /SU2/bin/$1 $2

But it cannot run:

ubuntu@main-3:~/main_shared_volume/build_singularity_image/QuickStart$ singularity exec su2_tensorforce.sif /SU2/bin/SU2_CFD inv_NACA0012.cfg  
/.singularity.d/actions/exec: 9: exec: /SU2/bin/SU2_CFD: not found
stephansmit commented 5 years ago

You are installing it into the folder SU2_Install/ according to "--prefix=$PWD/SU2_Install", so I think your last line should be: exec /SU2_Install/bin/$1 $2

However, I have no experience with Singularity, so I could be wrong.

thw1021 commented 5 years ago

Oh, I see. :sweat_smile: You must be right.

thw1021 commented 5 years ago

But based on my own experience, I had to use pip (Python 2) to install mpi4py before I could build the image successfully. I want to know: if I use Python 3 for further development, will it be OK?

stephansmit commented 5 years ago

You will need to install it for Python 3 if you plan to use that. So use pip3 install mpi4py or python3 -m pip install mpi4py.

thw1021 commented 5 years ago

Yes, I have done it as you said, but it failed. See https://github.com/su2code/SU2/issues/739#issuecomment-515298427

thw1021 commented 5 years ago

@clarkpede offered a method some days ago, but I am not sure how to edit the Makefile: https://github.com/su2code/SU2/issues/722#issuecomment-506710295

talbring commented 5 years ago

Made it work with this:

Bootstrap: docker
From: ubuntu:19.04

%post
    apt-get -y update
    apt-get -y install python3 python3-pip git build-essential autoconf python3-dev libopenmpi3 openmpi-common swig
    ln -s /usr/bin/python3 /usr/bin/python
    python --version
    pip3 install mpi4py numpy scipy
    git clone --depth=1 https://github.com/su2code/SU2
    cd SU2
    autoreconf -i
    export CXXFLAGS="-O3"
    python preconfigure.py --enable-mpi --enable-PY_WRAPPER --prefix=$PWD
    make install -j20
    make clean

%runscript
    exec /SU2/bin/$1 $2
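One caveat about the %runscript above (an observation, not something raised in the thread): exec /SU2/bin/$1 $2 forwards only the first two arguments, so anything after them is silently dropped. A sketch of the difference, using --dry-run as a made-up flag:

```shell
#!/bin/sh
# Sketch: compare forwarding only $1 and $2 against forwarding "$@".
run_two() { echo "$1" "$2"; }   # mimics: exec /SU2/bin/$1 $2
run_all() { echo "$@"; }        # forwards every argument

two=$(run_two SU2_CFD inv_NACA0012.cfg --dry-run)
all=$(run_all SU2_CFD inv_NACA0012.cfg --dry-run)
echo "kept by \$1 \$2: $two"
echo "kept by all-args form: $all"
```

A runscript that forwards everything could look like: tool=$1; shift; exec /SU2/bin/"$tool" "$@" (an untested suggestion, not from the thread).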
thw1021 commented 5 years ago

OK. Thank you. I will try it now.

thw1021 commented 5 years ago

It's running. One small problem: I had to change ubuntu:19.04 to ubuntu:18.04 and libopenmpi3 to libopenmpi-dev openmpi-bin in the definition, otherwise the build fails:

ubuntu@main-3:~/main_shared_volume/build_singularity_image/builid_image$ sudo singularity build su2_tensorforce.sif su2_tensorforce.def 
INFO:    Starting build...
Getting image source signatures
Skipping fetch of repeat blob sha256:1eecd0e4c2cd8c1f628b81c53a487aae6c8d4140248a8617313cd73079be09c4
Skipping fetch of repeat blob sha256:fac13afdf65bf403945c8d6bee654a26940c5527a9913bdf8daa54b69a49f550
Skipping fetch of repeat blob sha256:0c6dd534ddf18642a5af19c09c2d9744d0d1aa93680995d430b5257b6eed079d
Skipping fetch of repeat blob sha256:854703cff8700dce5b5ff70e54f5d612ab701405bc200a5b10a0213ca9025e50
Copying config sha256:993d5f573a24af19dd6006bc3e6e113bd0c709797dc48676f4f0b5ed456470cc
 2.42 KiB / 2.42 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
singularity image-build: relocation error: /lib/x86_64-linux-gnu/libnss_files.so.2: symbol __libc_readline_unlocked version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
FATAL:   While performing build: while running engine: exit status 127

My OS is Ubuntu 18.04, and my OpenMPI version is 2.1.1. I will run a test to find the reason.

Once it finishes, I will let you know. Thank you.

thw1021 commented 5 years ago

Yes. Now I can build and run the image. Thank you @talbring @stephansmit .