Error running perpendicular-flap/fluid-openfoam in parallel #218

Open efirvida opened 3 years ago

efirvida commented 3 years ago

Hi, I'm trying to run this tutorial in parallel just by using -parallel, but it always fails. I tried several configurations of decomposeParDict until I found that it only runs in parallel with this setting:

numberOfSubdomains 2;

method          simple;

simpleCoeffs
{
    n               (2 1 1);
    delta           0.001;
}
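
For anyone trying other rank counts: with the simple method, the product of the entries in n has to equal numberOfSubdomains, otherwise decomposePar rejects the dictionary. Below is a minimal Python sketch of that consistency check. The function name is made up for illustration and is not part of OpenFOAM or preCICE.

```python
from math import prod  # available in Python 3.8+


def check_simple_decomposition(number_of_subdomains, n):
    """Return True if the splits n = (nx, ny, nz) multiply to numberOfSubdomains.

    Hypothetical helper: it only mirrors the consistency rule of the
    'simple' method, it does not read or write any OpenFOAM files.
    """
    return prod(n) == number_of_subdomains


if __name__ == "__main__":
    # The setting from this post: 2 subdomains, split only along x.
    print(check_simple_decomposition(2, (2, 1, 1)))
    # Asking for 4 subdomains while leaving n at (2 1 1) is inconsistent.
    print(check_simple_decomposition(4, (2, 1, 1)))
```

This only rules out an inconsistent dictionary as the cause. It says nothing about why a consistent decomposition with more ranks crashes.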

I'm running the preCICE adapters built with EasyBuild, using easyconfigs that I have made (see it here: ), so I really don't know whether I have a mistake in my easyconfigs or whether it is a tutorial error.

I plan to submit the easyconfigs to the main EasyBuild repo, but before doing so I have to be sure that they work. After that I can continue my research on FSI.

Another thing that may be important: I'm using Fedora 34, and I had some problems building the foss-2020a toolchain due to a Binutils 2.34 bug, so I changed Binutils to version 2.36.1 and Bison to 3.7.6 for the whole toolchain. That is the main reason for my branch here. I don't know whether this introduces bugs into the library.

---[preciceAdapter] Loaded the OpenFOAM-preCICE adapter v1.0.0.
---[preciceAdapter] Reading preciceDict...
---[precice]  This is preCICE version 2.2.1
---[precice]  Revision info: no-info [Git failed/Not a repository]
---[precice]  Configuration: Release (Debug and Trace log unavailable)
---[precice]  Configuring preCICE with configuration "../precice-config.xml"
---[precice]  I am participant "Fluid"
---[precice]  Connecting Master to 3 Slaves
[2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see
[2]PETSC ERROR: or try on GNU/linux and Apple Mac OS X to find memory corruption errors
[2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[2]PETSC ERROR: to get more information on the crash.
[2]PETSC ERROR: User provided function() line 0 in  unknown file  
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
---[precice]  Setting up master communication to coupling partner/s

MakisH commented 3 years ago

I can reproduce this with OpenFOAM v2012 (installed from .deb) on Ubuntu 21.04, with preCICE v2.2.1 (built from source). My system has only two physical cores, and I use export OMPI_MCA_rmaps_base_oversubscribe=1 in my ~/.bashrc.
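
As a side note on oversubscription: whether it is needed depends on how many ranks are requested compared to the slots the MPI launcher sees. Here is a rough Python sketch of that comparison. The function name is hypothetical, and os.cpu_count() reports logical CPUs, whereas Open MPI counts physical cores as slots by default, so the fallback is only an approximation.

```python
import os


def needs_oversubscribe(requested_ranks, available_slots=None):
    """Return True if more ranks are requested than slots are available.

    Hypothetical helper for illustration. If available_slots is not given,
    it falls back to os.cpu_count(), which counts logical CPUs and may
    therefore overestimate the slots an MPI launcher would use.
    """
    if available_slots is None:
        available_slots = os.cpu_count() or 1
    return requested_ranks > available_slots


if __name__ == "__main__":
    # Two physical cores, as on the system described above.
    for ranks in (2, 4):
        print(ranks, needs_oversubscribe(ranks, available_slots=2))
```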

It does not seem to matter whether the interface is "cut" by the parallel boundary.

Since people have used the OpenFOAM adapter with more ranks, and since we have also run e.g. the turek-hron-fsi3 case with 25 ranks, this should be specific to the tutorial or the system.

@efirvida how many physical & logical cores do you have on your system?

efirvida commented 3 years ago

@MakisH I'm running on a laptop with an i7-8650U, so I have 4 cores with 2 threads each. I tested the old version of the tutorial by rolling the repository back to commit 5f4031fc7e45807dca787a525569b39a1909d2a3, and it works fine. I used -oversubscribe too and tested it with up to 12 partitions. I haven't had time to compare the tutorials to see what is different, and I don't have much experience with preCICE yet, but the old version didn't fail in any of my tests.

davidscn commented 3 years ago

I think the crucial factor here is whether the master rank of OpenFOAM owns interface nodes or not. IIRC I already had a similar issue in the past. I'm still a bit puzzled whether the issue is triggered from the OpenFOAM side or from the preCICE side. I have some cases to test. A workaround should still be given by this approach.
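
A rough way to picture the "master rank owns interface nodes" question is to idealize a decomposition along x as equal-width slabs and check which slabs overlap the x-range of the flap. The Python sketch below does that. The domain and flap extents are assumed values for illustration, not read from the tutorial's mesh, and the real simple method balances cell counts rather than widths, so actual ownership may differ.

```python
def ranks_owning_interface(n_x, domain, interface):
    """Return the ranks (x-slabs) whose x-range overlaps the interface x-range.

    Idealized model: the domain is cut into n_x equal-width slabs along x,
    and rank 0 (the master rank) gets the slab with the smallest x.
    """
    x_min, x_max = domain
    i_min, i_max = interface
    width = (x_max - x_min) / n_x
    owners = []
    for rank in range(n_x):
        lo = x_min + rank * width
        hi = lo + width
        # Closed intervals: a node on a shared slab boundary counts for both.
        if lo <= i_max and i_min <= hi:
            owners.append(rank)
    return owners


# Assumed extents for illustration only.
DOMAIN_X = (-3.0, 3.0)
FLAP_X = (-0.05, 0.05)

if __name__ == "__main__":
    for n_x in (2, 3, 4):
        owners = ranks_owning_interface(n_x, DOMAIN_X, FLAP_X)
        status = "master at interface" if 0 in owners else "master empty"
        print(n_x, owners, status)
```

Under these assumed numbers, two slabs leave rank 0 touching the flap, while three or four slabs leave rank 0 without interface nodes.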

davidscn commented 3 years ago

I think it is an issue in the adapter rather than in preCICE. Some corner cases with empty master ranks were fixed in the preCICE bugfix release v2.2.1, and IIRC I have already run empty-master cases with other solvers. I need to build the adapter in debug mode (CXX_FLAG='-g') to get more information here.

davidscn commented 3 years ago

I can confirm that I can successfully run test cases (no OpenFOAM) where the master rank is not located at the interface.