underworldcode / UWGeodynamics

Underworld Geodynamics

how to choose the number of processes when running with MPI #155

Closed zhongpenggeo closed 4 years ago

zhongpenggeo commented 4 years ago

Hi @rbeucher,

I don't know how to choose the best number of processes for a run. Most of the time I can only use 12, like this:

mpirun -np 12 python filename.py

If the number is larger than 12, and sometimes even when it is only larger than 5, it doesn't work. I don't know why, since my server actually has 64 cores. The error output is:

jovyan@0bb08fa3855a:/workspace/My_test$ mpirun -np 16 python 2D_continental_accretion_multylayers.py
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max

=====================================================================================
Error running Underworld - Signal 2 'SIGINT' (Termination Request).
Isn't it wonderbubble to have CTRL-C?
no mem for new parser
An uncaught exception was encountered on processor 7.An uncaught exception was encountered on processor 10.
Traceback (most recent call last):
  File "2D_continental_accretion_multylayers.py", line 15, in <module>
    import UWGeodynamics as GEO
  File "/opt/UWGeodynamics/UWGeodynamics/__init__.py", line 26, in <module>

Traceback (most recent call last):
  File "2D_continental_accretion_multylayers.py", line 15, in <module>
    import UWGeodynamics as GEO
  File "/opt/UWGeodynamics/UWGeodynamics/__init__.py", line 26, in <module>
        from .LecodeIsostasy import LecodeIsostasy
  File "/opt/UWGeodynamics/UWGeodynamics/LecodeIsostasy/__init__.py", line 2, in <module>
    from .LecodeIsostasy import LecodeIsostasy
  File "/opt/UWGeodynamics/UWGeodynamics/LecodeIsostasy/LecodeIsostasy.py", line 5, in <module>
    from scipy.interpolate import interp1d, interp2d
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/__init__.py", line 175, in <module>

=====================================================================================
Error running Underworld - Signal 11 'SIGSEGV' (Segmentation Fault).
This is probably caused by an illegal access of memory.
We recommend running the code in a debugger to work out where the problem is (e.g. 'gdb')
and also to contact the developers.
    from .LecodeIsostasy import LecodeIsostasy
  File "/opt/UWGeodynamics/UWGeodynamics/LecodeIsostasy/__init__.py", line 2, in <module>
    from .LecodeIsostasy import LecodeIsostasy
  File "/opt/UWGeodynamics/UWGeodynamics/LecodeIsostasy/LecodeIsostasy.py", line 5, in <module>
    from scipy.interpolate import interp1d, interp2d
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/__init__.py", line 175, in <module>
from .interpolate import *
    from .interpolate import *
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 20, in <module>
    import scipy.linalg
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/__init__.py", line 216, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 20, in <module>
    register_func(k, eval(k))
MemoryError
    import scipy.linalg
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/__init__.py", line 207, in <module>
application called MPI_Abort(comm=0x84000002, 1) - process 10

=====================================================================================
Error running Underworld - Signal 11 'SIGSEGV' (Segmentation Fault).
This is probably caused by an illegal access of memory.
We recommend running the code in a debugger to work out where the problem is (e.g. 'gdb')
and also to contact the developers.
    from ._decomp_update import *
ImportError: /usr/local/lib/python2.7/dist-packages/scipy/linalg/_decomp_update.so: failed to map segment from shared object
application called MPI_Abort(comm=0x84000002, 1) - process 7

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 20163 RUNNING AT 0bb08fa3855a
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

And these are the resources used on the server when running with 12 processes: [image]
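The first two lines of the log above come from OpenBLAS failing to create its worker threads. The thread does not confirm the cause, but one possible explanation is that every MPI process starts its own OpenBLAS thread pool, so the total thread count grows with `-np`. A minimal sketch for testing that assumption is to cap the pool size through the `OPENBLAS_NUM_THREADS` and `OMP_NUM_THREADS` environment variables before anything imports numpy or scipy. The helper name `limit_blas_threads` is hypothetical and not part of UWGeodynamics.

```python
import os


def limit_blas_threads(n_threads=1):
    """Cap the BLAS/OpenMP thread pools for this process.

    Hypothetical helper: it only sets environment variables, so it must
    run before numpy, scipy or UWGeodynamics are imported.
    """
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS"):
        os.environ[var] = str(n_threads)
    return {var: os.environ[var]
            for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")}


# Call this at the very top of the model script, before other imports.
print(limit_blas_threads(1))
```

If the errors persist with one thread per process, the thread limit is probably not the cause and the memory available to the container would be the next thing to check.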

rbeucher commented 4 years ago

Hi, sorry for the late reply.

The number of processors that you can use depends on the number of elements you have in your mesh. The mesh is divided into sub-domains, and each sub-domain is sent to one processor. If the domain is too small, the solver can't do much.

Can you tell me a bit more about what you are doing? The size of the mesh, etc.

Romain
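To make the point about sub-domains concrete, the following sketch estimates how many mesh elements each MPI process would own on average for a given process count. The mesh resolution is a made-up example, not the one used in this issue, and `elements_per_process` is a hypothetical helper. It assumes an even split, which a real decomposition only approximates.

```python
def elements_per_process(resolution, n_procs):
    """Average number of mesh elements owned by each MPI process.

    resolution: number of elements along each axis, e.g. (256, 128).
    n_procs: value passed to `mpirun -np`.
    """
    total = 1
    for n in resolution:
        total *= n
    return total / float(n_procs)


# Hypothetical 2D mesh of 256 x 128 elements (not taken from the thread).
resolution = (256, 128)
for n_procs in (4, 12, 16, 64):
    print(n_procs, elements_per_process(resolution, n_procs))
```

When the per-process count becomes small, communication between sub-domains tends to dominate and adding processes stops helping. Where that threshold lies depends on the problem.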

rbeucher commented 4 years ago

I can see that you are using Docker. Please look at your Docker configuration and check that Docker can actually use the CPUs available on your system.
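One way to do that check from inside the container is a short script like the one below. It is only a sketch using the Python 3 standard library: `os.cpu_count()` reports the CPUs the operating system exposes, and `os.sched_getaffinity(0)` (Linux only) reports the CPUs the current process is allowed to run on. The function name `visible_cpus` is hypothetical.

```python
import os


def visible_cpus():
    """Report how many CPUs this process can see and use."""
    info = {"cpu_count": os.cpu_count()}
    # sched_getaffinity exists on Linux only, so guard the call.
    if hasattr(os, "sched_getaffinity"):
        info["affinity"] = len(os.sched_getaffinity(0))
    return info


print(visible_cpus())
```

If the affinity count is lower than the 64 cores of the host, the container has been pinned to a subset of the CPUs. A CPU time quota, if one is set on the container, would likely not show up in either number and has to be checked in the Docker settings.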

rbeucher commented 4 years ago

I am closing this. As I said, this is very problem dependent.