neperfepx / neper

Polycrystal generation and meshing
http://neper.info
GNU General Public License v3.0

Multi threading performance of meshing module drops off past 16 threads #43

Open jcappola opened 4 years ago

jcappola commented 4 years ago

I've noticed that the meshing module seems to slow down over time while meshing large domains, and I've also found that there seems to be a practical limit to multithreading in the meshing module. Benchmarking tests for varying mesh densities were performed on a Dell Precision 7820 running Ubuntu 18.04 LTS with an Intel Xeon Platinum 8260 CPU @ 2.40GHz x 96. Neper 3.5.1 was used with gmsh 3.0.6 compiled from source. The tests consisted of 3 different meshes with 10, 100, and 1000 grains, respectively, and the number of threads was increased from 1 to 96 in powers of two.

The tessellation/meshing commands used to generate the domain were:

neper -T -n XX -id 1 -reg 1 -mloop 4 -domain "cube(1,1,1)"
neper -M nXX-id1.tess -order 2 -o nXX-id1 -format msh
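
A thread-scaling benchmark of this kind can be scripted along the following lines (an illustrative sketch, not the exact script used; the 100-grain case and the bash time keyword for wall-clock timing are just examples):

neper -T -n 100 -id 1 -reg 1 -mloop 4 -domain "cube(1,1,1)"
for t in 1 2 4 8 16 32 64 96; do
    export OMP_NUM_THREADS=$t   # thread count picked up by the OpenMP-enabled build
    echo "=== $t threads ==="
    time neper -M n100-id1.tess -order 2 -o n100-id1_t$t -format msh
done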

[Figure: normalized_time_to_mesh_dualcpu — meshing runtime normalized by the serial runtime vs. number of threads, for the 10-, 100-, and 1000-grain cases]

Above is a plot showing the runtime for the benchmark tests normalized by the serial runtime. Note that after 16 threads, the runtime generally begins increasing at varying rates, which is consistent across the 3 orders of magnitude of grain counts tested.

babakrav commented 4 years ago

Is -mloop the option for requesting multithreading?

jcappola commented 4 years ago

@babakrav: The -mloop flag specifies the number of regularization loops to be performed. Regularization deletes small edges and generally cleans things up for meshing. Multithreading is enabled by compiling Neper with OpenMP and determines the number of threads a tessellation can be meshed on.

babakrav commented 4 years ago

Thanks, jcappola, for your response. I am new to this topic; could you elaborate a little on what exactly you mean by compiling Neper with OpenMP? On our university's cluster, I know both Neper and OpenMP are available and I am able to run Neper, but I have no idea how to compile Neper with OpenMP.

jcappola commented 4 years ago

Sure. If you run Neper from the command line you will get a header printed like so:

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.5.3-115-neper-fepx
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 12 threads.
Info   : <http://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
========================================================================

If you see openmp in the Built with: line, then you have compiled Neper with OpenMP. If it is missing, then when you configure your Neper build via CMake (e.g., using ccmake ..), you need to make sure that the HAVE_OPENMP flag is set to ON. CMake should automatically find your OpenMP installation and link the packages accordingly.
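
For reference, a configure-and-build sequence along these lines should give an OpenMP-enabled build (a minimal sketch; the neper/src/build layout matches what is referred to later in this thread, so adjust the paths to your own checkout):

cd neper/src
mkdir -p build && cd build
cmake -DHAVE_OPENMP=ON ..   # or toggle HAVE_OPENMP to ON interactively via ccmake ..
make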

babakrav commented 4 years ago

This is the message that appears for me:

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.5.1
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 1 threads.
Info   : <http://neper.info>
Info   : Copyright (C) 2003-2019, and GNU GPL'd, by Romain Quey.
========================================================================

So I assume Neper is already linked with OpenMP, but it only uses 1 thread. Is there any way to make Neper run with multiple threads other than installing it again?

jcappola commented 4 years ago

Yeah, looks good. If you are running on a local workstation, you can enter export OMP_NUM_THREADS=XX in your console, where XX is the number of threads you would like to run with. If you are running Neper on a cluster that uses a job scheduler (SLURM or others), be sure to use the appropriate flags to ensure the correct number of threads is allocated to your job. For example, if you are using SLURM you can add:

--ntasks=1 --cpus-per-task=16

to your job script to get 16 threads allocated to a single process, which enables multithreading.
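
On a workstation, the whole sequence is just the export followed by the usual command (thread count and filenames here are purely illustrative):

export OMP_NUM_THREADS=16
neper -M n100-id1.tess -order 2 -o n100-id1 -format msh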

babakrav commented 4 years ago

We use SLURM on our cluster. So I modified the batch file as you advised:

#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=4     

But Neper still runs on a single core:

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.5.1
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 1 threads.
Info   : <http://neper.info>
Info   : Copyright (C) 2003-2019, and GNU GPL'd, by Romain Quey.
Info   : Ignoring initialization file.
Info   : ---------------------------------------------------------------
Info   : MODULE  -T loaded with arguments:
Info   : [ini file] (none)
Info   : [com line] -n 1000 -domain sphere(1,1000) -o gene_gene_3
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   : Creating domain...
Info   : Creating tessellation...
Info   :   - Setting seeds... 
Info   :   - Generating crystal orientations...
Info   :   - Running tessellation...
Info   : Writing results...
Info   :     [o] Writing file `gene_gene_3.tess'...
Info   :     [o] Wrote file `gene_gene_3.tess'.
Info   : Elapsed time: 4.708 secs.
========================================================================

jcappola commented 4 years ago

#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=4     

allocates one Neper process and limits the job to at most 4 Neper processes per compute node. Use --cpus-per-task instead of --ntasks-per-node and try it again.
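
That is, keeping the 4 cores you were requesting, the directives would become:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4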

babakrav commented 4 years ago

It still uses one CPU. I have even increased --ntasks to 4, but nothing changed.

#SBATCH --ntasks=4              
#SBATCH --cpus-per-task=4

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.5.1
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 1 threads.
Info   : <http://neper.info>
Info   : Copyright (C) 2003-2019, and GNU GPL'd, by Romain Quey.
Info   : Ignoring initialization file.
Info   : ---------------------------------------------------------------

I also receive some messages regarding the module versions. I put them here just for reference.

The following have been reloaded with a version change:
  1) GCCcore/7.3.0 => GCCcore/8.2.0
  2) binutils/2.30-GCCcore-7.3.0 => binutils/2.31.1-GCCcore-8.2.0
  3) icc/2018.3.222-GCC-7.3.0-2.30 => icc/2019.1.144-GCC-8.2.0-2.31.1
  4) iccifort/2018.3.222-GCC-7.3.0-2.30 => iccifort/2019.1.144-GCC-8.2.0-2.31.1
  5) ifort/2018.3.222-GCC-7.3.0-2.30 => ifort/2019.1.144-GCC-8.2.0-2.31.1
  6) iimpi/2018b => iimpi/2019a
  7) impi/2018.3.222-iccifort-2018.3.222-GCC-7.3.0-2.30 => impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1
  8) zlib/1.2.11-GCCcore-7.3.0 => zlib/1.2.11-GCCcore-8.2.0

The following have been reloaded with a version change:
  1) Boost/1.67.0-intel-2018b => Boost/1.70.0-iimpi-2019a
  2) bzip2/1.0.6-GCCcore-7.3.0 => bzip2/1.0.6-GCCcore-8.2.0
  3) imkl/2018.3.222-iimpi-2018b => imkl/2019.1.144-iimpi-2019a
  4) intel/2018b => intel/2019a

The following have been reloaded with a version change:
  1) Boost/1.70.0-iimpi-2019a => Boost/1.67.0-intel-2018b
  2) GCCcore/8.2.0 => GCCcore/7.3.0
  3) binutils/2.31.1-GCCcore-8.2.0 => binutils/2.30-GCCcore-7.3.0
  4) bzip2/1.0.6-GCCcore-8.2.0 => bzip2/1.0.6-GCCcore-7.3.0
  5) icc/2019.1.144-GCC-8.2.0-2.31.1 => icc/2018.3.222-GCC-7.3.0-2.30
  6) iccifort/2019.1.144-GCC-8.2.0-2.31.1 => iccifort/2018.3.222-GCC-7.3.0-2.30
  7) ifort/2019.1.144-GCC-8.2.0-2.31.1 => ifort/2018.3.222-GCC-7.3.0-2.30
  8) iimpi/2019a => iimpi/2018b
  9) imkl/2019.1.144-iimpi-2019a => imkl/2018.3.222-iimpi-2018b
 10) impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1 => impi/2018.3.222-iccifort-2018.3.222-GCC-7.3.0-2.30
 11) intel/2019a => intel/2018b
 12) zlib/1.2.11-GCCcore-8.2.0 => zlib/1.2.11-GCCcore-7.3.0

I assume each block is related to one of the modules that I load in the batch file:

module load Neper
module load gmsh
module load POV-Ray

jcappola commented 4 years ago

@babakrav: I just built Neper on my cluster, which also uses SLURM, and was able to get multithreading working just fine with the flags I provided. Here is the SLURM script I used to execute Neper from the src/build directory of my install (ignore --qos and -p):

#!/bin/bash
#SBATCH -J neper
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --qos main
#SBATCH -p main

srun ./neper

This provided (as expected):

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.5.3-31
Info   : Built with: gsl|muparser|opengjk|openmp
Info   : Running on 16 threads.
Info   : <http://neper.info>
Info   : Copyright (C) 2003-2020, and GNU GPL'd, by Romain Quey.
========================================================================

My recommendation is either to get in contact with your system administrator to see what is going on with their prebuilt Neper, or to clone and compile it yourself in your local user directory. You can then call it from a SLURM script using the full path to the local executable in the neper/src/build directory, or add that directory to your PATH via an export in your ~/.bashrc.
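
For example (paths are illustrative and assume the clone lives in your home directory):

srun $HOME/neper/src/build/neper -T -n 1000 -domain "sphere(1,1000)" -o gene_gene_3

or, added once to your ~/.bashrc:

export PATH=$HOME/neper/src/build:$PATH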

rquey commented 4 years ago

As @jcappola said, you have to set the OMP_NUM_THREADS environment variable. This is usually not necessary (once Neper is compiled with OpenMP, it uses all available threads by default), but it can be used to change the number of threads, and it seems to be needed in your case (this is a system issue).

It is the same on a cluster as on a local workstation, and I do not think that job scheduler directives are sufficient to enable multithreading: they allocate room on the cluster and initialize variables that MPI can ultimately use, but not OpenMP (as far as I know). So, try setting OMP_NUM_THREADS in your script. You can do it "by hand", e.g. using export OMP_NUM_THREADS=16, but it is better to set it from the corresponding SLURM variable, e.g. export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE, so that Neper automatically runs on as many threads as are allocated. (Neper is multithreaded with OpenMP but not MPI-parallelized, so run it only on a single node.)
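
Putting the pieces together, a job script along these lines should work (a sketch, not a verified script; note that with --cpus-per-task the SLURM variable holding the allocated core count is $SLURM_CPUS_PER_TASK, while $SLURM_NTASKS_PER_NODE corresponds to --ntasks-per-node):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

module load Neper

# Match the OpenMP thread count to the cores SLURM actually allocated
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

neper -T -n 1000 -domain "sphere(1,1000)" -o gene_gene_3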

babakrav commented 4 years ago

You guys are great! I finally managed to run it with 16 threads, at least in shell mode. Here is what I did, for the reference of other users who may face the same issue:

Load the Neper module: module load Neper
Configure OMP_NUM_THREADS: export OMP_NUM_THREADS=16
Run the Neper command: neper -T -n 1000 -domain "sphere(1,1000)" -o gene_gene_3

and you'll get:

========================    N   e   p   e   r    =======================
Info   : A software package for polycrystal generation and meshing.
Info   : Version 3.5.1
Info   : Built with: gsl|muparser|opengjk|openmp|nlopt|libscotch (full)
Info   : Running on 16 threads.
Info   : <http://neper.info>
Info   : Copyright (C) 2003-2019, and GNU GPL'd, by Romain Quey.
Info   : Ignoring initialization file.
Info   : ---------------------------------------------------------------
Info   : MODULE  -T loaded with arguments:
Info   : [ini file] (none)
Info   : [com line] -n 1000 -domain sphere(1,1000) -o gene_gene_3
Info   : ---------------------------------------------------------------
Info   : Reading input data...
Info   : Creating domain...
Info   : Creating tessellation...
Info   :   - Setting seeds...
Info   :   - Generating crystal orientations...
Info   :   - Running tessellation...
Info   : Writing results...
Info   :     [o] Writing file `gene_gene_3.tess'...
Info   :     [o] Wrote file `gene_gene_3.tess'.
Info   : Elapsed time: 4.263 secs.
========================================================================

sajutabraham commented 3 years ago

For me, "export OMP_NUM_THREADS=8" is working for the -T module but not for the -M module. Meshing happens on only a single thread. What could be the reason?

rquey commented 3 years ago

This should not be the case. Can you give an example of Neper commands and your terminal output?