halbux / sparselizard

C++ FEM library | user-friendly | multi-physics | hp-adaptive | HPC
http://www.sparselizard.org

Question - setting to increase the number of cores/threads used. #36

Closed caseyjamesdavis closed 3 years ago

caseyjamesdavis commented 3 years ago

Alex, thanks for creating, testing and documenting this awesome FEM library.

I am currently using it via the static library method.

All the examples I have tried work great, but they only use 9 threads even if I make the mesh very large.

Is there a place to manually set the number of cores/threads?

Here are my machine details:

[two screenshots: machine details]

Thanks for your help.

Cheers, Casey

halbux commented 3 years ago

Hi Casey,

It should definitely be able to use more threads if it expects that to improve the speed. I have seen all cores at 100% on a 16-core Core i9 and on a 32-core Threadripper. The code is written so that I do not need to manage threads myself: they are created and managed inside the dgemm BLAS calls, the MUMPS direct solve and the parallel sort (on Linux), so OpenBLAS seems to be the culprit. Could it be because the static library was compiled on my 4-core laptop, for which I guess many more than 8 threads would not make sense?

In any case you can compile sparselizard yourself to solve the problem or try to set the max number of threads with something like:

setenv OMP_NUM_THREADS 4,3,2 (see https://www.openmp.org/spec-html/5.0/openmpse50.html ).
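A note on syntax: `setenv` is the csh/tcsh form, and per the OpenMP specification a comma-separated list such as `4,3,2` gives one thread count per nesting level. A minimal sketch of the bash/sh equivalent is below; the values 8 and 4 are arbitrary placeholders, not recommendations.

```shell
# bash/sh: cap the thread count for the rest of this shell session
export OMP_NUM_THREADS=8
echo "session value: $OMP_NUM_THREADS"

# Or set it for a single command only, leaving the session value untouched.
# 'sh -c ...' stands in for your own executable here.
OMP_NUM_THREADS=4 sh -c 'echo "child process sees: $OMP_NUM_THREADS"'

# csh/tcsh equivalent (the syntax quoted above):
#   setenv OMP_NUM_THREADS 8
```

The per-command form is handy for experiments because nothing needs to be undone afterwards.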

Also:

You might be interested in the current developments on a domain decomposition method for MPI parallelism, possibly available in less than 3 weeks. The target is fast solves with > 1 billion dofs on supercomputers, but it might also scale better than multithreading alone on your large number of cores (and you can connect multiple computers to work together if the data link between them is fast enough). I will probably also have 6 connected machines, each with a 64-core Threadripper, to test it. Stay tuned!

Alex

caseyjamesdavis commented 3 years ago

Hi Alex,

Thanks for the explanation. Compiling on my machine seems like the best solution.

I have tried compiling a few times, and I think it works, but then I can't seem to get a simulation to execute. I'm sure the issue is somewhere between the chair and keyboard ;-)

This is what I have tried:

From this point on I'm less confident about what to do.

This is the error I get:

image

Any ideas?

Thanks again for your help.

Casey

halbux commented 3 years ago

Looks good until the less confident moment :)

It seems you got something partially running. Could you run the default main.cpp to see if it also gives this error?

After all the correct, confident steps you can go to build/simulation/default and run ./default. This is the executable created during the build call from sparselizard/simulation/default/main.cpp. So copy the example you want to run to simulations/default, build in the build folder, and run the executable created in build/simulation/default. It will find all mesh files, since they are automatically copied into build/simulation/default as well, as you will see.

caseyjamesdavis commented 3 years ago

Success!

Sparselizard is now using all my cores/threads.

htop: image

paraview: image

I compiled again via:

When I moved a different example 'main.cpp' & 'disk.msh' into /sparselizard/simulations/default and then:

from the build directory it was like it had a memory and when I would run ./default it would always execute the original simulation, not the new one.

I then followed the 'add project' instructions on your readme and everything works as expected. The pictures above are the result of me adding an order of magnitude to the 2D base in the nonlinear-electro-thermal-heating-3d example.

The part I was missing was:

Now I can really play around.

Thanks so much for your help.

ps - I just noticed the scatter plot pdf and I think I understand the name of your program a bit better now :-)

halbux commented 3 years ago

That's excellent news! Congrats, you're the first user I know of to run on 128 threads. For small simulations it might be slower to use 128 threads (but it is good to check once anyway). I have sometimes gotten slightly faster results with num threads = num cores, so 64 in your case. This also depends on how fast your memory is, so it may be worth checking the runtime with a sweep over the number of threads. If you do that on a rather big problem, please do not hesitate to share the sweep plot.
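To find the "num cores" figure on a Linux machine, one possible approach is sketched below. It assumes GNU coreutils (`nproc`) and util-linux (`lscpu`) are installed; the variable names are illustrative.

```shell
# Logical CPUs (hardware threads). '--all' is used because plain 'nproc'
# honours OMP_NUM_THREADS and would report the capped value instead.
logical=$(nproc --all)

# Physical cores: count the unique (core, socket) pairs reported by lscpu.
physical=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)

echo "logical CPUs: $logical, physical cores: $physical"

# One thread per physical core, as suggested above
export OMP_NUM_THREADS="$physical"
```

On a machine with two hardware threads per core, `logical` should come out at twice `physical`.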

Alex

caseyjamesdavis commented 3 years ago

A sweep of runtime vs number of threads on a large problem sounds like fun - I'm in.

Where and how do I limit the number of threads?

Casey

caseyjamesdavis commented 3 years ago

I think I figured it out. I just added

export OMP_NUM_THREADS=64

to my .bashrc file.
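For a sweep, editing .bashrc for every run gets tedious. Here is a rough sketch of a loop that sets the variable per run instead. `CMD` and `OUT` are hypothetical names, and `CMD` defaults to the do-nothing command `true` so the script runs anywhere; point it at your own executable (e.g. `./default`). It assumes GNU `date` for nanosecond timestamps.

```shell
# CMD is a placeholder (assumption): replace it with your own executable.
CMD="${CMD:-true}"
OUT="${OUT:-sweep.csv}"

echo "threads,milliseconds" > "$OUT"
for n in 1 2 4 8 16 32 64 128; do
    start=$(date +%s%N)                      # nanoseconds since epoch (GNU date)
    OMP_NUM_THREADS="$n" $CMD > /dev/null    # the variable applies to this run only
    end=$(date +%s%N)
    echo "$n,$(( (end - start) / 1000000 ))" >> "$OUT"
done
cat "$OUT"
```

The resulting CSV has one row per thread count and can be plotted with any tool. Wall-clock timings of a single run are noisy, so repeating each run a few times is worth considering.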

caseyjamesdavis commented 3 years ago

It seems like the sweet spot on my machine is about 10 to 20 threads for 2D problems with millions of nodes.

[image: results]

halbux commented 3 years ago

Hi Casey,

It seems the number of cores is too high relative to the memory speed, so I am not so surprised by the graph. It might scale better with interpolation order 2. Let's see in a few weeks how the new DDM algorithms perform on a single machine. Thanks for the sweep!

Concerning the heat equation:

You will find both the strong and the weak form in the example online "Conductor heating due to DC current" (on sparselizard.org). You can also use 'predefineddiffusion' (see documentation) to have them written for you.
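For orientation, a generic textbook form of the heat equation is sketched below. This is not copied from the sparselizard example, and the symbols are assumptions: temperature $T$, test function $T'$, density $\rho$, heat capacity $c_p$, thermal conductivity $k$, heat source $Q$, domain $\Omega$ with outward normal $n$ on its boundary.

```latex
% Strong form
\rho c_p \frac{\partial T}{\partial t} - \nabla \cdot (k \nabla T) = Q
\quad \text{in } \Omega

% Weak form: multiply by the test function T' and integrate by parts
\int_\Omega \rho c_p \frac{\partial T}{\partial t}\, T' \, d\Omega
+ \int_\Omega k \, \nabla T \cdot \nabla T' \, d\Omega
- \int_{\partial\Omega} k \, (\nabla T \cdot n)\, T' \, d\Gamma
= \int_\Omega Q \, T' \, d\Omega
```

For a steady-state problem the time-derivative term drops out, and the boundary integral vanishes wherever a zero-flux condition holds or $T'$ is zero (Dirichlet boundaries).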

Alex