trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License

Automatic @threaded activation when initializing multi-threaded Julia #2159

Open afilogo opened 2 weeks ago

afilogo commented 2 weeks ago

Hello Trixi team,

As far as I understand, initializing Julia with more than one thread automatically uses them inside Trixi loops (correct me if I am wrong). However, I see a benefit in having a keyword stored in the cache to toggle this, similar to the one in OrdinaryDiffEq (thread = OrdinaryDiffEq.True()), for instance when one prefers to use the threads somewhere else.
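For context, a minimal sketch of how that keyword is used on the OrdinaryDiffEq side, assuming an existing Trixi semidiscretization `semi` (the algorithm, time span, and step size below are placeholders, not taken from a specific elixir):

```julia
using OrdinaryDiffEq
using Trixi

# Assumes `semi` is a Trixi semidiscretization set up beforehand.
ode = semidiscretize(semi, (0.0, 1.0))

# `thread = OrdinaryDiffEq.True()` makes the integrator's internal operations
# multithreaded; the default `OrdinaryDiffEq.False()` keeps them serial.
sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false,
                                      thread = OrdinaryDiffEq.True());
            dt = 1.0e-3,  # placeholder; usually controlled via a StepsizeCallback
            save_everystep = false)
```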

Also, when experimenting with examples that use threads, I consistently observe allocations (which are expected) along with a performance degradation, e.g. #1596.

Should there be interest in having this feature, I could give it a try. Or you could also suggest a simple fix. Thank you.

DanielDoehring commented 2 weeks ago

I agree that it is somewhat misleading that when you execute an elixir with multiple threads, the Trixi internals are thread-parallelized but the time integration is not. Note, however, that there are some elixirs which use the multithreaded time integration:

https://github.com/trixi-framework/Trixi.jl/blob/c9f07078972b82e92ca4c0c9569cbcd4b70cdb05/examples/p4est_2d_dgsem/elixir_navierstokes_NACA0012airfoil_mach08.jl#L166

and that this behaviour is documented in the docs:

https://trixi-framework.github.io/Trixi.jl/stable/time_integration/#time-integration

and

https://trixi-framework.github.io/Trixi.jl/stable/parallelization/#Shared-memory-parallelization-with-threads

What do you mean by performance degradation? It is well known that you pretty much never get ideal speedup, since synchronizing the threads causes some overhead.

afilogo commented 2 weeks ago

Thank you for your detailed response!

By degradation in performance I mean that the elixir runs slower multi-threaded than in a single-threaded run, just as you showed in the issue I mentioned. At least, that is what I take from the summary_callback() output.

I would like to be able to run multiple equations at the same time (one of them coming from a Trixi semidiscretization) and, ideally, by using multiple threads, achieve performance similar to running each one independently on a single thread (in this regard, I must say I have only started looking into this recently).
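A rough sketch of that use case with base-Julia task parallelism; `semi_a` and `semi_b` are hypothetical, independently constructed semidiscretizations, and this only pays off if the individual solves do not themselves grab all threads, which is exactly the point discussed below:

```julia
using OrdinaryDiffEq
using Trixi

# Hypothetical setup: two independent semidiscretizations built elsewhere.
ode_a = semidiscretize(semi_a, (0.0, 1.0))
ode_b = semidiscretize(semi_b, (0.0, 1.0))

# Run each solve on its own Julia task so the two problems can proceed
# concurrently on separate threads.
task_a = Threads.@spawn solve(ode_a, SSPRK43(); save_everystep = false)
task_b = Threads.@spawn solve(ode_b, SSPRK43(); save_everystep = false)

sol_a = fetch(task_a)
sol_b = fetch(task_b)
```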

DanielDoehring commented 2 weeks ago

So if you crank up your problem size, say, to at least a couple thousand unknowns per thread, you should see performance improvements. In the issue I deliberately used a toy problem to keep things simple.
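A quick way to check this rule of thumb for a given setup; a minimal sketch assuming an existing semidiscretization `semi`:

```julia
using Trixi

# Rule of thumb from the discussion: aim for at least a few thousand
# unknowns per thread before expecting a multithreaded speedup.
n_unknowns      = ndofs(semi)          # degrees of freedom of the semidiscretization
n_threads       = Threads.nthreads()   # threads Julia was started with
dofs_per_thread = n_unknowns / n_threads
@info "Load per thread" n_unknowns n_threads dofs_per_thread
```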

afilogo commented 2 weeks ago

I have not observed that in my experiments. But the main issue for me is that it seems I cannot avoid threaded loops when starting Julia with multiple threads, even though I do not want that for the Trixi solve(), e.g. because I want to use the threads in another piece of code that benefits more from them. I just wanted to know whether this behavior is intended and/or can easily be fixed, if you find it reasonable.

I appreciate your help so far.

ranocha commented 2 weeks ago

Did you try Trixi.set_polyester!(false)? See https://trixi-framework.github.io/Trixi.jl/stable/reference-trixi/#Trixi.set_polyester!-Tuple{Bool}. Switching to base threads should enable the nested threading capabilities of Julia.
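For reference, a minimal sketch of that switch; set_polyester! stores a package preference, so Julia has to be restarted for it to take effect (see the linked docstring for the exact semantics in your Trixi.jl version):

```julia
using Trixi

# Disable the Polyester.jl backend of Trixi.jl's `@threaded` loops and fall
# back to base `Threads.@threads`. This is stored as a preference and only
# takes effect after restarting Julia.
Trixi.set_polyester!(false)

# To restore the default Polyester.jl backend later:
# Trixi.set_polyester!(true)
```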

Alternatively, we (you) could consider making a PR with a similar function disabling threading in Trixi.jl.

DanielDoehring commented 2 weeks ago

I have not observed that in my experiments.

That might be the case if the runtime is not dominated by rhs! but by callbacks such as analysis or AMR. When turning off save_solution and analysis_callback for

https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_2d_dgsem/elixir_euler_sedov_blast_wave.jl

I get 33 seconds for one thread and 22 seconds for two threads. When using a uniform mesh without AMR, the single-threaded run takes 61 seconds while the run on two threads takes 32 seconds, which is quite close to ideal speedup.
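For anyone who wants to repeat such a measurement, a rough sketch: start Julia once with `--threads 1` and once with `--threads 2`, run the elixir linked above, and compare the timer output of the summary callback (reproducing the variant without save_solution/analysis_callback or AMR requires commenting out those callbacks in a local copy of the elixir):

```julia
# Run this in a session started with e.g. `julia --threads 2`.
using Trixi

trixi_include(joinpath(examples_dir(), "tree_2d_dgsem",
                       "elixir_euler_sedov_blast_wave.jl"))

# The summary callback prints a timer breakdown at the end of the run;
# the `rhs!` entry is the part expected to scale with the thread count.
```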