Status: Open. afilogo opened this issue 2 weeks ago.
I agree with you that it is somewhat misleading that, if you execute an elixir multithreaded, the Trixi internals are thread-parallelized but the time integration is not. Note, however, that there are some elixirs which use the multithreaded version of the time integration
and that this behaviour is documented in the docs:
https://trixi-framework.github.io/Trixi.jl/stable/time_integration/#time-integration
and
What do you mean by performance degradation? It is well known that you pretty much never get ideal speedup, since synchronizing threads causes some overhead.
Thank you for your detailed response!
By performance degradation I mean that the elixir runs slower multithreaded than single-threaded, just as you showed in the issue I mentioned. At least, that is what I take from the summary_callback() output.
I would like to use multiple threads to solve several equations at the same time (one of them being a Trixi semidiscretization) and, ideally, get performance similar to running each one independently on a single thread (I should say that I have only started looking into this recently).
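A minimal sketch of that setup in plain Julia, assuming each computation runs serially and only the two computations are run concurrently as tasks. The functions partial_zeta2 and partial_log2 are hypothetical stand-ins for independent solves (for example a Trixi run and some other model); they are not Trixi code.

```julia
# Hypothetical stand-in for one serial computation: partial sum of 1/k^2.
function partial_zeta2(n)
    s = 0.0
    for k in 1:n
        s += 1.0 / k^2
    end
    return s
end

# Hypothetical stand-in for a second, independent serial computation:
# alternating harmonic series, which converges to log(2).
function partial_log2(n)
    s = 0.0
    for k in 1:n
        s += isodd(k) ? 1.0 / k : -1.0 / k
    end
    return s
end

n = 10^6
# Each computation becomes its own task; with two or more Julia threads
# the scheduler can run them in parallel, each one internally serial.
task_a = Threads.@spawn partial_zeta2(n)
task_b = Threads.@spawn partial_log2(n)
a, b = fetch(task_a), fetch(task_b)
println("a = ", a, "  (pi^2/6 = ", pi^2 / 6, ")")
println("b = ", b, "  (log(2) = ", log(2), ")")
```

The open question in this thread is what happens when one of these tasks internally starts its own threaded loops, which depends on the threading backend used inside that task.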
So if you increase your problem size to, say, at least a couple thousand unknowns per thread, you should see performance improvements. In the issue, I deliberately used a toy problem to keep things simple.
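To illustrate the dependence on problem size, here is a sketch that times a serial and a threaded version of the same cheap elementwise update for a few array sizes. It uses generic Base threads, not Trixi's loops, and the update kernel is an arbitrary assumption; where (and whether) the threaded version starts to win depends on the hardware, the number of threads, and the cost per element.

```julia
# Serial reference kernel: cheap work per element.
function update_serial!(du, u)
    for i in eachindex(du, u)
        du[i] = sin(u[i]) * u[i]
    end
    return du
end

# Same kernel with Base threads; starting and synchronizing the tasks
# is a fixed overhead paid on every call.
function update_threaded!(du, u)
    Threads.@threads for i in eachindex(du, u)
        du[i] = sin(u[i]) * u[i]
    end
    return du
end

# Best wall-clock time over several repetitions, after one warm-up call
# so that compilation is not included.
function best_time(f!, du, u; reps = 50)
    f!(du, u)
    return minimum(@elapsed(f!(du, u)) for _ in 1:reps)
end

for n in (100, 10_000, 1_000_000)
    u = rand(n)
    du = similar(u)
    ts = best_time(update_serial!, du, u)
    tt = best_time(update_threaded!, du, u)
    println("n = ", n, ": serial ", ts, " s, threaded ", tt,
            " s on ", Threads.nthreads(), " thread(s)")
end
```

For very small arrays the fixed threading overhead tends to dominate, which is consistent with the toy problem in the linked issue running slower multithreaded.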
I have not observed that in my experiments. But the main issue for me is that I apparently cannot avoid threaded loops in Trixi when starting Julia with multiple threads, even when I do not want them for the Trixi solve(), e.g., because I want to use the threads in another piece of code that benefits more from them. I just wanted to know whether this behavior is intended and/or whether it can easily be changed, if you find that reasonable.
I appreciate your help so far.
Did you try Trixi.set_polyester!(false)? See https://trixi-framework.github.io/Trixi.jl/stable/reference-trixi/#Trixi.set_polyester!-Tuple{Bool}
Switching to Base threads should enable Julia's nested threading capabilities.
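A small plain-Julia sketch of what nested threading with Base threads looks like: two outer tasks each call a kernel that is itself multithreaded with Threads.@threads. The kernel square_threaded! is a hypothetical stand-in for an internally threaded library routine, not Trixi code. This assumes Julia 1.8 or later, where @threads defaults to the :dynamic schedule; the :static schedule cannot be nested.

```julia
# Hypothetical inner kernel that is itself multithreaded with Base threads.
function square_threaded!(out, x)
    Threads.@threads for i in eachindex(out, x)
        out[i] = x[i]^2
    end
    return out
end

x1 = collect(1.0:1000.0)
x2 = collect(1001.0:2000.0)

# Outer level: two independent tasks, each calling the threaded kernel.
# Julia's scheduler distributes all resulting tasks over the available threads.
task1 = Threads.@spawn square_threaded!(similar(x1), x1)
task2 = Threads.@spawn square_threaded!(similar(x2), x2)
y1, y2 = fetch(task1), fetch(task2)
println(sum(y1), " ", sum(y2))
```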
Alternatively, we (or you) could consider making a PR that adds a similar function to disable threading in Trixi.jl.
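One possible shape for such a function, sketched under the assumption of a simple runtime switch. THREADING_ENABLED, set_threading! and foreach_maybe_threaded are hypothetical names for illustration only; they are not part of Trixi.jl, and this is not how Trixi's own threading macro is implemented.

```julia
# Hypothetical global switch (not a Trixi.jl API).
const THREADING_ENABLED = Ref(true)

set_threading!(flag::Bool) = (THREADING_ENABLED[] = flag; nothing)

# Hypothetical loop helper: runs f(i) for every i in itr, using Base threads
# only if the switch is on and more than one thread is available.
function foreach_maybe_threaded(f, itr)
    if THREADING_ENABLED[] && Threads.nthreads() > 1
        Threads.@threads for i in itr
            f(i)
        end
    else
        for i in itr
            f(i)
        end
    end
    return nothing
end

# Usage: with the switch off, the loop runs serially on the calling task,
# leaving the other threads free for other work.
out = zeros(10)
set_threading!(false)
foreach_maybe_threaded(i -> (out[i] = i^2), 1:10)
```

A runtime flag checked before every loop is the simplest option; a compile-time setting would avoid the branch but is beyond this sketch.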
Regarding "I have not observed that in my experiments": that might happen if the runtime is dominated not by rhs! but by callbacks such as the analysis or AMR callbacks. When turning off save_solution, analysis_callback
for the
I get 33 seconds for one thread and 22 seconds for two threads. When using a uniform mesh without AMR, the single-threaded run takes 61 seconds while the run on two threads takes 32 seconds, which is quite close to ideal speedup.
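To make "close to ideal" concrete, the usual definitions are speedup S = T1 / Tp and parallel efficiency E = S / p. The sketch below just evaluates them for the wall-clock times quoted above; the labels are our own shorthand for the two runs.

```julia
# Speedup S = T1 / Tp and parallel efficiency E = S / p.
speedup(t1, tp) = t1 / tp
efficiency(t1, tp, p) = speedup(t1, tp) / p

# Times in seconds on 1 and 2 threads, as reported above.
for (label, t1, t2) in (("first configuration", 33.0, 22.0),
                        ("uniform mesh, no AMR", 61.0, 32.0))
    s = speedup(t1, t2)
    e = efficiency(t1, t2, 2)
    println(rpad(label, 22), "speedup = ", round(s; digits = 2),
            ", efficiency = ", round(100 * e; digits = 1), " %")
end
```

This gives a speedup of 1.5 (75 % efficiency) for the first run and about 1.91 (about 95 % efficiency) for the uniform mesh.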
Hello Trixi team,
As far as I understand, initializing Julia with more than one thread automatically makes Trixi use them in its internal loops (correct me if I am wrong). However, I see a benefit in having a keyword, stored in the cache, to activate this, similar to the one in OrdinaryDiffEq (thread=OrdinaryDiffEq.True()), for instance when one prefers to use the threads somewhere else.
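A sketch of how such a keyword could work, assuming the choice is stored as a type in a cache and the loop implementation is selected by dispatch, in the spirit of OrdinaryDiffEq's True()/False() keyword. ThreadedLoops, SerialLoops, ToyCache, make_cache, scale! and rhs_toy! are all hypothetical names; they are neither OrdinaryDiffEq's types nor Trixi's cache layout.

```julia
# Hypothetical marker types for the threading choice.
struct ThreadedLoops end
struct SerialLoops end

# Hypothetical cache that stores the choice next to other data.
struct ToyCache{T}
    threading::T
    buffer::Vector{Float64}
end

make_cache(n; threading = ThreadedLoops()) = ToyCache(threading, zeros(n))

# The loop variant is selected by dispatch on the stored type,
# so there is no runtime branch inside the loop itself.
function scale!(du, u, a, ::SerialLoops)
    for i in eachindex(du, u)
        du[i] = a * u[i]
    end
    return du
end

function scale!(du, u, a, ::ThreadedLoops)
    Threads.@threads for i in eachindex(du, u)
        du[i] = a * u[i]
    end
    return du
end

# Hypothetical right-hand side that forwards the cached choice.
rhs_toy!(du, u, cache) = scale!(du, u, 2.0, cache.threading)

u = collect(1.0:8.0)
du = similar(u)
rhs_toy!(du, u, make_cache(length(u); threading = SerialLoops()))
```

Because the choice is part of the cache's type, each variant compiles to its own specialized method, at the cost of recompilation when switching.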
Also, when experimenting with examples involving threads, I consistently encounter allocations (which are expected) along with a degradation in performance, e.g., #1596.
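A minimal plain-Julia sketch of where such allocations come from: a serial reduction allocates nothing, while a threaded one allocates for the tasks that Threads.@threads creates and, in this particular sketch, also for its own chunk ranges and partial-sum buffer. The functions are hypothetical examples, not Trixi code, and the exact byte counts depend on the Julia version and thread count.

```julia
# Serial reduction: no heap allocations once compiled.
function serial_sum(x)
    s = zero(eltype(x))
    for xi in x
        s += xi
    end
    return s
end

# Threaded reduction with one partial sum per chunk to avoid data races.
function threaded_sum(x)
    chunk = max(1, cld(length(x), Threads.nthreads()))
    ranges = collect(Iterators.partition(eachindex(x), chunk))
    partial = zeros(eltype(x), length(ranges))
    Threads.@threads for c in eachindex(ranges)
        s = zero(eltype(x))
        for i in ranges[c]
            s += x[i]
        end
        partial[c] = s
    end
    return sum(partial)
end

# Measure inside functions so global-scope dispatch is not counted.
allocs_serial(x) = @allocated serial_sum(x)
allocs_threaded(x) = @allocated threaded_sum(x)

x = rand(10_000)
allocs_serial(x); allocs_threaded(x)  # warm-up (compilation)
println("serial:   ", allocs_serial(x), " bytes")
println("threaded: ", allocs_threaded(x), " bytes")
```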
Should there be interest in this feature, I could give it a try. Alternatively, you could suggest a simple fix. Thank you.