Open ranocha opened 3 years ago
Are these times with (pre-)compilation? Because if I run examples\2d\elixir_advection_amr.jl on my laptop, it is finished in <5 s.
These are the times reported by the summary_callback after each test.
Related to #62
We might also want to discuss the following questions/options.
Can we remove the restart callback save_restart from many elixirs?
Do we need to run all 2D tests on Windows and Mac OS? Would it suffice to run only the MPI and threaded tests?
I think we said yes. In the past, there have sometimes been weird macOS-related issues, and I think we should make sure that we exercise most of the core functionality of Trixi on all relevant platforms. At least as long as it does not become unbearable... If we want to save time during development, what about disabling the macOS and Windows tests on Draft PRs? This way one could have faster turnaround times during most of a PR's lifetime, and only get the full checks once we are ready to merge.
Can we remove the restart callback save_restart from many elixirs?
Yes, I have no issue with this. IMHO, we can at least remove that from all but one elixir per dimension-mesh-solver-equation combination.
Split some expensive test sets into more CI jobs
Absolutely. I have suggested this before, but you (rightfully) warned that due to startup latency, this does not always make it faster.
Yeah, but we have a bunch of new equation and mesh types so that we can benefit less from re-using compiled code.
Another (minor) aspect: Documenter is set up to fail when doctests fail, so we don't need to run doctests in https://github.com/trixi-framework/Trixi.jl/blob/ff549a5e67a7685f2ad3c97a0694c756160d79b4/test/test_unit.jl#L515-L517
The tests really take way too long to run IMO. It's so far been 15 minutes and my tests are still running.
It would be nice to have a minimal set of tests which preserve "enough" code coverage so that any changes to Trixi base could be more quickly checked on a local machine.
One possibility would be to create a testset intended for "local" testing, which could exclude some of the CI tests.
That's definitely a good point. What I usually do when modifying Trixi is to include only a subset of tests locally, say test/test_examples_2d_advection.jl when I modified some 2D stuff. That's usually a good smoke test. However, it's a bit hard to cover (nearly) everything in a cheap test set using the current way of testing, I fear.
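The idea of a cheap "local" test set could be sketched roughly as follows. This is only an illustration, not how Trixi's test runner is actually organized: the environment variable `TEST_GROUP`, the group names, and the grouping itself are hypothetical, and only the two file names are taken from this thread.

```julia
# Hypothetical mapping from a group name to test files. Only the two file
# names below appear in this thread; the grouping is an assumption.
const TEST_GROUPS = Dict(
    "local" => ["test_examples_2d_advection.jl", "test_unit.jl"],
    "2d"    => ["test_examples_2d_advection.jl"],
    "unit"  => ["test_unit.jl"],
)

# Return the test files for a group; "all" selects every known file once.
function selected_test_files(group::AbstractString)
    if group == "all"
        return sort(unique(vcat(values(TEST_GROUPS)...)))
    end
    haskey(TEST_GROUPS, group) || error("unknown test group: $group")
    return TEST_GROUPS[group]
end

# Read the group from a hypothetical environment variable and default to
# the cheap "local" set when it is not set.
group = get(ENV, "TEST_GROUP", "local")
for file in selected_test_files(group)
    # A real runtests.jl would call include(file) here.
    println("would include ", file)
end
```

With something like this, CI could request the full set while a developer gets the cheap smoke test by default. Which files give "enough" coverage for the local group is exactly the open question discussed above.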
Has anything significantly changed the timing since you reported them, @ranocha? I noticed that 3d/elixir_euler_amr.jl takes over 300s now in GitHub (and for some reason over 400s on my system, maybe that's because of Windows?). 2D and 3D tests regularly take over an hour now.
Is it really necessary to let the simulation run that long? Would it be sufficient to let tests like this run to t=1 instead of t=10 (maybe use a different start time to still test that the blob is running over the periodic boundaries)?
Yes, something like that is definitely a good option from my point of view. A major impact on the CI run time was our more extensive use of Polyester, StrideArrays, and LoopVectorization. This combination is really good for runtime performance, but particularly demanding for CI when collecting coverage results.
Finding good tests is always tricky. "As short as possible but as long as necessary" is our yardstick, but what exactly the latter part means in practice is often hard to tell.
I fully agree that we need to reduce the amount of time it takes for testing, but from past experience (especially from tests that didn't run long enough to uncover errors that were only found much later), I feel like this is in general a non-trivial task and requires some thinking and selective tweaking. This is also the only reason this hasn't been tackled yet - a lack of developer time :-/
We are experiencing some problems with GitHub actions in the last few days - jobs are stuck at the queued stage although we have enough free capacity. One way to reduce problems like these could be to reduce the number of tests that need to run on all three OS. From my point of view, it should be sufficient to have some basic tests on all OS (including threads, MPI, p4est, and other binary dependencies), but we definitely do not need to test every 2D setup on Windows and Mac OS.
IIRC, this has been resolved by your efforts this year, hasn't it @ranocha?
This particular problem, yes. However, I think CI is still too expensive.
For example, here is a list of examples that are relatively expensive on Windows (2D)
examples\2d\elixir_advection_amr.jl, 93.2s
examples\2d\elixir_advection_amr_nonperiodic.jl, 47.3s
examples\2d\elixir_hypdiff_nonperiodic.jl, 38.9s
examples\2d\elixir_euler_shockcapturing.jl, 46.5s
examples\2d\elixir_euler_blast_wave_amr.jl, 62.8s
examples\2d\elixir_euler_sedov_blast_wave.jl, 61.1s
examples\2d\elixir_euler_positivity.jl, 57.6s
examples\2d\elixir_mhd_alfven_wave.jl, 46.6s
examples\2d\elixir_mhd_alfven_wave_mortar.jl, 74.7s
examples\2d\elixir_mhd_orszag_tang.jl, 45.8s
examples\2d\elixir_lbm_lid_driven_cavity.jl, 40.6s
examples\2d\elixir_mhd_rotor.jl, 192s
examples\2d\elixir_mhd_blast_wave.jl, 158s
and Ubuntu (3D)
examples/3d/elixir_advection_mortar.jl, 40.7s
examples/3d/elixir_hypdiff_nonperiodic.jl, 51.9s
examples/3d/elixir_euler_amr.jl, 111s
examples/3d/elixir_euler_shockcapturing.jl, 67.1s
examples/3d/elixir_euler_sedov_blast_wave.jl, 72.0s (although it's using only 5 time steps)
examples/3d/elixir_eulergravity_eoc_test.jl, 44.2s (although it's using only 9 time steps)
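To get a feeling for where the time goes, the numbers listed above can be summed and ranked with a few lines of Julia. This is just a quick sketch using the reported times; the helper names `total` and `slowest` are made up for this illustration, and the lists cover only the "relatively expensive" examples, not the full test suite.

```julia
# Times in seconds, copied from the lists above (Windows for 2D, Ubuntu for 3D).
times_2d = [
    "elixir_advection_amr.jl" => 93.2,
    "elixir_advection_amr_nonperiodic.jl" => 47.3,
    "elixir_hypdiff_nonperiodic.jl" => 38.9,
    "elixir_euler_shockcapturing.jl" => 46.5,
    "elixir_euler_blast_wave_amr.jl" => 62.8,
    "elixir_euler_sedov_blast_wave.jl" => 61.1,
    "elixir_euler_positivity.jl" => 57.6,
    "elixir_mhd_alfven_wave.jl" => 46.6,
    "elixir_mhd_alfven_wave_mortar.jl" => 74.7,
    "elixir_mhd_orszag_tang.jl" => 45.8,
    "elixir_lbm_lid_driven_cavity.jl" => 40.6,
    "elixir_mhd_rotor.jl" => 192.0,
    "elixir_mhd_blast_wave.jl" => 158.0,
]
times_3d = [
    "elixir_advection_mortar.jl" => 40.7,
    "elixir_hypdiff_nonperiodic.jl" => 51.9,
    "elixir_euler_amr.jl" => 111.0,
    "elixir_euler_shockcapturing.jl" => 67.1,
    "elixir_euler_sedov_blast_wave.jl" => 72.0,
    "elixir_eulergravity_eoc_test.jl" => 44.2,
]

# Sum the listed times of one group (hypothetical helper).
total(times) = sum(last, times)

# Return the n most expensive entries, slowest first (hypothetical helper).
slowest(times, n) = first(sort(times; by = last, rev = true), n)

for (label, times) in ("2D (Windows)" => times_2d, "3D (Ubuntu)" => times_3d)
    println(label, ": ", round(total(times); digits = 1), " s in total")
    for (name, t) in slowest(times, 2)
        println("  ", name, ": ", t, " s")
    end
end
```

The listed 2D examples add up to about 965 s and the 3D ones to about 387 s, with the two MHD setups (rotor and blast wave) alone accounting for 350 s of the 2D total, so those look like natural first candidates for shortening.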