ludwig-cf / ludwig

A lattice Boltzmann code for complex fluids
https://ludwig.epcc.ed.ac.uk

Move mpi test to a GitHub action #304

Closed kevinstratford closed 4 months ago

kevinstratford commented 4 months ago

Note on regression tests.

This is the first attempt to unify the serial and parallel test cases. The aim is to run the same input files in either serial or parallel without needing to care about the details.

In the test-diff.sh script, I've removed output that does not match the serial case (decomposition, cell list dimensions and so on). However, this may not catch all decomposition-dependent output.
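As a rough illustration of the filtering idea (not the actual test-diff.sh logic), the sketch below drops lines that are expected to differ between serial and parallel runs before comparing two logs. The line prefixes and the sample logs are hypothetical, made up for this example; the real script may match different text.

```python
import re

# Hypothetical prefixes for decomposition-dependent lines. The real list
# lives in test-diff.sh and may well differ from these.
DECOMPOSITION_PATTERNS = [
    re.compile(r"^\s*Decomposition:"),
    re.compile(r"^\s*Local domain:"),
    re.compile(r"^\s*Cell list:"),
]


def strip_decomposition_lines(text):
    """Return the log lines that match none of the patterns above."""
    return [
        line
        for line in text.splitlines()
        if not any(p.search(line) for p in DECOMPOSITION_PATTERNS)
    ]


def logs_match(reference, candidate):
    """Compare two logs after removing decomposition-dependent lines."""
    return strip_decomposition_lines(reference) == strip_decomposition_lines(candidate)


# Made-up sample logs, not real Ludwig output.
serial_log = """System size:    64 64 64
Decomposition:  1 1 1
Local domain:   64 64 64
Free energy:    1.25000e-02"""

parallel_log = """System size:    64 64 64
Decomposition:  2 1 1
Local domain:   32 64 64
Free energy:    1.25000e-02"""

print(logs_match(serial_log, parallel_log))
```

Any decomposition-dependent line that is not covered by a pattern still shows up as a difference, which is the limitation noted above.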

Unit tests

1 x mpi x 2 x threads
2 x mpi x 2 x threads
4 x mpi x 2 x threads

All ok.

Regression tests

d2q9

Good at 2 x mpi x 2 x threads

d3q15

Good at 2 x mpi x 2 x threads

d3q19-short

Gross failures:

serial-open-ru2.inp    decomposition interacts with open bc
serial-poly-st1.inp    util/multi_poly_init not parallel
serial-sqmr-st2.inp    needs investigation
serial-tern-st1.inp    all these need investigation...
serial-tern-st2.inp    ... including initialisations
serial-tern-st3.inp
serial-tern-st4.inp
serial-tern-st5.inp

Marginal failures (will pass at e.g. TOLERANCE=1.0e-08)

serial-anch-cn2.inp
serial-chol-n01.inp
serial-chol-n02.inp
serial-chol-n03.inp
serial-chol-n04.inp
serial-chol-w01.inp
serial-chol-w02.inp
serial-chol-w03.inp

Otherwise ok.
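For the marginal failures, a tolerance-based comparison can be sketched as follows. This is a minimal illustration, assuming the tolerance is applied as an absolute difference on each number in a line; how TOLERANCE is actually applied in the test scripts is not stated here. The function name and the sample lines are hypothetical.

```python
import re

# Matches integers, decimals and exponent notation such as 1.25e-02.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def lines_agree(ref_line, test_line, tolerance=1.0e-08):
    """True if the text matches and every number agrees within tolerance."""
    # Replace numbers with a placeholder and collapse whitespace, so that
    # only the non-numeric text is compared exactly.
    ref_text = " ".join(NUMBER.sub("#", ref_line).split())
    test_text = " ".join(NUMBER.sub("#", test_line).split())
    if ref_text != test_text:
        return False
    ref_values = [float(x) for x in NUMBER.findall(ref_line)]
    test_values = [float(x) for x in NUMBER.findall(test_line)]
    # Assumption: absolute difference; a relative test is equally plausible.
    return all(abs(a - b) <= tolerance for a, b in zip(ref_values, test_values))


# Made-up values differing in the last few digits.
reference = "[phi] 1.2345678901e-02"
parallel = "[phi] 1.2345678934e-02"

print(lines_agree(reference, parallel))
print(lines_agree(reference, parallel, tolerance=1.0e-12))
```

Differences of this size are what one might expect if the parallel run sums floating-point values in a different order, although that has not been confirmed as the cause for these tests.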

d3q27

Good at 2 x mpi x 2 x threads

kevinstratford commented 4 months ago

The mpirun -np 2 GitHub action seems unreasonably slow compared with the serial version (4-5 minutes cf. 40-50 seconds).

Modest investigation suggests the increase in time is across the board, although halo exchanges are particularly bad.

kevinstratford commented 4 months ago

I've removed a number of unused tests from

./tests/regression/d3q19

which are mainly longer versions of active tests. These tests all still passed at the point of removal.

kevinstratford commented 4 months ago

This is updated at #312

I have left mpi tests for d2q9 and d3q19 only to keep the time under control.