Strange combustion problem scaling behavior vs. mesh

MTCam commented 2 years ago

This is a simple combustion case with CNS + Combustion, 7 species, order 1 elements, Euler time stepping and periodic boundaries. It reproduces issues we are seeing with production case scaling.

The driver for running this is here:

https://github.com/illinois-ceesd/drivers_bozzle/blob/state-handling/mixalot.py

The results below can be reproduced by running the driver with different mesh sizes, which are set here: https://github.com/illinois-ceesd/drivers_bozzle/blob/78ca790930a79cbbfe443435ffc5b9f4719dff76/mixalot.py#L143

Setting {x, y, z}_scale scales the geometry such that the number of elements grows with the scale factors, while the grid spacing and the associated physics-consistent timestep restrictions and settings stay unchanged.

Here's how the 1-GPU case behaves with mesh size (walltime per RHS vs. number of elements):

This looks a little weird at the low-element-count end of the curve, and we'd like to figure out why. Why does the slope change? Here's a log-scale version, which looks a little more like what we are used to seeing.

Timing data for the case:

NElem	NGPUs	TPS(s)	Compile(s)
54	1	.02	687
114	1	.02	690
342	1	.02	711
486	1	.02	706
1026	1	.02	737
1350	1	.02	742
1710	1	.02	737
2850	1	.03	733
11K	1	.07	740
23K	1	.1	737
50K	1	.14	721
106K	1	.2	722
215K	1	.3	734
456K	1	.64	785
686K	1	.89	807
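
One rough way to put a number on the "floor" in this table is a least-squares fit of a linear cost model, t = overhead + per_element * N. The sketch below is only an illustration under stated assumptions: it treats the TPS(s) column as the wall time per RHS evaluation, reads entries such as "11K" as rounded counts, and uses a hypothetical helper name (`fit_linear_cost`). The timings are reported to only one or two significant digits, so the fitted values are coarse estimates at best.

```python
# Timing data copied from the table above: (number of elements, seconds).
# Entries written as "11K", "23K", ... are taken as rounded counts (assumption).
TIMINGS = [
    (54, 0.02), (114, 0.02), (342, 0.02), (486, 0.02), (1026, 0.02),
    (1350, 0.02), (1710, 0.02), (2850, 0.03), (11_000, 0.07),
    (23_000, 0.10), (50_000, 0.14), (106_000, 0.20), (215_000, 0.30),
    (456_000, 0.64), (686_000, 0.89),
]


def fit_linear_cost(points):
    """Ordinary least-squares fit of t = overhead + per_element * n.

    Hypothetical helper for illustration; returns (overhead, per_element).
    """
    count = len(points)
    mean_n = sum(n for n, _ in points) / count
    mean_t = sum(t for _, t in points) / count
    s_nn = sum((n - mean_n) ** 2 for n, _ in points)
    s_nt = sum((n - mean_n) * (t - mean_t) for n, t in points)
    per_element = s_nt / s_nn
    overhead = mean_t - per_element * mean_n
    return overhead, per_element


overhead, per_element = fit_linear_cost(TIMINGS)
print(f"fitted fixed overhead: {overhead:.3f} s")
print(f"fitted marginal cost:  {per_element * 1e6:.2f} us per element")
```

A single straight line is a crude model here, since the data are visibly not linear over the whole range; the fit is mainly useful for separating a size-independent cost from a per-element cost.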

Originally posted by @MTCam in https://github.com/illinois-ceesd/mirgecom/issues/602#issuecomment-1031955973 cc: @inducer

inducer commented 2 years ago

Thanks for pulling together this data!

The log plot looks quite different from the linear version... are there a bunch of points on top of each other in the linear case?
The log plot seems to say evaluating a RHS with zero(-ish) elements costs 0.02s. Not great. What's in that time?
The slope change is the opposite of what I'd expect from, say, cache effects: Flatter slope until we fall out of cache, and then steepening.
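
The slope change can be looked at directly by computing the marginal cost per added element between consecutive rows of the timing table. This is a minimal sketch, assuming the TPS(s) column is the wall time per RHS evaluation and that entries such as "11K" are rounded counts; `marginal_costs` is a hypothetical helper name, and the coarse rounding of the timings limits how much can be read into any single interval.

```python
# Timing data copied from the table in the first comment:
# (number of elements, seconds). "11K" etc. are read as rounded counts.
TIMING_ROWS = [
    (54, 0.02), (114, 0.02), (342, 0.02), (486, 0.02), (1026, 0.02),
    (1350, 0.02), (1710, 0.02), (2850, 0.03), (11_000, 0.07),
    (23_000, 0.10), (50_000, 0.14), (106_000, 0.20), (215_000, 0.30),
    (456_000, 0.64), (686_000, 0.89),
]


def marginal_costs(rows):
    """Return (n_start, n_end, seconds per added element) for each interval."""
    result = []
    for (n0, t0), (n1, t1) in zip(rows, rows[1:]):
        result.append((n0, n1, (t1 - t0) / (n1 - n0)))
    return result


intervals = marginal_costs(TIMING_ROWS)
for n0, n1, cost in intervals:
    print(f"{n0:>7} -> {n1:>7}: {cost * 1e6:8.2f} us per added element")
```

In this view, a cache effect of the kind described above would show up as a marginal cost that rises with element count, so a marginal cost that falls as the mesh grows points to something else.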

cc @lukeolson @matthiasdiener

MTCam commented 2 years ago

> Thanks for pulling together this data!
>
> The log plot looks quite different from the linear version... are there a bunch of points on top of each other in the linear case?

The linear plot did not include the measurements "on the floor" (i.e., it starts at 2850 elements).

> The log plot seems to say evaluating a RHS with zero(-ish) elements costs 0.02s. Not great. What's in that time?

I intended it to be RHS evals only. Any logpyle profiling overhead would be included, but all other status/health stuff is off for timing runs.

> The slope change is the opposite of what I'd expect from, say, cache effects: Flatter slope until we fall out of cache, and then steepening.
>
> cc @lukeolson @matthiasdiener
