illinois-ceesd / mirgecom

MIRGE-Com is the workhorse simulation application for the Center for Exascale-Enabled Scramjet Design at the University of Illinois.

Error with batched einsum array context #967

Open majosm opened 1 year ago

majosm commented 1 year ago

When using the batched einsum array context with the prediction driver, the RHS compile produces the following error:

loopy.diagnostic.LoopyIndexError: 'inv_metric_deriv_v_wall[iambient_dim, itopo_dim, iel_47_inner_inner + iel_47_inner_outer*4 + iel_47_outer*1280, 0]' in instruction '_pt_temp_2_store_itopo_dim_idof_74_update' accesses out-of-bounds array element (could not establish '{ [i0, i1, i2, 0] : 0 <= i0 <= 1 and 0 <= i1 <= 1 and 0 <= i2 <= 12390 }' is a subset of '{ [i0, i1, i2, i3] : i3 = 0 and 0 <= i0 <= 1 and 0 <= i1 <= 1 and 0 <= i2 <= 581 }').
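For context, this is loopy's array-access bounds check failing: it cannot prove that the index set the loop touches is contained in the array's index set. A minimal sketch of the same failure class (not the actual kernel from the driver; the names and shapes here are made up):

import numpy as np
import loopy as lp

# The loop domain (0 <= i < 16) is wider than the array it reads
# (shape (8,)), so loopy cannot establish that the accessed index set
# is a subset of the array's index set, matching the complaint above.
knl = lp.make_kernel(
    "{ [i]: 0 <= i < 16 }",
    "out[i] = a[i]",
    [lp.GlobalArg("a", np.float64, shape=(8,)),
     lp.GlobalArg("out", np.float64, shape=(16,))])

lp.generate_code_v2(knl)  # raises loopy.diagnostic.LoopyIndexError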

When run without -O, it produces a different (but possibly related) error:

loopy.diagnostic.LoopyError: inames _pt_temp_335_dim0 and iel_47 do not iterate over the same domain

The error persists with most of the physics turned off, as long as species limiting and the main isothermal boundary both remain enabled. (Note: the actual boundary condition being applied doesn't seem to matter; I've tried both isothermal and DummyBoundary.)

A reduced Y3 case can be installed and run with the instructions below. It creates an RHS DAG of about 100 nodes and runs in a few minutes.

git clone git@github.com:illinois-ceesd/drivers_y3-prediction.git
cd drivers_y3-prediction
git checkout batched-einsum-error-reproducer
./buildMirge.sh --use-ssh
source emirge/config/activate_env.sh
cd smoke_test_ks
python -m mpi4py driver.py -i run_params.yaml --lazy --log
majosm commented 1 year ago

Forgot to mention: when I look at the two loops mentioned in the error, their lengths correspond to the number of elements on the interior faces and on the isothermal boundary.
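A rough sketch of one way to check that correspondence (this assumes an existing grudge DiscretizationCollection dcoll; BTAG_ALL stands in for whatever boundary tag the driver actually uses for the isothermal wall):

from meshmode.mesh import BTAG_ALL
from grudge.dof_desc import FACE_RESTR_INTERIOR, as_dofdesc

# Compare these element counts against the loop bounds in the error
# message (the 12391 and 582 implied by the bounds above).
interior_faces = dcoll.discr_from_dd(as_dofdesc(FACE_RESTR_INTERIOR))
wall_boundary = dcoll.discr_from_dd(as_dofdesc(BTAG_ALL))
print(interior_faces.mesh.nelements, wall_boundary.mesh.nelements)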

majosm commented 1 year ago

I applied @kaushikcfd's fix from inducer/arraycontext#217 and now I can run the full KS 2D case without errors. 🎉

Seeing that PR made me question whether I'm using the right version of the code, though. In my current subpackage config, apply_kennedy_fusion_with_batched_einsum_extension is in meshmode. Is there a different config I should be using that uses the version that's in arraycontext?

inducer commented 1 year ago

Yay! How are compile times with this transform path?

inducer commented 1 year ago

> Seeing that PR made me question whether I'm using the right version of the code, though. In my current subpackage config, apply_kennedy_fusion_with_batched_einsum_extension is in meshmode. Is there a different config I should be using that uses the version that's in arraycontext?

Using what's in Kaushik's meshmode branch is the correct approach. As Kaushik says here, he has just "parked" the code in meshmode while it's under review. The code is technically independent of meshmode and will ultimately land in arraycontext, which is why the PRs live there.

majosm commented 1 year ago

> Yay! How are compile times with this transform path?

Let's say there's room for improvement, heh.

Fusion contractor:

    Run time :                                   894 sec.

Batched einsum:

    Run time :                                   3851 sec.

Unfortunately it looks like the timestep time is also quite a bit slower at the moment.

Fusion contractor:

 Performance:
    walltime: 0.264042 s
    visualization time:      0 s
    garbage collection time:      0 s
    log walltime: 4.29102e-05 s
 Memory:
    python memory: 2580.69 Mb
    gpu memory: 1481.69 Mb
    memory hwm: 2751.62 Mb
    mempool total: 971.216 Mb
    mempool active: 147.447 Mb

Batched einsum:

 Performance:
    walltime: 1.5777 s
    visualization time:      0 s
    garbage collection time:      0 s
    log walltime: 4.19766e-05 s
 Memory:
    python memory: 1733.69 Mb
    gpu memory: 1073.62 Mb
    memory hwm: 2947.81 Mb
    mempool total: 676.924 Mb
    mempool active:  194.55 Mb
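(For scale, that's roughly a 4.3x slower compile (3851 s vs. 894 s) and about a 6x slower timestep (1.5777 s vs. 0.264042 s), though the batched einsum path does use less GPU and mempool memory.)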