inducer / meshmode

High-order unstructured mesh representation and discrete function spaces
https://documen.tician.de/meshmode/

Axis tagging in nodal-to-modal and modal-to-nodal connections #372

Closed (majosm closed 1 year ago)

majosm commented 1 year ago

When running the Y3 prediction case with filtering enabled, a warning is emitted:

pytato/target/loopy/__init__.py:151: UserWarning: [my_rhs]: Falling back to a slower transformation strategy as some loops are uninferred which mesh entity they belong to.

By adding some extra DiscretizationDOFAxisTags to the Vandermonde matrices in meshmode/discretization/connection/modal.py, I can make the warning go away. Doing so also fixes the compilation speed issue with filtering: the scheduling and codegen phases in loopy no longer show the ~10x slowdown that they otherwise exhibit when filtering is enabled.
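To illustrate the idea of per-axis tagging described above, here is a minimal toy model in plain Python. It is only a sketch and does not use meshmode, arraycontext, or pytato: `DOFAxisTag`, `TaggedArray`, `make_array`, `tag_axis`, and `untagged_axes` are all hypothetical stand-ins invented for this example. The assumption is simply that each array axis carries a set of tags, and that an axis with no tags is one whose loop cannot be attributed to a mesh entity.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DOFAxisTag:
    """Hypothetical stand-in for meshmode's DiscretizationDOFAxisTag."""


@dataclass(frozen=True)
class TaggedArray:
    """Toy array placeholder: a shape plus one set of tags per axis."""
    shape: Tuple[int, ...]
    axis_tags: Tuple[FrozenSet[object], ...]


def make_array(shape):
    """Create a placeholder array with no tags on any axis."""
    return TaggedArray(tuple(shape), tuple(frozenset() for _ in shape))


def tag_axis(iaxis, tag, ary):
    """Return a copy of *ary* with *tag* attached to axis *iaxis*."""
    new_tags = list(ary.axis_tags)
    new_tags[iaxis] = new_tags[iaxis] | {tag}
    return replace(ary, axis_tags=tuple(new_tags))


def untagged_axes(ary):
    """List the axes that carry no tag at all (the 'uninferred' ones)."""
    return [i for i, tags in enumerate(ary.axis_tags) if not tags]


# A Vandermonde-like matrix: one nodal axis and one modal axis.
vdm_untagged = make_array((10, 10))

# Tag both axes with the same DOF tag, as proposed in this issue.
vdm_tagged = tag_axis(1, DOFAxisTag(), tag_axis(0, DOFAxisTag(), vdm_untagged))

print(untagged_axes(vdm_untagged))
print(untagged_axes(vdm_tagged))
```

In this toy model the untagged matrix reports both axes as lacking tags, while the tagged one reports none. The real change would attach the actual tag class to the matrices through the array context rather than through these stand-ins.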

I have a couple of questions about this:

  1. Is there a way I can determine whether this lack of tags was the root cause of the slow compilation, or if adding them is just papering over the real issue?
  2. Is it OK to use DiscretizationDOFAxisTag for both the nodal and modal axes of the Vandermonde matrix? Or should they be distinct?

inducer commented 1 year ago

1. Is there a way I can determine whether this lack of tags was the root cause of the slow compilation, or if adding them is just papering over the real issue?

Maybe @kaushikcfd can comment. Until we figure out how it was spending its time, it's hard to be certain.

2. Is it OK to use DiscretizationDOFAxisTag for both the nodal and modal axes of the Vandermonde matrix? Or should they be distinct?

Let's define it to be OK. If you make a PR, update the docs for that tag to say that it applies to both modal and nodal DOFs.

inducer commented 1 year ago

Btw, nice job discovering this!

inducer commented 1 year ago

To add: I would be OK with applying the tags even before we convincingly resolve the reason for the compilation slowness.

kaushikcfd commented 1 year ago

Is it OK to use DiscretizationDOFAxisTag for both the nodal and modal axes of the Vandermonde matrix? Or should they be distinct?

Yes, that should be OK.

Is there a way I can determine whether this lack of tags was the root cause of the slow compilation, or if adding them is just papering over the real issue?

It is indeed the lack of tags that causes the slow compilation and the runtime issues. Without the tags, we fall back to no fusion, i.e. the loopy kernel being lowered has many loop nests, and generating code for them takes a long time (see https://github.com/inducer/loopy/pull/372). The runtime suffers because no fusion $\Rightarrow$ no array contraction $\Rightarrow$ higher DRAM traffic and poor temporal locality in the OpenCL kernels.
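The chain "no fusion, hence no array contraction, hence more memory traffic" can be sketched with a small plain-Python example. This is only an illustration of the general technique, not code that loopy or pytato generates; the function names `unfused` and `fused` and the arithmetic inside them are made up for the example. Each function returns its result together with the number of temporary elements it had to store.

```python
def unfused(u):
    """Two separate loop nests: the intermediate is a full-size array."""
    n = len(u)
    tmp = [0.0] * n              # temporary as large as the input
    for i in range(n):           # loop nest 1 writes the temporary
        tmp[i] = 2.0 * u[i]
    out = [0.0] * n
    for i in range(n):           # loop nest 2 reads it back
        out[i] = tmp[i] + 1.0
    return out, len(tmp)


def fused(u):
    """One fused loop nest: the temporary contracts to a scalar."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        t = 2.0 * u[i]           # lives only for one iteration
        out[i] = t + 1.0
    return out, 1


u = [float(i) for i in range(8)]
out_unfused, ntemp_unfused = unfused(u)
out_fused, ntemp_fused = fused(u)

print(out_unfused == out_fused)
print(ntemp_unfused, ntemp_fused)
```

Both variants compute the same result, but the unfused one stores and reloads a temporary with as many entries as the input, whereas the fused one keeps a single scalar. On a GPU, the full-size temporary would typically live in global memory, which is where the extra DRAM traffic comes from.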