majosm closed this issue 1 year ago
- Is there a way I can determine whether this lack of tags was the root cause of the slow compilation, or if adding them is just papering over the real issue?
Maybe @kaushikcfd can comment. Until we figure out how it was spending its time, it's hard to be certain.
2. Is it OK to use DiscretizationDOFAxisTag for both the nodal and modal axes of the Vandermonde matrix? Or should they be distinct?
Let's define it to be OK. If you make a PR, update the docs for that tag to say that it applies to both modal and nodal DOFs.
Btw, nice job discovering this!
To add: I would be OK with applying the tags even before we convincingly resolve the reason for the compilation slowness.
Is it OK to use DiscretizationDOFAxisTag for both the nodal and modal axes of the Vandermonde matrix? Or should they be distinct?
Yes, that should be OK.
Is there a way I can determine whether this lack of tags was the root cause of the slow compilation, or if adding them is just papering over the real issue?
It is indeed the lack of tags that causes the slow compilation and the runtime issues. Without the tags we fall back to no fusion, i.e. the loopy kernel being lowered has many loop nests, and generating code for them takes a long time (see https://github.com/inducer/loopy/pull/372). The runtime suffers because no fusion $\Rightarrow$ no array contraction $\Rightarrow$ higher DRAM traffic and poor temporal locality in the OpenCL kernels.
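To make the fusion/contraction chain above concrete, here is a minimal pure-Python sketch. It is only an illustration of the general technique, not what loopy actually generates; the function names `unfused` and `fused` and the arithmetic are made up for the example.

```python
def unfused(x):
    """Two separate loop nests; the intermediate lives in a full-size array."""
    n = len(x)
    tmp = [0.0] * n          # intermediate array: written in loop 1, re-read in loop 2
    for i in range(n):
        tmp[i] = 2.0 * x[i]
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + 1.0
    return out


def fused(x):
    """One loop nest; the intermediate is contracted to a scalar temporary."""
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        t = 2.0 * x[i]       # replaces tmp[i]; no full-size array is needed
        out[i] = t + 1.0
    return out


# Both variants compute the same result; only the memory traffic differs.
assert unfused([1.0, 2.0, 3.0]) == fused([1.0, 2.0, 3.0])
```

In the unfused version every element of `tmp` is stored and loaded again later, which is the extra memory traffic referred to above; once the loops are fused, the array can be contracted to a single scalar.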
When running the Y3 prediction case with filtering enabled, a warning is emitted:
By adding some extra DiscretizationDOFAxisTags to the Vandermonde matrices in meshmode/discretization/connection/modal.py, I can make the warning go away. Doing so also fixes the compilation speed issue with filtering; specifically, the scheduling and codegen phases in loopy no longer exhibit the ~10x slowdown that they otherwise would with filtering enabled.

I have a couple of questions about this:
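1. Is there a way I can determine whether this lack of tags was the root cause of the slow compilation, or if adding them is just papering over the real issue?
2. Is it OK to use DiscretizationDOFAxisTag for both the nodal and modal axes of the Vandermonde matrix? Or should they be distinct?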
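
For readers unfamiliar with axis tagging, the idea of attaching the same tag to both axes of a matrix can be sketched with a self-contained toy model. Everything here (`DOFAxisTag`, `ToyArray`, `make_array`, `tag_axis`, `untagged_axes`) is a hypothetical stand-in written for illustration; it is not the meshmode, arraycontext, or pytato API, and the "all axes tagged" check is a deliberately simplified assumption about when a transformation can proceed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DOFAxisTag:
    """Hypothetical stand-in for a tag such as DiscretizationDOFAxisTag."""


@dataclass(frozen=True)
class ToyArray:
    name: str
    shape: tuple
    axis_tags: tuple  # one frozenset of tags per axis


def make_array(name, shape):
    """Create an array whose axes carry no tags yet."""
    return ToyArray(name, shape, tuple(frozenset() for _ in shape))


def tag_axis(ary, iaxis, tag):
    """Return a copy of *ary* with *tag* added to axis *iaxis*."""
    tags = list(ary.axis_tags)
    tags[iaxis] = tags[iaxis] | {tag}
    return ToyArray(ary.name, ary.shape, tuple(tags))


def untagged_axes(arrays):
    """List (array name, axis index) pairs that carry no tag at all."""
    return [(a.name, i)
            for a in arrays
            for i, tags in enumerate(a.axis_tags)
            if not tags]


# A Vandermonde-like matrix: one nodal axis, one modal axis (sizes made up).
vdm = make_array("vdm", (10, 10))
assert untagged_axes([vdm]) == [("vdm", 0), ("vdm", 1)]

# Use the same tag for both axes, as asked about in question 2.
vdm = tag_axis(vdm, 0, DOFAxisTag())
vdm = tag_axis(vdm, 1, DOFAxisTag())
assert untagged_axes([vdm]) == []
```

In this toy, a transformation that needs every axis tagged would report the two untagged `vdm` axes before tagging and nothing afterwards, which mirrors the warning disappearing once the tags are added.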