Add a fused distributed kernel for the transport equation

xcompact3d / x3d2

BSD 3-Clause "New" or "Revised" License

To save memory bandwidth on GPUs, this PR fuses several of the operations in the transport equation into a single distributed kernel.

The fused kernel can evaluate $${RHS}_x^u \leftarrow -\frac{1}{2} \bigg(u\frac{\partial u}{\partial x} + \frac{\partial u u}{\partial x}\bigg) + \nu \frac{\partial^2 u}{\partial x^2}$$ or $${RHS}_z^v \leftarrow -\frac{1}{2} \bigg(w\frac{\partial v}{\partial z} + \frac{\partial v w}{\partial z}\bigg) + \nu \frac{\partial^2 v}{\partial z^2}$$ and the analogous groups of terms for the other velocity components and directions, depending on its inputs. The kernel runs 3 times per direction (once for each of $u$, $v$ and $w$), so 9 times per timestep, to evaluate all the terms in the transport equation.
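As a rough illustration of what fusing buys, the sketch below evaluates the whole group of terms for one line of data in a single pass, so each input value is read once instead of once per derivative. It is not the kernel from this PR: it assumes a periodic 1D line, takes the viscous term to be the second derivative, and substitutes plain second-order central differences for the solver's actual derivative scheme. The names `fused_rhs_1d`, `f`, `a`, `dx` and `nu` are made up for the example.

```python
import math


def fused_rhs_1d(f, a, dx, nu):
    """Evaluate -1/2 * (a * df/dx + d(f*a)/dx) + nu * d2f/dx2 in one pass.

    f  : transported field along the line (u, v or w)
    a  : advecting velocity component along the line direction
    dx : grid spacing, nu : viscosity

    Assumptions (not taken from the PR): periodic boundaries and
    second-order central differences for all three derivatives.
    """
    n = len(f)
    rhs = [0.0] * n
    for i in range(n):
        im, ip = (i - 1) % n, (i + 1) % n  # periodic neighbours
        dfdx = (f[ip] - f[im]) / (2.0 * dx)
        dfadx = (f[ip] * a[ip] - f[im] * a[im]) / (2.0 * dx)
        d2fdx2 = (f[ip] - 2.0 * f[i] + f[im]) / (dx * dx)
        # All three derivatives are combined before anything is stored.
        rhs[i] = -0.5 * (a[i] * dfdx + dfadx) + nu * d2fdx2
    return rhs


# Usage: RHS_x^u for u = sin(x), where the advecting velocity is u itself.
n = 128
dx = 2.0 * math.pi / n
nu = 0.1
xs = [i * dx for i in range(n)]
u = [math.sin(x) for x in xs]
rhs = fused_rhs_1d(u, u, dx, nu)

# Analytic value for this test field: -3/2 sin(x) cos(x) - nu sin(x).
exact = [-1.5 * math.sin(x) * math.cos(x) - nu * math.sin(x) for x in xs]
max_err = max(abs(r - e) for r, e in zip(rhs, exact))
print("max abs error:", max_err)
```

Passing `u` as both arguments gives the ${RHS}_x^u$ case; passing `v` as `f` and `w` as `a` along a z-line gives the ${RHS}_z^v$ case.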

$${RHS}^u = {RHS}_x^u + {RHS}_y^u + {RHS}_z^u$$

$${RHS}^v = {RHS}_x^v + {RHS}_y^v + {RHS}_z^v$$

$${RHS}^w = {RHS}_x^w + {RHS}_y^w + {RHS}_z^w$$
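The bookkeeping behind these three sums can be sketched as follows. This is only an illustration of how the 9 calls pair up: the advecting velocity is the component along the derivative direction ($u$ for $x$, $v$ for $y$, $w$ for $z$), and each call's result is added to the RHS of the transported field. `assemble_rhs` and `placeholder_kernel` are hypothetical names, and the placeholder does no real derivative work.

```python
# Advecting velocity component for each derivative direction.
ADVECTING = {"x": "u", "y": "v", "z": "w"}


def assemble_rhs(fields, fused_kernel):
    """Accumulate RHS^u, RHS^v, RHS^w from one kernel call per
    (direction, transported field) pair: 3 directions x 3 fields."""
    n = len(fields["u"])
    rhs = {name: [0.0] * n for name in ("u", "v", "w")}
    for direction in ("x", "y", "z"):
        advecting = fields[ADVECTING[direction]]
        for name in ("u", "v", "w"):
            term = fused_kernel(fields[name], advecting, direction)
            rhs[name] = [r + t for r, t in zip(rhs[name], term)]
    return rhs


calls = []


def placeholder_kernel(transported, advecting, direction):
    """Stand-in for the fused kernel: records its inputs and returns a
    constant contribution. A real kernel would compute derivatives."""
    calls.append((direction, transported[0], advecting[0]))
    return [1.0] * len(transported)


# Tiny constant fields, just to make the call pattern visible.
fields = {"u": [0.1] * 4, "v": [0.2] * 4, "w": [0.3] * 4}
rhs = assemble_rhs(fields, placeholder_kernel)

print(len(calls))  # 9 kernel calls per timestep
print(rhs["u"])    # three contributions of 1.0 summed at each point
```

Looping over directions on the outside matches the "3 times per direction" count; whether the actual solver orders the calls this way is an assumption here.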

Add a fused distributed kernel for the transport equation #15