Closed by semi-h 6 months ago
Could we just use the dot_product intrinsic? NVIDIA claim to accelerate all of these.
We'll probably have padding in our data structure when the grid size is not a multiple of SZ. If we then use a library routine to calculate the dot product, it will sum over the padded entries as well, and it'll be hard to exclude them.
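To make the padding concern concrete, here is a minimal host-side C++ sketch. It is not the project's actual data structure or kernel. The `PaddedField` type, the (SZ, n, n_groups) index ordering, and the value `SZ = 4` are all assumptions made up for illustration. It compares a flat dot product over the whole buffer, which is what a library call would effectively do, with one that skips the padded entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical group size; the real SZ may differ.
constexpr std::size_t SZ = 4;

// Hypothetical padded layout: lines are stored in groups of SZ, with the
// index inside a group varying fastest. If n_lines is not a multiple of SZ,
// the last group contains padded (unused) entries.
struct PaddedField {
    std::size_t n_lines;       // number of real lines
    std::size_t n;             // points per line
    std::vector<double> data;  // n_groups() * n * SZ entries, padding included

    PaddedField(std::size_t n_lines_, std::size_t n_, double fill)
        : n_lines(n_lines_), n(n_), data(n_groups() * n_ * SZ, fill) {}

    std::size_t n_groups() const { return (n_lines + SZ - 1) / SZ; }

    double& at(std::size_t line, std::size_t j) {
        assert(line < n_lines && j < n);
        return data[((line / SZ) * n + j) * SZ + line % SZ];
    }
};

// Flat dot product over the whole buffer: padded entries are summed too.
double scalar_product_naive(const PaddedField& a, const PaddedField& b) {
    assert(a.data.size() == b.data.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.data.size(); ++k) sum += a.data[k] * b.data[k];
    return sum;
}

// Dot product that skips entries belonging to lines beyond n_lines.
double scalar_product_masked(const PaddedField& a, const PaddedField& b) {
    assert(a.n_lines == b.n_lines && a.n == b.n);
    double sum = 0.0;
    for (std::size_t g = 0; g < a.n_groups(); ++g)
        for (std::size_t j = 0; j < a.n; ++j)
            for (std::size_t i = 0; i < SZ; ++i) {
                if (g * SZ + i >= a.n_lines) continue;  // padded entry
                const std::size_t idx = (g * a.n + j) * SZ + i;
                sum += a.data[idx] * b.data[idx];
            }
    return sum;
}
```

With 6 lines and SZ = 4 the second group is half padding, so the two functions disagree whenever the padded entries hold non-zero values. If the padding could be guaranteed to be zero, the flat version would give the same result, but that guarantee would have to be maintained by every operation that writes to the field.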
We need a scalar product operator for calculating the enstrophy in the domain. The CUDA kernel could be implemented in a more performant way (there are lots of examples out there), but the current implementation is not bad. We only need to run this for post-processing, maybe once every 100 to 1000 iterations, so I'm not too worried about it.