microhh / rte-rrtmgp-cpp

C++ / CUDA implementation of RTE+RRTMGP radiative transfer solver
BSD 3-Clause "New" or "Revised" License

Tuning `sw_source_adding_kernel` #12

Open julietbravo opened 3 years ago

julietbravo commented 3 years ago

I'm starting with this kernel.

julietbravo commented 3 years ago

I created a simple Kernel Tuner script (`sw_source_adding_kernel.py`). The advantage of this kernel is that it does not require realistic input, so `np.zeros()` or `np.random.random()` input is fine. The only disadvantage is that you can't do correctness checking with Kernel Tuner.
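For readers who have not used Kernel Tuner, here is a minimal sketch of what such a script might look like. It is not the actual `sw_source_adding_kernel.py`: the kernel (`toy_kernel`) is a hypothetical stand-in with a made-up signature, and the array sizes and block-size candidates are illustrative assumptions. It relies on Kernel Tuner's `tune_kernel` function and its default convention of inserting `block_size_x` / `block_size_y` as compile-time constants.

```python
import numpy as np

# Hypothetical stand-in kernel; the real sw_source_adding_kernel has a
# different signature and body.
kernel_source = """
__global__ void toy_kernel(const int ncol, const int nlay, const double* in, double* out)
{
    const int icol = blockIdx.x * block_size_x + threadIdx.x;
    const int ilay = blockIdx.y * block_size_y + threadIdx.y;
    if (icol < ncol && ilay < nlay)
    {
        const int idx = icol + ilay * ncol;
        out[idx] = 2.0 * in[idx];
    }
}
"""


def make_arguments(ncol, nlay):
    """Build dummy arguments; the values are arbitrary because only timing matters."""
    data_in = np.random.random(ncol * nlay)  # float64, matches `double` in the kernel
    data_out = np.zeros(ncol * nlay)
    return [np.int32(ncol), np.int32(nlay), data_in, data_out]


def make_tune_params():
    """Candidate thread-block sizes (illustrative values, at most 1024 threads per block)."""
    return {"block_size_x": [16, 32, 64, 128], "block_size_y": [1, 2, 4, 8]}


def run_tuning(ncol=1024, nlay=128):
    """Tune the stand-in kernel and return the fastest valid configuration."""
    # Imported lazily: this step needs kernel_tuner and a CUDA-capable GPU.
    from kernel_tuner import tune_kernel

    results, env = tune_kernel(
        "toy_kernel", kernel_source, (ncol, nlay),
        make_arguments(ncol, nlay), make_tune_params())

    # Keep only configurations that produced a numeric timing.
    valid = [r for r in results if isinstance(r["time"], (int, float, np.floating))]
    return min(valid, key=lambda r: r["time"])
```

Calling `run_tuning()` on a machine with a GPU should return a dictionary holding the fastest block sizes and the measured time; the helper functions can be inspected without a GPU.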

I also tested/tuned the kernel for a varying number of layers `nlay`, which will frequently vary in e.g. our LES runs. For `nlay` = {64, 128, 192, 256}, the fastest configurations always use `block_size_x=32`, and performance is not sensitive to the choice of `block_size_y`.
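The sweep over `nlay` can be sketched as a small helper that picks the fastest block-size pair for each problem size. This is only an illustration of the bookkeeping: `best_config_per_nlay` and its `benchmark` argument are hypothetical names, and `benchmark` is assumed to be a user-supplied callable that returns a measured time for a given `(nlay, block_size_x, block_size_y)`.

```python
from itertools import product


def best_config_per_nlay(benchmark, nlays, block_sizes_x, block_sizes_y):
    """Return {nlay: (time, block_size_x, block_size_y)} for the fastest configuration.

    `benchmark(nlay, bx, by)` is a hypothetical callable supplied by the caller;
    it should return the measured kernel time for that configuration.
    """
    best = {}
    for nlay in nlays:
        # Time every (block_size_x, block_size_y) combination for this nlay.
        timings = [
            (benchmark(nlay, bx, by), bx, by)
            for bx, by in product(block_sizes_x, block_sizes_y)
        ]
        # Tuples compare on time first, so min() selects the fastest entry.
        best[nlay] = min(timings)
    return best
```

In practice `benchmark` would wrap a Kernel Tuner run or a manual timing loop. Comparing the returned `block_size_x` across the `nlay` values shows whether the optimum depends on the number of layers.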

The old configuration was already close to optimal: the best configuration from Kernel Tuner is only 8% faster.