Split the sigmakin kernel into smaller kernels

valassi commented 2 years ago

This is just a placeholder to discuss the idea of implementing smaller kernels.

A lot of pointers already exist related to this:

Stefan's WIP PR #242
issue #12 about CUDA graphs
issue #11 about CUDA streams
issue #175 about the internal APIs

One of the main prerequisites for smaller kernels is that each ixxx/oxxx function and each ffv function must be able to take pointers to large buffers covering many events and do the indexing itself. This is discussed in https://github.com/madgraph5/madgraph4gpu/issues/175#issuecomment-988215980 for instance. At present, only the ixxx/oxxx functions can locate an event in the input array; their outputs, and all inputs and outputs of the ffv functions, refer (in CUDA) to a single event. This is the first thing that must change to allow smaller kernels.
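To make the idea concrete, here is a minimal sketch in plain C++ (host code rather than CUDA, so that it runs anywhere) of a function that receives pointers to many-event buffers and does the indexing itself. Everything in it is an assumption made for illustration: the names `toyCombine`, `toyCombineAll`, `wfIndex`, `NW6` and `NX2`, the flattened `[ievt][iw6][ix2]` layout, and the toy arithmetic, which is a plain complex product and not a real ffv function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using fptype = double; // assumption: double precision

constexpr int NW6 = 6; // hypothetical: complex components per wavefunction
constexpr int NX2 = 2; // real and imaginary parts

// Hypothetical layout: buffer[ievt][iw6][ix2], flattened into one array.
inline std::size_t wfIndex( int ievt, int iw6, int ix2 )
{
  return ( static_cast<std::size_t>( ievt ) * NW6 + iw6 ) * NX2 + ix2;
}

// Toy stand-in for an ffv-like function (NOT real physics). It takes
// many-event buffers plus an event index and locates its own inputs and
// output. In CUDA, ievt would typically be derived from blockIdx/threadIdx.
inline void toyCombine( const fptype* wfIn1, const fptype* wfIn2, fptype* wfOut, int ievt )
{
  for( int iw6 = 0; iw6 < NW6; iw6++ )
  {
    const fptype ar = wfIn1[wfIndex( ievt, iw6, 0 )];
    const fptype ai = wfIn1[wfIndex( ievt, iw6, 1 )];
    const fptype br = wfIn2[wfIndex( ievt, iw6, 0 )];
    const fptype bi = wfIn2[wfIndex( ievt, iw6, 1 )];
    // Complex product a * b, written out component by component.
    wfOut[wfIndex( ievt, iw6, 0 )] = ar * br - ai * bi;
    wfOut[wfIndex( ievt, iw6, 1 )] = ar * bi + ai * br;
  }
}

// Because inputs and outputs live in many-event buffers, this step can be
// run over all events on its own (on a GPU: as its own small kernel).
inline void toyCombineAll( const std::vector<fptype>& in1,
                           const std::vector<fptype>& in2,
                           std::vector<fptype>& out,
                           int nevt )
{
  const std::size_t expected = static_cast<std::size_t>( nevt ) * NW6 * NX2;
  assert( in1.size() == expected && in2.size() == expected && out.size() == expected );
  for( int ievt = 0; ievt < nevt; ievt++ )
    toyCombine( in1.data(), in2.data(), out.data(), ievt );
}
```

The point of the sketch is only the calling convention: since the output is written back into a many-event buffer, a later function can pick it up in a separate pass, which is what a split into smaller kernels would require.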

valassi commented 2 years ago

Note a basic prototype idea in #313: it would be enough to add, for instance, bufferAccessWavefunction( fptype* buffer, int iw6 ), where the iw6 index is passed in the same way as ip4 is now.
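A minimal sketch of what such an accessor could look like, written as plain C++ so that it runs on a CPU. The signature quoted above takes only `buffer` and `iw6`; the extra `ievt` argument, the flattened `[ievt][iw6][ix2]` layout, the constants `NW6` and `NX2`, and the `setComponent`/`getComponent` helpers are all assumptions made for this illustration. In CUDA the event index would normally come from the thread index rather than from an argument.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using fptype = double; // assumption: double precision

constexpr int NW6 = 6; // assumption: number of complex components addressed by iw6
constexpr int NX2 = 2; // real and imaginary parts

// Hypothetical accessor: returns a pointer to the (real, imaginary) pair of
// component iw6 for event ievt inside a many-event buffer.
// Assumed layout: buffer[ievt][iw6][ix2], flattened into one array.
inline fptype* bufferAccessWavefunction( fptype* buffer, int iw6, int ievt )
{
  assert( iw6 >= 0 && iw6 < NW6 );
  return buffer + ( static_cast<std::size_t>( ievt ) * NW6 + iw6 ) * NX2;
}

// Hypothetical helper: write one complex component through the accessor.
inline void setComponent( fptype* buffer, int iw6, int ievt, std::complex<fptype> value )
{
  fptype* p = bufferAccessWavefunction( buffer, iw6, ievt );
  p[0] = value.real();
  p[1] = value.imag();
}

// Hypothetical helper: read one complex component through the accessor.
inline std::complex<fptype> getComponent( fptype* buffer, int iw6, int ievt )
{
  const fptype* p = bufferAccessWavefunction( buffer, iw6, ievt );
  return std::complex<fptype>( p[0], p[1] );
}
```

With an accessor of this kind, the calling code never computes offsets itself, so the memory layout could later be changed in one place.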

valassi commented 2 years ago

Note that PR #328 is the first step towards this and contains several related comments.

valassi commented 2 years ago

The color algebra optimisation has a separate issue #155. Improving the timing measurements is in #372.

valassi commented 2 years ago

One of the main issues in splitting kernels is clarifying the relative roles of MEK and CPPProcess: who holds the intermediate data buffers? Who orchestrates the order of kernels? Who is allowed to have process-specific stuff? I am discussing this largely in #356, specifically in the context of the running alphas issue #373 and draft PR #434.

madgraph5 / madgraph4gpu

Split the sigmakin kernel into smaller kernels #310