Open valassi opened 2 years ago
Note a basic prototype idea in #313: it would be enough to add, for instance, bufferAccessWavefunction( fptype* buffer, int iw6 ), where the iw6 index is passed in the same way as ip4 is now.
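As a rough illustration of that prototype idea, here is a minimal sketch of what such an accessor could look like. Only the name bufferAccessWavefunction and the fptype/iw6 arguments come from the comment above. The explicit ievt and ix2 arguments, the value nw6 = 6, and the plain array-of-structures layout buffer[ievt][iw6][ix2] are assumptions made for this example (in CUDA the event index would normally be derived from the thread index instead of being passed in).

```cpp
#include <cassert>
#include <vector>

using fptype = double; // assumption: double-precision build

constexpr int nw6 = 6; // assumption: complex components per wavefunction (suggested by the "iw6" name)
constexpr int nx2 = 2; // real and imaginary part of each component

// Hypothetical accessor: returns one real number of one wavefunction component
// for one event, from a buffer holding the wavefunctions of MANY events.
// Assumed layout (not taken from the issue): buffer[ievt][iw6][ix2].
inline fptype& bufferAccessWavefunction( fptype* buffer, int ievt, int iw6, int ix2 )
{
  assert( ievt >= 0 );
  assert( iw6 >= 0 && iw6 < nw6 );
  assert( ix2 >= 0 && ix2 < nx2 );
  return buffer[( ievt * nw6 + iw6 ) * nx2 + ix2];
}
```

The point of hiding the index arithmetic in one function is that the memory layout could later be changed (for example to a layout better suited to coalesced access) without touching the code that calls it.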
Note that PR #328 is the first step towards this and contains several related comments.
The color algebra optimisation has a separate issue #155. Improving the timing measurements is in #372.
One of the main issues in splitting kernels is clarifying the relative roles of MEK and CPPProcess: Who holds the intermediate data buffers? Who orchestrates the order of the kernels? Who is allowed to contain process-specific code? I am discussing this largely in #356, specifically in the context of the running alphas issue #373 and draft PR #434.
This is just a placeholder to discuss the idea of implementing smaller kernels.
A lot of pointers already exist related to this:
One of the main prerequisites for using smaller kernels is that each ixxx/oxxx function and each FFV function must be able to handle pointers to large buffers covering many events, and to do the indexing itself. This is discussed in https://github.com/madgraph5/madgraph4gpu/issues/175#issuecomment-988215980 for instance. At present, only the ixxx/oxxx functions are able to find an event in the input array; their output, and all inputs and outputs of the FFV functions, refer in CUDA to a single event. This is the first thing that must be changed to allow smaller kernels.
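To make the "each function does its own indexing" idea concrete, here is a minimal toy sketch. It is not the actual ixxx/oxxx or FFV code: the function names toyXxxKernel and toyFfvKernel, the real-valued (rather than complex) wavefunctions, the layouts, and the trivial arithmetic are all placeholders assumed for illustration. The event loop stands in for what would be one GPU thread per event in CUDA.

```cpp
#include <cassert>
#include <vector>

using fptype = double; // assumption: double-precision build

constexpr int np4 = 4; // momentum components per particle (E, px, py, pz)
constexpr int nw6 = 6; // assumption: components per wavefunction (kept real here for brevity)

// Toy stand-in for an ixxx/oxxx function: it receives pointers to the buffers
// for ALL events and locates the data of each event itself.
void toyXxxKernel( const fptype* momenta, fptype* wfs, int nevt )
{
  for( int ievt = 0; ievt < nevt; ++ievt ) // in CUDA: one thread per ievt
  {
    const fptype* p = momenta + ievt * np4; // input record of this event
    fptype* w = wfs + ievt * nw6;           // output record of this event
    for( int i = 0; i < nw6; ++i ) w[i] = ( i < np4 ? p[i] : 0 ); // placeholder maths
  }
}

// Toy stand-in for an FFV function: inputs AND outputs are multi-event buffers,
// so it can run as a separate small kernel after toyXxxKernel.
void toyFfvKernel( const fptype* wfs1, const fptype* wfs2, fptype* amps, int nevt )
{
  for( int ievt = 0; ievt < nevt; ++ievt ) // in CUDA: one thread per ievt
  {
    const fptype* w1 = wfs1 + ievt * nw6;
    const fptype* w2 = wfs2 + ievt * nw6;
    fptype sum = 0;
    for( int i = 0; i < nw6; ++i ) sum += w1[i] * w2[i]; // placeholder maths
    amps[ievt] = sum;
  }
}
```

In this arrangement the intermediate wavefunction buffers live outside the individual functions, which is exactly why the question of who owns and orchestrates those buffers needs to be settled.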