madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

Implement helicity recycling in our CUDA/C++ #279

Open valassi opened 2 years ago

valassi commented 2 years ago

This is a followup of issue #276.

In order to compare C++/CUDA and Fortran throughputs, we should make sure that they use the same algorithm. This is presently not the case: we are comparing a faster Fortran with helicity recycling to a slower C++ without helicity recycling.

In issue #276, I will follow up on a better estimate of the slower Fortran without helicity recycling, which can be directly compared to C++.

But what we really need to do is implement helicity recycling in the CUDA/C++. @oliviermattelaer, is this something that would be complicated (and/or maybe is already underway)? Thanks

oliviermattelaer commented 2 years ago

I would actually doubt that doing helicity recycling on GPU is a good idea, since this blows up the size of the code and the memory requirement. For vectorised CPU, that is obviously an option.
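
To make the trade-off concrete, here is a toy C++ sketch of the idea behind helicity recycling. It is not code from MadGraph5_aMC@NLO or madgraph4gpu: `toyWavefunction`, `sumNoRecycling`, `sumWithRecycling` and the "amplitude as a product of external wavefunctions" are all hypothetical stand-ins, and it assumes two helicity states per particle. Without recycling, every helicity combination recomputes all external wavefunctions; with recycling, each (particle, helicity) wavefunction is computed once and kept in a per-event cache.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Counts how many times the (hypothetical) wavefunction routine is called.
struct Counter { std::size_t calls = 0; };

// Hypothetical stand-in for an external wavefunction computation.
// It depends only on the particle index and on that particle's helicity.
cplx toyWavefunction(int particle, int hel, Counter& c) {
  ++c.calls;
  return cplx(1.0 + particle, hel == 0 ? -0.5 : 0.5);
}

// No recycling: recompute every wavefunction for every helicity combination.
// Cost: npar * 2^npar wavefunction calls, no extra storage.
double sumNoRecycling(int npar, Counter& c) {
  assert(npar > 0 && npar < 20);
  double total = 0.0;
  const unsigned ncomb = 1u << npar;  // 2 helicities per particle (assumption)
  for (unsigned ihel = 0; ihel < ncomb; ++ihel) {
    cplx amp(1.0, 0.0);
    for (int ip = 0; ip < npar; ++ip)
      amp *= toyWavefunction(ip, (ihel >> ip) & 1, c);
    total += std::norm(amp);  // |amp|^2 summed over helicities
  }
  return total;
}

// Recycling: compute each (particle, helicity) wavefunction once and reuse it.
// Cost: 2 * npar wavefunction calls, plus a cache of 2 * npar values per event.
double sumWithRecycling(int npar, Counter& c, std::size_t& cacheSize) {
  assert(npar > 0 && npar < 20);
  std::vector<cplx> cache(2 * static_cast<std::size_t>(npar));
  for (int ip = 0; ip < npar; ++ip)
    for (int h = 0; h < 2; ++h)
      cache[2 * ip + h] = toyWavefunction(ip, h, c);
  cacheSize = cache.size();
  double total = 0.0;
  const unsigned ncomb = 1u << npar;
  for (unsigned ihel = 0; ihel < ncomb; ++ihel) {
    cplx amp(1.0, 0.0);
    for (int ip = 0; ip < npar; ++ip)
      amp *= cache[2 * ip + ((ihel >> ip) & 1)];
    total += std::norm(amp);
  }
  return total;
}
```

In this toy, the cache is the extra memory: it has to exist per event, so on a GPU it would be multiplied by the number of events processed in parallel. The sketch does not model the growth in generated code size that is also mentioned above, nor the reuse of internal wavefunctions and amplitudes.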

valassi commented 2 years ago

Within PR #401, note that I had to introduce this fix while moving to v311: https://github.com/madgraph5/madgraph4gpu/pull/401/commits/04d4b8e41b7386164bdfde2277f1743d7ebb9a18 This indeed picks up a new feature of v311 (#360), but it is only relevant to helicity recycling (#279).

In practice