madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

passing 3 or 4 momenta #58

Open oliviermattelaer opened 3 years ago

oliviermattelaer commented 3 years ago

Just want to create an issue to track some development that we did in the past.

At some point during the hackathon, we tried to pass 3-vectors instead of 4-vectors (in addition to the mass value). The upside is clear (a 25% reduction of the memory footprint); the downside is that we obviously need to recompute the energy component, and a square root is heavy in terms of registers/instructions.

It looks like Andrea's version moved back to the 4-vector transfer. We might want to revisit that idea and compare the two methods again.

valassi commented 3 years ago

Hi Olivier, thanks for opening this, it will be useful to collect some studies.

I do not remember actively changing from 3 to 4 momenta myself, but maybe I am wrong. I think that the present code is aligned with the original MadGraph.

In any case, I agree that there is a compromise to be made. Passing 3-momenta is probably heavier in recomputations and registers on the GPU. Passing 4-momenta, on the other hand, is heavier on copies: at the moment, and for eemumu, this is especially heavy for the copy of the rambo outputs to the CPU. As discussed elsewhere (e.g. #22), the relative importance of this will decrease when we go to more complex processes, and it will also decrease when we do a realistic event unweighting on the GPU, so that we do not copy all events to the CPU, but only those which passed the hit-or-miss criteria.

oliviermattelaer commented 3 years ago

It's likely that you branched off before some of my latest changes. But for this one it is indeed not that relevant (and indeed its importance will decrease further in the future).

On the other hand, I have kept the "low memory mode" for the color-matrix computation (even if, in terms of performance, the results were not that great).

oliviermattelaer commented 3 years ago

So, looking at the code:

1) we have an issue with the ixxxxx routine that we need to fix (inconsistent handling of 4/3 momenta, at least in epoch2);
2) the code sometimes reads the full 4-momenta and sometimes only part of them (the three-momentum or even less).

valassi commented 3 years ago

Hi @oliviermattelaer I am still reviewing old tickets.

I am not sure if this one is still relevant. I assume this is about the xxx routines rather than the FFV routines, right? Please note that a few months ago I went through all the xxx routines: I cross-checked that the simpler versions (imz/ipz/ixz) agree with the full versions (ixx), I added some tests with a reference file, and I also tried to check that the C++ versions agree with the Fortran ones.

A couple of more specific comments:

Finally, this issue #58 that you opened may well be a duplicate of my question #200? The latter is about the question I asked above, namely the ixxxxx implementation in https://github.com/madgraph5/madgraph4gpu/blob/e41c14202e631bb87e1f514d7d116252dfc1dac4/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/aloha/template_files/gpu/helas.cu#L73-L74

I will keep this open until it is clarified (I guess I could close #200 as a duplicate, but I will keep that open as well).

oliviermattelaer commented 3 years ago

This is something that it would be great to revisit later (and therefore to keep open). The question is whether we should pass E, px, py, pz, m as input or just px, py, pz, m. Going for the second reduces the amount of memory to transfer from CPU to GPU, but it increases the amount of work that the GPU has to do.