change of signature for cudacpp

current signature is

sigmakin(momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, IEVT)
calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, iEVT)

I do have a couple of question for you @valassi, @roiser:

why sigmakin has the " multi_channel_num, multi_channel_denom" in the signature? I guess that we can remove those (they should not be passed from the cpu/gpu and should be useless after sigmaKin.
Do you mind if I change (also in the fortran side) channelID to be the "real" channel id (not the diagram id)? This will simplify the handling of the color (and reduce memory footprint)
What do you think about the following signature:
```
sigmakin(momenta, coupling, answer, channelid,random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out], IEVT)
calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, jamp2, iEVT)
```
- random_for_helicity and random_for_color would be a (list) of random number to do the selection
- selected_hel/selected_color would be the one selected (selection done within sigmaKin
- jamp2 would be a running sum over the helicity which correspond to the probability of a given color. (so init to zero in sigmakin)

Hi Olivier, thanks, useful questions

current signature is

sigmakin(momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, IEVT)
calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, iEVT)

I do have a couple of question for you @valassi, @roiser:

1. why sigmakin has the " multi_channel_num, multi_channel_denom" in the signature? I guess that we can remove those (they should not be passed from the cpu/gpu and should be useless after sigmaKin.

Ok this is something I must have added these days for multichannel in https://github.com/madgraph5/madgraph4gpu/pull/465

I was also a bit surprised that I had to do this, but I found this to be the most consistent to the rest of the code

the memory buffers for numerators and denominators are allocated (once, at the beginning of the application) by MatrixElementKernels.cc
then they are passed around as arguments in every call

This is consistent to what is done for other memory buffers, namely for couplings that are calculated from Gs

Buffers that must be visible externally (eg in the Bridge) are allocated even outside MatrixElementKernels, and then they are passed as references to MEK
Buffers that are "internal" (this is what you are saying esssentially) would ideally be allocated even within sigmakin for instance or within calculate wavefunction. The problem is that this would give a malloc/free at every iteration, which is not ideal. Or alternatively one has to statically allocate them as in Fortran, but then you need to know the dimensions at build time, while we have several things in cudacpp that we allocate dynamically at runtime.

I am not saying this is the optimal solution. We can certainly reconsider this. But for the moment I would keep it like that...

(Incidentally, this is very vaguely related to https://github.com/madgraph5/madgraph4gpu/issues/356).

2. Do you mind if I change (also in the fortran side) channelID to be the "real" channel id (not the diagram id)? This will simplify the handling of the color (and reduce memory footprint)

Not a problem! Just let me know when you do it, so I need to synchronise the code generation of cudacpp.

I think I actually like it better in the new way you propose!

3. What do you think about the following signature:
sigmakin(momenta, coupling, answer, channelid,random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out], IEVT) calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, jamp2, iEVT)

random_for_helicity and random_for_color would be a (list) of random number to do the selection

selected_hel/selected_color would be the one selected (selection done within sigmaKin

jamp2 would be a running sum over the helicity which correspond to the probability of a given color. (so init to zero in sigmakin)

Two points

no, unfortunately I think I would keep also numerator and denominator, see above
also, it is not IEVT but rather NEVT, and it is only there in cpp not in cuda (this I may eventually simplify, related to https://github.com/madgraph5/madgraph4gpu/issues/415)

Currently it is https://github.com/madgraph5/madgraph4gpu/blob/c2c57cd456b3d1555f04ac7f77f78398f5a8180e/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.h#L131

#ifdef __CUDACC__ /* clang-format off */
  __global__ void
  sigmaKin( const fptype* allmomenta,       // input: momenta[nevt*npar*4]
            const fptype* allcouplings,     // input: couplings[nevt*ndcoup*2]
            fptype* allMEs                  // output: allMEs[nevt], |M|^2 final_avg_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
            , fptype* allNumerators         // output: multichannel numerators[nevt], running_sum_over_helicities
            , fptype* allDenominators       // output: multichannel denominators[nevt], running_sum_over_helicities
            , const unsigned int channelId  // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
            );
#else
  __global__ void
  sigmaKin( const fptype* allmomenta,     // input: momenta[nevt*npar*4]
            const fptype* allcouplings,   // input: couplings[nevt*ndcoup*2]
            fptype* allMEs,               // output: allMEs[nevt], |M|^2 final_avg_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
            fptype* allNumerators,        // output: multichannel numerators[nevt], running_sum_over_helicities
            fptype* allDenominators,      // output: multichannel denominators[nevt], running_sum_over_helicities
            const unsigned int channelId, // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
            const int nevt );             // input: #events (for cuda: nevt == ndim == gpublocks*gputhreads)
#endif /* clang-format on */

In both cases I would add after channelId (and before nevt when present) the extra arguments you suggest, random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out]

sigmakin(momenta, coupling, answer, channelid,random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out], IEVT) calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, jamp2, iEVT)

random_for_helicity and random_for_color would be a (list) of random number to do the selection

selected_hel/selected_color would be the one selected (selection done within sigmaKin

jamp2 would be a running sum over the helicity which correspond to the probability of a given color. (so init to zero in sigmakin)

For calculate_wavefunctions, I am rather confused about the algorithm here (and the physics). I need to think about it more. For the moment, in practice I would say I just have this vague thought:

you propose to extract from calculate_wavefunction the jamp2 for each helicity, then it is sigmakin that computes a random color
I was kind of wondering whether instead one can first toss a random helicity, and then inside calculate_wavefunction for that one give helicity toss a random color (so there would be no need to extract the jamp2 for each helicity, instead you would compute a random color for each helicity and export that, and then depending on the helicity you chose you would also hav ethe choice of color)...

So, summary is, I think you have a much better idea what to do with calculate_wavefunction! Just go ahead, write the algorithm, thehn I will vectorize it when porting it to cudacpp...

[ As mentioned, I realise I need to study a bit more the color algebra. A couple of references fr myself

most detailed and precise is section 2.3 of https://arxiv.org/abs/1106.0522 (madgraph5 paper)
this is actually referenced from the later https://arxiv.org/abs/1405.0301v2 (madgraph5_amcnlo paper) which specifically says that MG5aMC uses the color algebra of Madgraph5, see section 2.3 of ref 38 (the one above)
then one more specific but more complex about the physics is (one of three choices?) https://arxiv.org/abs/hep-ph/0209271 ]

ok thanks i will move in that direction then.

Concerning the color, the madevent algorithms pick the color and helicity independently. (i.e. we do miss the correlation between the color and the helicity picked if any)

so therefore the helicity "h" is picked with probability

|M_h|^2/(\sum_i |M_i|^2)

and the color "c" is picked according to

Jamp2(c)/(\sum Jamp2)

where

Jamp2(c) = \sum_h Jamp2_h(c) # here sum over helicity

So Jamp2 is at the exact same level as the multichannel numerator/denominator since this is a running sum over helicity. (but Jamp2 is a list per event and not a number per event)

Now another method (also supported in madevent with nhel=1 in the run_card but not compatible with helicity recycling) is to evaluate only ONE helicity per event in that case, the helicity picked for each event is determined a priori via a vegas (auto-optimized) grid. The bias in under-picking/over-picking will be automatically adjusted by the weight of the events.

Interestingly:

This change the definition of the multi-channel weight (since the weight is helicity dependent now) and therefore the efficiency of the phase-space integrator (no clue in which direction).
helicity filtering (with threshold) is not required here (they will be toss-away anyway)
This reduces the complexity of the "kernel" but should increase the number of kernel call.
We have never really study the efficiency of one method compare to the other within the code. (This mode was implemented for loop-induced where the sum over helicity is very expansive to do)

ok thanks i will move in that direction then.

Concerning the color, the madevent algorithms pick the color and helicity independently. (i.e. we do miss the correlation between the color and the helicity picked if any)

so therefore the helicity "h" is picked with probability
|M_h|^2/(\sum_i |M_i|^2)
and the color "c" is picked according to
Jamp2(c)/(\sum Jamp2)
where
Jamp2(c) = \sum_h Jamp2_h(c) # here sum over helicity
So Jamp2 is at the exact same level as the multichannel numerator/denominator since this is a running sum over helicity. (but Jamp2 is a list per event and not a number per event)

Hi Olivier, thanks.

What confuses me here is the fact that jamp2(c) is the sum over helicity of jamp2(h,c).

I assume here by the way that jamp2 means this is the square of jamp - or rather, the real square of the norm of the complex number jamp, ie jamp2(h,c)=conj(jamp(h,c))*jamp(h,c) for a given helicity h and color c. Maybe I am wrong here?

Let me try to vaguely clarify. What we do is

for each helicity h and color c we calculate jamp(h,c) which is a complex number
for each helicity, inc calculate_wavefunction, we calculate ME(h) as sum(c1,c2) conj(jamp(h,c1))colormatrix(c1,c2)jamp(h,c2)
then the total matrix element is ME = sum(h) ME(h)
in other words, ME = sum(c1,c2) colormatrix(c1,c2) [sum(h)conj(jamp(h,c1)*jamp(h,c2)]

What confuses me is that

if the colormatrix were the identity delta(c1,c2), then ME would be sum(c) sum(h) conj(jamp(h,c))*jamp(h,c) = sum(h) sum(c) jamp2(h,c) = sum(c) [sum(h) jamp2{h,c)]: however, the color matrix is not delta, so this is not true!
the algorithm you described above however does seem to be based on this formula: normalization is based on total probability sum(c) [ sum(h) jamp2(h,c) ], and the probability of one color c is taken to be [ sum(h) jamp2(h,c) ] divided by the previous sum
so in a way I have the impression that the color matrix is completely ignored in calculating the probabilities to pick a color?

It is not just "missing the correlation between the color and the helicity picked", it is really that I have the impression that the color picking by itself is based on some approximation? But again, probably I misunderstand...

There is the following discussion in one paper which is maybe related, but probably I do not fully understand that either... https://arxiv.org/abs/hep-ph/0209271 "The color-ﬂow decomposition nicely lends itself to merging with a shower Monte Carlo, such as HERWIG [18, 19] or Pythia [20], which is based on the color ﬂow of a given hard- scattering subprocess. A given color assignment typically has several color ﬂows that con- tribute. One of these color ﬂows is randomly chosen to be associated with the event, weighted by the square of the partial amplitude corresponding to that color ﬂow. The weight does not include a color coeﬃcient (since it is unity), unlike the fundamental-representation de- composition [10]. The event is then evolved with a shower Monte Carlo. This neglects the interference between diﬀerent color ﬂows, but this interference is suppressed by a power of 1/N2. This is not a deﬁciency of the color-ﬂow decomposition, but rather is an inherent feature of the shower Monte-Carlo approximation for soft radiation [21]."

Or maybe actually that paragraph actually is exactly what I am saying above? I mean, the color matrix(c1,c2) is NOT delta(c1,c2), but the color picking algorithm treats it as if it was delta(c1,c2) and neglects the off-diagonal terms, which are suppressed by a power of 1/9 (I guess N=3 here)?

For instance https://github.com/madgraph5/madgraph4gpu/blob/c2c57cd456b3d1555f04ac7f77f78398f5a8180e/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/CPPProcess.cc#L2369

      static constexpr fptype cf[ncolor][ncolor] = {
        { 512, -64, -64, 8, 8, 80, -64, 8, 8, -1, -1, -10, 8, -1, 80, -10, 71, 62, -1, -10, -10, 62, 62, -28 },
        { -64, 512, 8, 80, -64, 8, 8, -64, -1, -10, 8, -1, -1, -10, -10, 62, 62, -28, 8, -1, 80, -10, 71, 62 },
        { -64, 8, 512, -64, 80, 8, 8, -1, 80, -10, 71, 62, -64, 8, 8, -1, -1, -10, -10, -1, 62, -28, -10, 62 },
        { 8, 80, -64, 512, 8, -64, -1, -10, -10, 62, 62, -28, 8, -64, -1, -10, 8, -1, -1, 8, 71, 62, 80, -10 },
        { 8, -64, 80, 8, 512, -64, -1, 8, 71, 62, 80, -10, -10, -1, 62, -28, -10, 62, -64, 8, 8, -1, -1, -10 },
        { 80, 8, 8, -64, -64, 512, -10, -1, 62, -28, -10, 62, -1, 8, 71, 62, 80, -10, 8, -64, -1, -10, 8, -1 },
        { -64, 8, 8, -1, -1, -10, 512, -64, -64, 8, 8, 80, 80, -10, 8, -1, 62, 71, -10, 62, -1, -10, -28, 62 },
        { 8, -64, -1, -10, 8, -1, -64, 512, 8, 80, -64, 8, -10, 62, -1, -10, -28, 62, 80, -10, 8, -1, 62, 71 },
        { 8, -1, 80, -10, 71, 62, -64, 8, 512, -64, 80, 8, 8, -1, -64, 8, -10, -1, 62, -28, -10, -1, 62, -10 },
        { -1, -10, -10, 62, 62, -28, 8, 80, -64, 512, 8, -64, -1, -10, 8, -64, -1, 8, 71, 62, -1, 8, -10, 80 },
        { -1, 8, 71, 62, 80, -10, 8, -64, 80, 8, 512, -64, 62, -28, -10, -1, 62, -10, 8, -1, -64, 8, -10, -1 },
        { -10, -1, 62, -28, -10, 62, 80, 8, 8, -64, -64, 512, 71, 62, -1, 8, -10, 80, -1, -10, 8, -64, -1, 8 },
        { 8, -1, -64, 8, -10, -1, 80, -10, 8, -1, 62, 71, 512, -64, -64, 8, 8, 80, 62, -10, -28, 62, -1, -10 },
        { -1, -10, 8, -64, -1, 8, -10, 62, -1, -10, -28, 62, -64, 512, 8, 80, -64, 8, -10, 80, 62, 71, 8, -1 },
        { 80, -10, 8, -1, 62, 71, 8, -1, -64, 8, -10, -1, -64, 8, 512, -64, 80, 8, -28, 62, 62, -10, -10, -1 },
        { -10, 62, -1, -10, -28, 62, -1, -10, 8, -64, -1, 8, 8, 80, -64, 512, 8, -64, 62, 71, -10, 80, -1, 8 },
        { 71, 62, -1, 8, -10, 80, 62, -28, -10, -1, 62, -10, 8, -64, 80, 8, 512, -64, -1, 8, -10, -1, -64, 8 },
        { 62, -28, -10, -1, 62, -10, 71, 62, -1, 8, -10, 80, 80, 8, 8, -64, -64, 512, -10, -1, -1, 8, 8, -64 },
        { -1, 8, -10, -1, -64, 8, -10, 80, 62, 71, 8, -1, 62, -10, -28, 62, -1, -10, 512, -64, -64, 8, 8, 80 },
        { -10, -1, -1, 8, 8, -64, 62, -10, -28, 62, -1, -10, -10, 80, 62, 71, 8, -1, -64, 512, 8, 80, -64, 8 },
        { -10, 80, 62, 71, 8, -1, -1, 8, -10, -1, -64, 8, -28, 62, 62, -10, -10, -1, -64, 8, 512, -64, 80, 8 },
        { 62, -10, -28, 62, -1, -10, -10, -1, -1, 8, 8, -64, 62, 71, -10, 80, -1, 8, 8, 80, -64, 512, 8, -64 },
        { 62, 71, -10, 80, -1, 8, -28, 62, 62, -10, -10, -1, -1, 8, -10, -1, -64, 8, 8, -64, 80, 8, 512, -64 },
        { -28, 62, 62, -10, -10, -1, 62, 71, -10, 80, -1, 8, -10, -1, -1, 8, 8, -64, 80, 8, 8, -64, -64, 512 } }; // 2-D array[24][24]

In a way, the algorithm proceeds as if the matrix above only had the 512's on the diagonal, and neglects all other terms (which indeed are one order of magnitude smaller)?

yes that's the point. The parton-shower wants/need the leading color information (which is the same that N is very large --even if it is only 3--) and therefore you only got a (subset) of the diagonal to consider.

Interference between color is actually something that can not be written on the lhef (due to format limitation)

ok very good if that's the algorithm then I agree also with your calculate_wavefunction API... essentially you need to add a jamp2 to the signature, which is a pointer to an array with ncolors real numbers per event, and calling it just adds the contribution for the given helicity.

Now https://github.com/madgraph5/madgraph4gpu/blob/c2c57cd456b3d1555f04ac7f77f78398f5a8180e/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/CPPProcess.cc#L91

 // Evaluate |M|^2 for each subprocess
  // NB: calculate_wavefunctions ADDS |M|^2 for a given ihel to the running sum of |M|^2 over helicities for the given event(s)
  // (similarly, it also ADDS the numerator and denominator for a given ihel to their running sums over helicities)
  __device__ INLINE void /* clang-format off */
  calculate_wavefunctions( int ihel,
                           const fptype* allmomenta,      // input: momenta[nevt*npar*4]
                           const fptype* allcouplings,    // input: couplings[nevt*ndcoup*2]
                           fptype* allMEs                 // output: allMEs[nevt], |M|^2 running_sum_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
                           , fptype* allNumerators        // output: multichannel numerators[nevt], running_sum_over_helicities
                           , fptype* allDenominators      // output: multichannel denominators[nevt], running_sum_over_helicities
                           , const unsigned int channelId // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
#ifndef __CUDACC__
                           , const int nevt               // input: #events (for cuda: nevt == ndim == gpublocks*gputhreads)
#endif
                           )

In practice you need to add one variable, something like

 // Evaluate |M|^2 for each subprocess
  // NB: calculate_wavefunctions ADDS |M|^2 for a given ihel to the running sum of |M|^2 over helicities for the given event(s)
  // (similarly, it also ADDS the numerator and denominator for a given ihel to their running sums over helicities)
  __device__ INLINE void /* clang-format off */
  calculate_wavefunctions( int ihel,
                           const fptype* allmomenta,      // input: momenta[nevt*npar*4]
                           const fptype* allcouplings,    // input: couplings[nevt*ndcoup*2]
                           fptype* allMEs,                // output: allMEs[nevt], |M|^2 running_sum_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
                           fptype* allNumerators,        // output: multichannel numerators[nevt], running_sum_over_helicities
                           fptype* allDenominators,      // output: multichannel denominators[nevt], running_sum_over_helicities
                           const unsigned int channelId, // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
                           fptype* allJamp2      // output: [nevt*ncolor], running_sum_over_helicities
#ifndef __CUDACC__
                           , const int nevt               // input: #events (for cuda: nevt == ndim == gpublocks*gputhreads)
#endif
                           )

Then internally I would yet another memory buffer and memory access functions for this new type of data structure...

Hi @valassi, I'm a bit confused about all the assignment in the cudacpp part. I have branch your plugin (new branch color) where I have started the work to add support for the color. This is certainly not fully working (and require the latest 3.1.1_lo_vectorization commit) but I guess that either you can start from there for the cpp part and/or that we need to discuss more. (I think that, it is a bit pointless to continue to support the "builtin" standalone_gpu now...)

I will now concentrate on the change on the madgraph side (including fortran) for the channelID) where I will likely be more usefull than in the cpp part.

Hi @oliviermattelaer that's fine for me! I think I understood what I need to do on the cudacpp part.

I am a bit behind with a few minor issues though (the move to github, and later the makefile cleanup - maybe I can postpone the latter, but I want to move to github and I need to test other bits of my scripts first).

Go ahead and I will try to catch up Thanks! Andrea

oliviermattelaer / mg5amc_test

change of signature for cudacpp #18