xsec from fortran and cpp differ in gg_uu tmad tests (bug in getGoodHel implementation)

valassi commented 1 year ago

This is a followup to Olivier's "split_nonidentical_grouping" patch that I included in #619, and to the upcoming #626 about the addition of gq_ttq.

After Olivier's patch, gq_ttq now builds successfully (with two separate P1 directories, each with one DSIG1). The check.exe and gcheck.exe also complete successfully. Previously the build was failing (there was a single P1 directory, with DSIG1 and DSIG2 in Fortran, but the cudacpp presently only handles a single process.

However, in the tests of cmadevent/gmadevent, the cross section from fortran and cpp differ in gqttq tmad tests.

I open this as an issue to fix later on, but I will go on with merging #626 as-is.

Details here https://github.com/madgraph5/madgraph4gpu/pull/626/commits/2a8b34e61f91c8ff8efadd43529337bdca1a6fba

cc: @oliviermattelaer @roiser @zeniheisser

I guess this might well be a blocker for SM pp to tt+jets? If anyone wants to help, that's welcome :-) I will focus on other things like the vecsize rewriting...

oliviermattelaer commented 1 year ago

So I'm checking the following code

#generate p p > t t~
define p = p / g
add process p g > t t~ j
output PROC_sm_3 --vector_size=XXXX
launch
set ickkw 1
set mc_grouped_subproc F

(so pure fortran code, with "old" grouping)

with

XXXX = 1 : 93.82 (no vectorization)
XXXX = 2 : 94.37
XXXX = 8 : 95.12
XXXX = 100: 95.39

Setting ICKKW=0 seems to fix the issue:

1: 70.12 +- 0.18
8: 70.03 +- 0.22

Set ICKKW=0 and scale to HT/2 confirms that

1: 64.14 +- 0.1954 pb
8: 64.26 +- 0.1913 pb

set ICKKW=1 but remove/bypassing RWGT factor... (i.e. not call at all)

1: 69.08 +- 0.1614 pb
8: 69 +- 0.2035 pb

Idea: move the place where rewgt is called.

oliviermattelaer commented 1 year ago

one issue found is ipsel but does not seems to solve the real issue.

oliviermattelaer commented 1 year ago

So more investigation but no real progress here. One channel identify with the small change is G2 correct cross-section is 0.2730e+2 and vectorised code returns 0.2764 e+2

The difference is clearly in the rewgt function. I thought it was link to the q2bck but it does not seem to be the case.

Need to check more what's going on, so far it just seems has if the scale returned are not the same. (need to find a way to check that)

oliviermattelaer commented 1 year ago

Ok I found the issue (pff that was sooo long for me to identify it) The issue is that the renormalization scale used for the black box was not set per event but by batch

 for ivec=           1 asref=  0.11453761077792829
 for ivec=           2 asref=  0.11453761077792829
 for ivec=           3 asref=  0.11453761077792829
 for ivec=           4 asref=  0.11453761077792829
 for ivec=           5 asref=  0.11453761077792829
 for ivec=           6 asref=  0.11453761077792829
 for ivec=           7 asref=  0.11453761077792829
 for ivec=           8 asref=  0.11453761077792829
 for ivec=           1 asref=  0.10896760929172325
 for ivec=           2 asref=  0.10896760929172325
 for ivec=           3 asref=  0.10896760929172325
 for ivec=           4 asref=  0.10896760929172325
 for ivec=           5 asref=  0.10896760929172325
 for ivec=           6 asref=  0.10896760929172325
 for ivec=           7 asref=  0.10896760929172325
 for ivec=           8 asref=  0.10896760929172325

Still have to understand how to change running of as to fix the issue but at least it is clear now. (and found and fix another small issue by digging into that mess)

oliviermattelaer commented 1 year ago

That particular error is now fixed in https://github.com/mg5amcnlo/mg5amcnlo/commit/97e148886ba2360247b263e70ffd6e4716b1eb50

However, I still have some miss match (even smaller and less relevant) for the black box for the following script:

define p = p / g
add process u g > t t~ j
output PROC_TEST --vector_size=8
launch
set nevents 500k
set ickkw 1
set mc_grouped_subproc F
set pdfwgt T
set chcluster T # Note this is the problematic parameter that impacts the black box.

This is the result that I expect:

46.27 +- 0.02233 pb (correct value from 3.5.0) This is the result that I got:
46.13 +- 0.02268 pb(vecsize=64) [4 sigma away]
46.15 +- 0.02265 pb (vecsize=8) [3 sigma away]
46.22 +- 0.02365 p (vecsize=2) [1 sigma away]
46.26 +- 0.02338 pb (vecsize=1) [perfect]

The "problematic" lines are l466 SubProcesses/cluster.f

           if(chcluster)then
              if(id_cl(iproc,idij,i).ne.iconfig) cycle
           endif

And the problematic channel is here G4 (all other channel are just perfect spot on). Since the above line are part of the findmt function, I have check that the spectrum of output given by that function are the same in 3.x and this branch. However, I have noted that this function is call less often in the new branch. I do not worry about that (at least for the moment).

valassi commented 1 year ago

Hi Olivier, thanks.

Unfortunately however ths change does NOT fix the issue #630 here, namely that cpp and fortran cross sections are different for gqttq. I tested this in WIP MR #703.

Should I nevertheless try to use this for all other developments?

Note: maybe what you fixed is really about issue #678? That one is about "check that we get the same fortran results with and without vector interfaces".

oliviermattelaer commented 1 year ago

tbh, I'm completely lost in the number of issue that you are creating. I can not track 163 open issue... So yes, what I'm tracking here is indeed the fact that the cross-section is not the same with and without vectorization. So a pure fortran issue and inded the same as #678.

So sorry, I will move my note on this issue within #678, assign that one to me and assign this one to you then.

For your question

Should I nevertheless try to use this for all other developments?

Yes, all the change related to the black box not correctly handling vectorization are important to include of course since they are the example that CMS is using and therefore critical. Additionally, the difference between fortran/cpp is kind of irrelevant if we do not have first an agreement with pure fortran first.

valassi commented 1 year ago

tbh, I'm completely lost in the number of issue that you are creating. I can not track 163 open issue...

sorry, not my fault there are many different issues open ... ;-)

Yes, all the change related to the black box not correctly handling vectorization are important to include

thanks, I will go on with more tests on MR #703 and merge that...

valassi commented 1 year ago

Hi @oliviermattelaer thanks for reassigning.

I did a few tests in WIP MR #705. The problem that the cross-section is different in fortran and c++ is not only in gq_ttq where q is any of 8 quarks or antiquarks. It is also there just for gu_ttu.

So this seems to be a physics issue, not a technicality (namely: it is NOT because we have four subprocesses for u d s c).

Some diagrams are missing in the cpp version?? Unlikely also because cpp gives a higher cross section

ERROR! xsec from fortran (0.16092748331717582) and cpp (0.78431337733339346) differ by more than 2E-14 (3.873706847124251)

I will try to use the ME comparison tool...

valassi commented 1 year ago

This is really bizarre. Why do all other processes we tested look ok, while this looks totally wrong?!

CUDACPP_RUNTIME_FBRIDGEMODE=-1 CUDACPP_RUNTIME_VECSIZEUSED=4 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_guttu_x1_cudacpp
...
 NGOODHEL =          16
 NCOMB =          32
WARNING! (   1/20) Deviation more than 5E-5   1  0.18424930E-06  0.19961093E-06          1.08337410505
WARNING! (   2/20) Deviation more than 5E-5   2  0.12298190E-05  0.17566777E-04         14.28403426081
WARNING! (   3/20) Deviation more than 5E-5   3  0.47867089E-05  0.13080906E-04          2.73275576887
WARNING! (   4/20) Deviation more than 5E-5   4  0.67126267E-07  0.15328159E-06          2.28348154123
 MULTI_CHANNEL = TRUE
 CHANNEL_ID =           1
WARNING! (   5/20) Deviation more than 5E-5   1  0.87505696E-07  0.23515878E-05         26.87353956634
WARNING! (   6/20) Deviation more than 5E-5   2  0.88522932E-07  0.55065835E-05         62.20516418777
WARNING! (   7/20) Deviation more than 5E-5   3  0.61529585E-06  0.46617981E-05          7.57651467744
WARNING! (   8/20) Deviation more than 5E-5   4  0.50613865E-06  0.10890432E-04         21.51669710237
WARNING! (   9/20) Deviation more than 5E-5   1  0.23220462E-06  0.16397640E-05          7.06171998630
WARNING! (  10/20) Deviation more than 5E-5   2  0.53598828E-06  0.27529707E-05          5.13625164625
WARNING! (  11/20) Deviation more than 5E-5   3  0.70436299E-07  0.32287007E-05         45.83859074133
WARNING! (  12/20) Deviation more than 5E-5   4  0.17665132E-06  0.40475396E-05         22.91259144688
WARNING! (  13/20) Deviation more than 5E-5   1  0.34390469E-06  0.40648397E-06          1.18196690722
WARNING! (  14/20) Deviation more than 5E-5   2  0.12241045E-07  0.23368242E-06         19.09007152096
WARNING! (  15/20) Deviation more than 5E-5   3  0.47051462E-05  0.17253164E-04          3.66687089246
WARNING! (  16/20) Deviation more than 5E-5   4  0.52423733E-06  0.15269022E-05          2.91261623892
WARNING! (  17/20) Deviation more than 5E-5   1  0.10784061E-06  0.65569963E-05         60.80266230167
WARNING! (  18/20) Deviation more than 5E-5   2  0.32402786E-06  0.44110455E-06          1.36131671907
WARNING! (  19/20) Deviation more than 5E-5   3  0.10754405E-05  0.25988847E-05          2.41657706384
WARNING! (  20/20) Deviation more than 5E-5   4  0.46937402E-05  0.11382052E-04          2.42494286145
 NGOODHEL =          16
 NCOMB =          32

Is it again a problem of helicities? (I thought I had a printout of fortran and cpp helicities)

valassi commented 1 year ago

Hi @oliviermattelaer (cc @roiser) I pinned this issue because I find too huge.

In practice, I am checking cross sections in fortran and cpp and comoaring them

ok for gg to tt, gg to ttg, gg to ttgg, gg to ttggg (we knew that)
also ok for uu to tt
NOT OK for gg to uu, gg to gg, gu to ttu, uu to ttg (an dthe original gq to ttq which triggered this issue)

In a way, as soon as we have something not as massive as top, the results look weird

Can it be a problem of parameters? (fortran and cpp using different parameters)?

Can it be a problem of helicities?

Any other suggestion?

zeniheisser commented 1 year ago

Just to make sure, @valassi does gg to uu work? That one should use the exact same code as gg to tt but with different masses, and should also have the same cross section as gg to uu, so if one works but not the other that would suggest the different is in initial state quarks specifically.

valassi commented 1 year ago

Hi @zeniheisser thanks!

No, gg to uu also has an issue https://github.com/valassi/madgraph4gpu/commit/8002d6f3403b9c8706833b0eb5225e0639a203a7

ERROR! xsec from fortran (2755752.8574748533) and cpp (1787504.4305257690) differ by more than 2E-14 (0.3513553199528596)

valassi commented 1 year ago

You have an interesting suggestion indeed. The code looks very similar, but not exactly the same. Trying to get the most relevant bits

diff -r gg_uu.mad/SubProcesses/P1_gg_uux/ gg_tt.mad/SubProcesses/P1_gg_ttx/
...
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/configs.inc gg_tt.mad/SubProcesses/P1_gg_ttx/configs.inc
11c11
<       DATA TPRID(-1,2)/2/
---
>       DATA TPRID(-1,2)/6/
20c20
<       DATA TPRID(-1,3)/2/
---
>       DATA TPRID(-1,3)/6/
...
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/CPPProcess.cc gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc
47c47
< // Process: g g > u u~ WEIGHTED<=2 @1
---
> // Process: g g > t t~ WEIGHTED<=2 @1
80c80
<   //__device__ const fptype* cIPD = nullptr; // unused as nparam=0
---
>   __device__ const fptype cIPD[2] = { (fptype)Parameters_sm::mdl_MT, (fptype)Parameters_sm::mdl_WT };
84c84
<   //__device__ __constant__ fptype* cIPD = nullptr; // unused as nparam=0
---
>   __device__ __constant__ fptype cIPD[2];
87c87
<   //static fptype* cIPD = nullptr; // unused as nparam=0
---
>   static fptype cIPD[2];
245,252c245
< #if not( defined __CUDACC__ and defined MGONGPU_TEST_DIVERGENCE )
<       oxzxxx<M_ACCESS, W_ACCESS>( momenta, cHel[ihel][2], +1, w_fp[2], 2 );
< #else
<       if( ( blockDim.x * blockIdx.x + threadIdx.x ) % 2 == 0 )
<         oxzxxx<M_ACCESS, W_ACCESS>( momenta, cHel[ihel][2], +1, w_fp[2], 2 );
<       else
<         oxxxxx<M_ACCESS, W_ACCESS>( momenta, 0, cHel[ihel][2], +1, w_fp[2], 2 )
< #endif
---
>       oxxxxx<M_ACCESS, W_ACCESS>( momenta, cIPD[0], cHel[ihel][2], +1, w_fp[2], 2 );
254c247
<       ixzxxx<M_ACCESS, W_ACCESS>( momenta, cHel[ihel][3], -1, w_fp[3], 3 );
---
>       ixxxxx<M_ACCESS, W_ACCESS>( momenta, cIPD[0], cHel[ihel][3], -1, w_fp[3], 3 );
270c263
<       FFV1_1<W_ACCESS, CD_ACCESS>( w_fp[2], w_fp[0], COUPs[1], 0., 0., w_fp[4] );
---
>       FFV1_1<W_ACCESS, CD_ACCESS>( w_fp[2], w_fp[0], COUPs[1], cIPD[0], cIPD[1], w_fp[4] );
283c276
<       FFV1_2<W_ACCESS, CD_ACCESS>( w_fp[3], w_fp[0], COUPs[1], 0., 0., w_fp[4] );
---
>       FFV1_2<W_ACCESS, CD_ACCESS>( w_fp[3], w_fp[0], COUPs[1], cIPD[0], cIPD[1], w_fp[4] );
300c293
<       // (This method used to be called CPPProcess::matrix_1_gg_uux()?)
---
>       // (This method used to be called CPPProcess::matrix_1_gg_ttx()?)
510,511c503,504
<     m_masses.push_back( m_pars->ZERO );
<     m_masses.push_back( m_pars->ZERO );
---
>     m_masses.push_back( m_pars->mdl_MT );
>     m_masses.push_back( m_pars->mdl_MT );
514c507
<     //const fptype tIPD[0] = { ... }; // nparam=0
---
>     const fptype tIPD[2] = { (fptype)m_pars->mdl_MT, (fptype)m_pars->mdl_WT };
517c510
<     //checkCuda( cudaMemcpyToSymbol( cIPD, tIPD, 0 * sizeof( fptype ) ) ); // nparam=0
---
>     checkCuda( cudaMemcpyToSymbol( cIPD, tIPD, 2 * sizeof( fptype ) ) );
520c513
<     //memcpy( cIPD, tIPD, 0 * sizeof( fptype ) ); // nparam=0
---
>     memcpy( cIPD, tIPD, 2 * sizeof( fptype ) );
522a516
>     //for ( i=0; i<2; i++ ) std::cout << std::setprecision(17) << "tIPD[i] = " << tIPD[i] << std::endl;
540,541c534,535
<     m_masses.push_back( Parameters_sm::ZERO );
<     m_masses.push_back( Parameters_sm::ZERO );
---
>     m_masses.push_back( Parameters_sm::mdl_MT );
>     m_masses.push_back( Parameters_sm::mdl_MT );
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/CPPProcess.h gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.h
15,16c15,16
< #ifndef MG5_Sigma_sm_gg_uux_H
< #define MG5_Sigma_sm_gg_uux_H 1
---
> #ifndef MG5_Sigma_sm_gg_ttx_H
> #define MG5_Sigma_sm_gg_ttx_H 1
36c36
<   // Process: g g > u u~ WEIGHTED<=2 @1
---
>   // Process: g g > t t~ WEIGHTED<=2 @1
188c188
< #endif // MG5_Sigma_sm_gg_uux_H
---
> #endif // MG5_Sigma_sm_gg_ttx_H
...
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/leshouche.inc gg_tt.mad/SubProcesses/P1_gg_ttx/leshouche.inc
1c1
<       DATA (IDUP(I,1,1),I=1,4)/21,21,2,-2/
---
>       DATA (IDUP(I,1,1),I=1,4)/21,21,6,-6/
...
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/matrix1.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f
15c15
< C     Process: g g > u u~ WEIGHTED<=2 @1
---
> C     Process: g g > t t~ WEIGHTED<=2 @1
311c311
< C     Process: g g > u u~ WEIGHTED<=2 @1
---
> C     Process: g g > t t~ WEIGHTED<=2 @1
354a355
>       DOUBLE PRECISION FK_MDL_WT
355a357
>       SAVE FK_MDL_WT
401a404,405
>         IF(MDL_WT.NE.0D0) FK_MDL_WT = SIGN(MAX(ABS(MDL_WT), ABS(MDL_MT
>      $   *SMALL_WIDTH_TREATMENT)), MDL_WT)
411,412c415,416
<       CALL OXXXXX(P(0,3),ZERO,NHEL(3),+1*IC(3),W(1,3))
<       CALL IXXXXX(P(0,4),ZERO,NHEL(4),-1*IC(4),W(1,4))
---
>       CALL OXXXXX(P(0,3),MDL_MT,NHEL(3),+1*IC(3),W(1,3))
>       CALL IXXXXX(P(0,4),MDL_MT,NHEL(4),-1*IC(4),W(1,4))
416c420
<       CALL FFV1_1(W(1,3),W(1,1),GC_11(IVEC),ZERO, FK_ZERO,W(1,5))
---
>       CALL FFV1_1(W(1,3),W(1,1),GC_11(IVEC),MDL_MT, FK_MDL_WT,W(1,5))
419c423
<       CALL FFV1_2(W(1,4),W(1,1),GC_11(IVEC),ZERO, FK_ZERO,W(1,5))
---
>       CALL FFV1_2(W(1,4),W(1,1),GC_11(IVEC),MDL_MT, FK_MDL_WT,W(1,5))
...
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/pmass.inc gg_tt.mad/SubProcesses/P1_gg_ttx/pmass.inc
3,4c3,4
<       PMASS(3)=ZERO
<       PMASS(4)=ZERO
---
>       PMASS(3)=ABS(MDL_MT)
>       PMASS(4)=ABS(MDL_MT)
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/processes.dat gg_tt.mad/SubProcesses/P1_gg_ttx/processes.dat
1c1
< 1       g g > u u~
---
> 1       g g > t t~
diff -r gg_uu.mad/SubProcesses/P1_gg_uux/props.inc gg_tt.mad/SubProcesses/P1_gg_ttx/props.inc
4,5c4,5
<       PRMASS(-1,2)  = ZERO
<       PRWIDTH(-1,2) = ZERO
---
>       PRMASS(-1,2)  = ABS(MDL_MT)
>       PRWIDTH(-1,2) = ABS(MDL_WT)
7,8c7,8
<       PRMASS(-1,3)  = ZERO
<       PRWIDTH(-1,3) = ZERO
---
>       PRMASS(-1,3)  = ABS(MDL_MT)
>       PRWIDTH(-1,3) = ABS(MDL_WT)

valassi commented 1 year ago

In principle I had checked quite extensively that oxxxxx gives the same results as the oxzxxx... then of course there can be issues

Also the funny thing is that it works when some parameters are not 0, and does not when they are 0. So one cannot say "well with 0 some stuff does not count, then when it is not 0 you get an issue because the value may be wrong"

oliviermattelaer commented 1 year ago

Why do you have the oxz here? The z is only valid for initial state… and i and o are not valid for gluon….

Is that the issue?

Sent from Outlook for iOShttps://aka.ms/o0ukef

From: Andrea Valassi @.> Sent: Wednesday, June 14, 2023 6:34:17 PM To: madgraph5/madgraph4gpu @.> Cc: Olivier Mattelaer @.>; Mention @.> Subject: Re: [madgraph5/madgraph4gpu] xsec from fortran and cpp differ in gu_ttu tmad tests (Issue #630)

In principle I had checked quite extensively that oxxxxx gives the same results as the oxzxxx... then of course there can be issues

Also the funny thing is that it works when some parameters are not 0, and does not when they are 0. So one cannot say "well with 0 some stuff does not count, then when it is not 0 you get an issue because the value may be wrong"

— Reply to this email directly, view it on GitHubhttps://github.com/madgraph5/madgraph4gpu/issues/630#issuecomment-1591616683, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AH6535SGKRBVX5CO4SKXEDTXLHRYTANCNFSM6AAAAAAWT3UW4Q. You are receiving this because you were mentioned.Message ID: @.***>

valassi commented 1 year ago

I just tried using the generic ixxx and oxxx instead in gguu, the xsecs change a very tiny bit in c++, but they are still wildly different from c++ https://github.com/valassi/madgraph4gpu/commit/a9b14212d07143f723f8c1b21faa2912355adafd

valassi commented 1 year ago

Why do you have the oxz here? The z is only valid for initial state… and i and o are not valid for gluon….

No I think using the oxz is correct anyway: it refers to the quark - when massive like top I use the generic oxxx, when massless like u I use the simpler version (but again, I checked and the results are the same - I had tested these massively in the past)

valassi commented 1 year ago

I am really wondering if it is some parameter (even alphas!) that is different in the fortran and c++ parameter set, I will check

zeniheisser commented 1 year ago

Maybe check event-by-event amplitudes and see where the difference starts? Ie if it's already at external particles or if it comes at propagator or event amplitude evaluation?

valassi commented 1 year ago

Maybe check event-by-event amplitudes and see where the difference starts? Ie if it's already at external particles or if it comes at propagator or event amplitude evaluation?

Hi @zeniheisser thanks, no I have not tried that. Would you be available to help debugging? That would be great :-)

Thanks Andrea

valassi commented 1 year ago

I am really wondering if it is some parameter (even alphas!) that is different in the fortran and c++ parameter set, I will check

No the parameters look correct (I think), alphas is 0.118 in both cases

valassi commented 1 year ago

Maybe check event-by-event amplitudes and see where the difference starts? Ie if it's already at external particles or if it comes at propagator or event amplitude evaluation?

Hi @zeniheisser thanks, no I have not tried that. Would you be available to help debugging? That would be great :-)

Thanks Andrea

In the meantime I probably found it.

As I was suspecting, it is probably the helicities...

I have made a test of ggtt where I artificially put the top mass to 0 (and the width to 0). In this case the cross sections mismatch, but it is also interesting that the number of helicities goes down from 16 to 8 (in fortran)

Then I tried with a very low mass, but non zero, In this case

the fortran and cpp cross sections agree!
the fortran cross section is essentially the same as that with 0 mass (which makes sense)
the cpp cross section is much larger
TBC the number of helicities has increase ind both cases

valassi commented 1 year ago

Actually, I did not notice, it WAS written there, the number of helicities!

for gguu, fortran gives 8 (eight) helicities
for gguu, cpp gives 6 (six) helicities

See

cat tmad/logs_gguu_mad/log_gguu_mad_d_inl0_hrd0.txt | egrep -i '(ngoodhel|execute)'
*** (1) EXECUTE MADEVENT_FORTRAN (create results.dat) ***
 [NGOODHEL] ngoodhel/ncomb = 8/16
*** (1) EXECUTE MADEVENT_FORTRAN x1 (create events.lhe) ***
 [NGOODHEL] ngoodhel/ncomb = 8/16
*** (1) EXECUTE MADEVENT_FORTRAN x10 (create events.lhe) ***
 [NGOODHEL] ngoodhel/ncomb = 8/16
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
 [NGOODHEL] ngoodhel/ncomb = 6/16

So cpp is giving a lower cross section because it is missing two helicities (why?!)

I will also create an issue to add to the tmad test the check that the numnber of helicities is the same

zeniheisser commented 1 year ago

I could take a look at it tomorrow if you haven't figured it out, but it very clearly seems to be the helicity filtering in this case. Maybe the comparison is too large, or too few phase space points are considered? May also be that there is another issue with the xz wavefunctions that only appears for specific helicities?

valassi commented 1 year ago

I could take a look at it tomorrow if you haven't figured it out, but it very clearly seems to be the helicity filtering in this case. Maybe the comparison is too large, or too few phase space points are considered? May also be that there is another issue with the xz wavefunctions that only appears for specific helicities?

Thanks @zeniheisser ! Yes it is unlikely that I will work on it today or tomorrow - if you can have a look that would be useful.

The issue with helicity was already there in the past, eg #419.

Note, eventually we may want/prefer to implement #564? That is to say, allow passing the set of helicities between fortran and cudacpp.

Anyway, for now I think it would be enough to understand why the helicity selection is different in fortran/cudacpp, and develop a quick hack/workaround to make sure they give the same results.

Andrea

zeniheisser commented 1 year ago

Alright, I'm taking a look at the standalone output right now, and I've started with comparing u u~ > 2g and 2g > u u~ with some prints for the goodHel functions. First, with initial state quarks,

[zwetters@itscrd-a100 P1_Sigma_sm_uux_gg]$ ./gcheck.exe 1 32 1
sigmaKin_setGoodHel ihel=0 true
sigmaKin_setGoodHel ihel=1 true
sigmaKin_setGoodHel ihel=2 true
sigmaKin_setGoodHel ihel=3 true
sigmaKin_setGoodHel ihel=4 false
sigmaKin_setGoodHel ihel=5 false
sigmaKin_setGoodHel ihel=6 false
sigmaKin_setGoodHel ihel=7 false
sigmaKin_setGoodHel ihel=8 false
sigmaKin_setGoodHel ihel=9 false
sigmaKin_setGoodHel ihel=10 false
sigmaKin_setGoodHel ihel=11 false
sigmaKin_setGoodHel ihel=12 true
sigmaKin_setGoodHel ihel=13 true
sigmaKin_setGoodHel ihel=14 true
sigmaKin_setGoodHel ihel=15 true
nGoodHel=8

with helicities

453     // Helicities for the process [NB do keep 'static' for this constexpr array, see issue #283]
 454     // *** NB There is no automatic check yet that these are in the same order as Fortran! #569 ***
 455     static constexpr short tHel[ncomb][npar] = {
 456       { 1, -1, -1, -1 },
 457       { 1, -1, -1, 1 },
 458       { 1, -1, 1, -1 },
 459       { 1, -1, 1, 1 },
 460       { 1, 1, -1, -1 },
 461       { 1, 1, -1, 1 },
 462       { 1, 1, 1, -1 },
 463       { 1, 1, 1, 1 },
 464       { -1, -1, -1, -1 },
 465       { -1, -1, -1, 1 },
 466       { -1, -1, 1, -1 },
 467       { -1, -1, 1, 1 },
 468       { -1, 1, -1, -1 },
 469       { -1, 1, -1, 1 },
 470       { -1, 1, 1, -1 },
 471       { -1, 1, 1, 1 } };

which is the correct result (massless fermions only couple to other massless fermions with opposite spin). For final state quarks, though, we get

[zwetters@itscrd-a100 P1_Sigma_sm_gg_uux]$ ./gcheck.exe 1 32 1
sigmaKin_setGoodHel ihel=0 true
sigmaKin_setGoodHel ihel=1 false
sigmaKin_setGoodHel ihel=2 false
sigmaKin_setGoodHel ihel=3 true
sigmaKin_setGoodHel ihel=4 true
sigmaKin_setGoodHel ihel=5 false
sigmaKin_setGoodHel ihel=6 false
sigmaKin_setGoodHel ihel=7 true
sigmaKin_setGoodHel ihel=8 true
sigmaKin_setGoodHel ihel=9 false
sigmaKin_setGoodHel ihel=10 false
sigmaKin_setGoodHel ihel=11 true
sigmaKin_setGoodHel ihel=12 false
sigmaKin_setGoodHel ihel=13 false
sigmaKin_setGoodHel ihel=14 false
sigmaKin_setGoodHel ihel=15 false
nGoodHel=6

with helicities

453     // Helicities for the process [NB do keep 'static' for this constexpr array, see issue #283]
 454     // *** NB There is no automatic check yet that these are in the same order as Fortran! #569 ***
 455     static constexpr short tHel[ncomb][npar] = {
 456       { -1, -1, -1, 1 },
 457       { -1, -1, -1, -1 },
 458       { -1, -1, 1, 1 },
 459       { -1, -1, 1, -1 },
 460       { -1, 1, -1, 1 },
 461       { -1, 1, -1, -1 },
 462       { -1, 1, 1, 1 },
 463       { -1, 1, 1, -1 },
 464       { 1, -1, -1, 1 },
 465       { 1, -1, -1, -1 },
 466       { 1, -1, 1, 1 },
 467       { 1, -1, 1, -1 },
 468       { 1, 1, -1, 1 },
 469       { 1, 1, -1, -1 },
 470       { 1, 1, 1, 1 },
 471       { 1, 1, 1, -1 } };

where the two valid configurations where both gluons have positive helicity (12 and 15) for some reason do not contribute.

zeniheisser commented 1 year ago

I haven't been able to determine exactly where the issue is, but I can say the following:

For u u~ > 2g, all 8 valid helicity configurations are found (I haven't tested the xsecs, but you said that this one is still wrong, Andrea?)
For 2g > u u~, only 6 of the valid configs are found -- the two that are missing are the ones where both incoming gluons have positive helicity. Notably, the configs where both incoming gluons have negative helicity do not disappear.
If we check leptons, we actually get the opposite: for 2a > e+ e-, all six valid helicity configs are found (two of them vanish, but should only result in numerical zeros which our current comparison is expected to miss). For e+ e- > 2a, however, the configuration where the photons have positive helicities is determined to be bad, while the conjugated process where the photons have negative helicities does not disappear.

If it is true that u u~ > 2g gives the wrong cross section despite determining those extra configs to be non-zero, it may be worth noting that the only real difference between this and the crossed process is that 2g > u u~ uses only FFV_0 amplitude evaluations, whereas one of the three amplitudes for u u~ > 2g uses a VVV_0 subroutine. Maybe there is an issue with FFV_0 for m=0? Also, of course, note the fact that all "missing" helicities seem to be for positive spin vector bosons, so it is also worth looking there.

valassi commented 1 year ago

Hi @zeniheisser thanks!

For u u~ > 2g, all 8 valid helicity configurations are found (I haven't tested the xsecs, but you said that this one is still wrong, Andrea?)

No I think I had not tested this one. So I suggest you ignore it for now.

For 2g > u u~, only 6 of the valid configs are found -- the two that are missing are the ones where both incoming gluons have positive helicity. Notably, the configs where both incoming gluons have negative helicity do not disappear.

This instead is the real issue! Did you manage to see why two are missing? I mean, is the computed ME equal to 0 for all events in the basket used for testing helicities?

zeniheisser commented 1 year ago

No I think I had not tested this one. So I suggest you ignore it for now.

Above you said you tested it in valassi@8002d6f,

ERROR! xsec from fortran (2755752.8574748533) and cpp (1787504.4305257690) differ by more than 2E-14 (0.3513553199528596)
so I think that's actually right. Since this and 2g > u u~ are essentially identical save for the fact that for the first amplitude the VVV and the FFV vertices switch places, one being an amplitude and the other being a propagator.

This instead is the real issue! Did you manage to see why two are missing? I mean, is the computed ME equal to 0 for all events in the basket used for testing helicities? I didn't manage to get the amplitudes themselves printed -- printf didn't naively work with our cxtype, and it doesn't have an implemented comparison operator, so actually printing the amplitudes directly was a bit more difficult than I'd originally anticipated without being a master of those hehe

But if you want to try it out yourself, you can just add a printf call in the CPPProcesses.cc file, just after the calls to the _0 functions, which prints the current ihel value and the value of amp_sv[0]. Running that for the minimal amount of iterations, it should be clear whether the amplitudes are zero immediately on evaluation. If they are, you could then print the wavefunctions themselves to see which elements are zero. If they're non-zero (and don't sum to zero), the errors is definitely in the actual getGoodHelicities, while if they sum to zero there's most likely a sign error somewhere in vertex function or wavefunction.

valassi commented 1 year ago

Thanks to @zeniheisser this now seems to be better understood.

The problem is most likely that cudacpp uses fewer helicities than fortran, because

fortran (when LIMHEL=0) checks that one individual helicity contribution is greater than 0
cudacpp compares that the ME sum with n+1 helicities is greater than the ME sum with n helicities

These two algorithms are in any case different, because the cudacpp algorithm would suppress helicities whose contribution is E-16 suppressed with respect to the other ones

One possible solution that should be tried out (and which incidentally would allow using a non-zero LIMHEL also in cudacpp) is to have cudacpp calculate_wavefunction compute individual helicity contributions, rather than add those to the previous ME sum.

valassi commented 1 year ago

This has been closed by MR #705. (I have reopened it only to document it).

This issue #630 had initially been opened about the mismatch of the fortran and cpp cross sections in gq_ttq. But then eventually it focused on gg_uu instead which had a similar issue. The status is the following in summary:

The mismatch in gg_uu, which is now considered the main point of this #630, has been fixed by MR #705, using an improved getGoodHel implementation, using previous debugging from @zeniheisser. See comments below.
The mismatch if cross sections in gq_ttq is NOT fixed by the new getGoodHel implementation. Since this #630 has come to focus on gg_uu instead, I have opened a brand new issue #748 about the mismatch in gq_ttq.

More specifically about gg_uu cross sections. Zenny had found that the number of good helicities in cudacpp was lower than that in fortran for gguu, because the |old getgoodhel implementation was comparing the ME sum for i+1 helicities to the ME sum for i helicities, rather than comparing directly the i+1-th ME contribution to 0. If the i+1 ME contribution is heavily suppressed with respect to the currrent running sum of MEs for i helicities (by 1E-16 in double FP precision, for instance), then the sums are the same. This has now been fixed in MR #705: I now compare directly the i+1-th contribution to 0. The cross sections from fortran and cudacpp for gguu are now the same... but those for gq_ttq are still different, hence I opened a new issue #748.

Even more in detail, there were two issues that I wanted to understand a bit better before closing this.

The first is quite silly, I thought that the gguu cross sections in fortran and cudacpp magically became the same (with the old getgoodhel) when decreasing the top mass to 0. This is wrong, I had not decreased the top mass to 0 in gguu. On the contrary, I had decreased the top mass to 0 in gg_tt, and indeed in that case I found that gg_tt behaved like gg_uu, i.e. the cross sections from fortran and cudacpp were different,
The second one is more complex: I could not understand how it is possible that an E-16 effect on the ME from one missing helicity could end up giving cross sections that are off by a factor two. I believe I have now understood that this comes from a combination of factors. First, the gguu cross section is essentially divergent, as we put no cuts in these tests: as a consequence, the cross section that is printed out depends on which events are used to compute the cross section. And indeed I have seen that the test giving a factor two discrepancy between fortran and cudacpp gguu cross sections were using different numbers of accepted and rejected events. A second factor that is more subtle is that the actual "MEs" that are given as output from the bridge are the single-diagram-enhanced MEs for a given channel (MadEvent sampling algorithm), i.e. specifically the ME multiplied by a numerator (amplitude squared) and divided by a denominator (sum of amplitudes squared). And here a third factor is that there are large cancellations when summing three diagrams for a given helicity (this is what we discussed about CADNA and about helicity vs chirality flow), but the numerators and denominators use individual amplitudes squared, so there are no such cancellations! This means that, even if the squared sum of amplitudes is essentially the same whether a helicity is included or not, the numerator/denominator factors are very different when a helicity is included or not. This in turns affects how events are selected and sampled, and because the cross section is divergent and depends wildly on which events are used, an E-16 effect ends up being transformed into a factor 2 or more effect...

I therefore believe the two points above are understood, an dthis issue #630 can be closed. Before doing that, however, I will still open a few more issues for further testst and add links to them here.

valassi commented 1 year ago

Unless I am forgetting something, the main points that still need to be followed up here are

high priority: gq_ttq cross sections still mismatch, #748
lower priority, implement LIMHEL>0 getgoodhel in cudacpp (and debug it in fortran), see the new comments on old issue #564
low priority, understand the effect of LIMHEL and of helicity selection on event selection and cross sections eg for gguu, #749

This issue #630 (bug in getgoodhel, resulting in gguu cross section mismatch) is now fixed and can be closed.

madgraph5 / madgraph4gpu

xsec from fortran and cpp differ in gg_uu tmad tests (bug in getGoodHel implementation) #630