madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

xsec from fortran and cpp differ in gq_ttq tmad tests (wrong order of couplings) #748

Closed valassi closed 3 months ago

valassi commented 1 year ago

This is a follow-up to #630 ("xsec from fortran and cpp differ in gq_ttq tmad tests"), which had originally been filed about gq_ttq but eventually focused on gg_uu.

In #630, I had also found that the xsec from fortran and cpp differed in the gg_uu tmad tests, but with Zenny's help this was identified as a problem in getGoodHel and fixed. That fixed gg_uu, but not gq_ttq, where the xsecs are still different. For clarity, I am moving gq_ttq here, while I will soon close #630 about gg_uu.

roiser commented 1 year ago

I started looking into this. What I'm doing is to create, via our new plugin interface, the directories for the fortran and c++ code with

generate p p > t t~ j
output madevent ppttj_for_sde1 --hel_recycling=False

and

generate p p > t t~ j 
output madevent ppttj_cpp_sde1 --hel_recycling=False --vector_size=32 --me_exporter=standalone_cudacpp

respectively. My first observation is that different numbers of P1 directories are produced, i.e.

ls ppttj_for_sde1/SubProcesses/P1_
P1_gg_ttxg/ P1_gq_ttxq/ P1_qq_ttxg/ 

and

ls ppttj_cpp_sde1/SubProcesses/P1_
P1_gg_ttxg/   P1_gu_ttxu/   P1_gux_ttxux/ P1_uux_ttxg/ 

Is this related to the sde_strategy variable in the run_card? Please note that in both cases this is set to 1. I presume the fortran/P1_gq_ttxq corresponds to the cpp/P1_gu_ttxu + cpp/P1_gux_ttxux.

Going a bit further, I then produce the cross sections for fortran/P1_gq_ttxq and for cpp/P1_gu_ttxu and cpp/P1_gux_ttxux, where I end up with numbers such as

fortran/P1_gq_ttxq : Event xsec   0.52569008216473057

and

cpp/P1_gu_ttxu : Event xsec   0.21427934258181794 
cpp/P1_gux_ttxux : Event xsec   0.89817471781675562  

Can I sum up the two cpp ones and compare to the fortran one? They don't match … there is a factor 2 between them.

Before going further I wanted to see if I can reproduce the same numbers in fortran with the two P1 directories generated. Can this generation of the two P1 directories be forced?
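
For reference, a quick check of the sum and ratio mentioned above, using the numbers quoted in this comment (a minimal Python sketch):

    # Cross sections quoted above for the two split cudacpp P1 directories
    xsec_gu_ttxu = 0.21427934258181794
    xsec_gux_ttxux = 0.89817471781675562
    # Cross section quoted above for the single fortran P1 directory
    xsec_gq_ttxq = 0.52569008216473057

    total_cpp = xsec_gu_ttxu + xsec_gux_ttxux  # ~1.1125
    ratio = total_cpp / xsec_gq_ttxq  # ~2.12, i.e. roughly the factor 2 mentioned above
    print(total_cpp, ratio)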

valassi commented 1 year ago

Hi @roiser thanks for looking into this. (cc @oliviermattelaer @hageboeck @zeniheisser @Jooorgen for info on the tests)

Apologies, I should have posted here the details of where I see the errors. There is a ready-made script that gets you everything automatically. For instance, with the current upstream/master:

cd epochX/cudacpp
./tmad/teeMadX.sh -gqttq +10x -makeclean
...
*** (1) EXECUTE MADEVENT_FORTRAN x1 (create events.lhe) ***
--------------------
CUDACPP_RUNTIME_FBRIDGEMODE = (not set)
CUDACPP_RUNTIME_VECSIZEUSED = 8192
--------------------
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
--------------------
Executing ' ./madevent_fortran < /tmp/avalassi/input_gqttq_x1_fortran > /tmp/avalassi/output_gqttq_x1_fortran'
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 16/32
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 1
 [XSECTION] ChannelId = 1
 [XSECTION] Cross section = 0.2605 [0.26050333309703716] fbridge_mode=0
 [UNWEIGHT] Wrote 81 events (found 540 events)
 [COUNTERS] PROGRAM TOTAL          :    0.3017s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.2234s
 [COUNTERS] Fortran MEs      ( 1 ) :    0.0782s for     8192 events => throughput is 1.05E+05 events/s
...
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
--------------------
CUDACPP_RUNTIME_FBRIDGEMODE = (not set)
CUDACPP_RUNTIME_VECSIZEUSED = 8192
--------------------
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
--------------------
Executing ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp'
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 16/32
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 1
 [XSECTION] ChannelId = 1
 [XSECTION] Cross section = 1.276 [1.2757941949814184] fbridge_mode=1
 [UNWEIGHT] Wrote 105 events (found 652 events)
 [COUNTERS] PROGRAM TOTAL          :    0.3808s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.3106s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.0702s for     8192 events => throughput is 1.17E+05 events/s

*** (2-none) Compare MADEVENT_CPP x1 xsec to MADEVENT_FORTRAN xsec ***

ERROR! xsec from fortran (0.26050333309703716) and cpp (1.2757941949814184) differ by more than 2E-14 (3.8974198518457603)

This is one of the tests I run every time I do an MR.

The log is always here https://github.com/madgraph5/madgraph4gpu/blob/master/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt ie the latest (with errors) being https://github.com/madgraph5/madgraph4gpu/blob/bd255c01fb1cf5377de344c42089765756fd75e1/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt

Specifically: the discrepancy is not a factor of two, so I am not sure your findings explain it, also because I do not think you are comparing the right numbers; see below.

About the number of directories produced in pp_ttj and gg_ttj: this is an interesting issue but I think it is a known one (and unrelated to the gq_ttq cross section mismatch). There was one setting which was the default in fortran-without-second-exporter, but which was giving issues in cudacpp (something about mirroring? something about q and q~ being better off in separate P1s?... I will try to find the issue number). As a result, Olivier suggested disabling this setting for cudacpp, and this is (I think) why you see different numbers of P1 directories.

Anyway, when I say that you are not comparing the right numbers: above, you are comparing the fortran xsec with one setting for the P1s against the cpp xsec with a different setting for the P1s. So a factor of two may come from that, and might even be normal (instead of one P1 with xsec 2X, you have two P1s each with xsec X, the sum being 2X anyway?). What I am comparing in the tmad test is instead the cpp with its P1 setting against the fortran with the same P1 setting as the cpp. In other words: I am generating the cudacpp code, and then I am using the fortran that we find inside that same cpp code directory. I have not checked that this is consistent with the fortran in the optimized fortran setting, but I assume that Olivier had done this right and there was no need to check.

I will try to find the issue for this doubling of directories now.

valassi commented 1 year ago

(By the way: I think that the difference is essentially using fbridge mode = 0 or 1... actually tmad probably uses two different executables, madevent_fortran and madevent_cpp, but I guess that you could also use madevent_cpp in both cases with the options fbridge=0 and 1. If you use the option fbridge=-1, this will print out event-by-event comparisons of the fortran and cpp MEs, which might help.)

valassi commented 1 year ago

I think (@oliviermattelaer should confirm) that what @roiser has just described above is related to "mirror processes" and to "nprocesses>1". This was discussed in issue #272 and MR #396.

See in particular this figure from issue #272 : https://github.com/madgraph5/madgraph4gpu/issues/272#issuecomment-1491615718 https://user-images.githubusercontent.com/3473550/229080358-ae23dccc-dcca-4b16-a9c4-c7cc4430bf65.png

valassi commented 1 year ago

So, to bring back the focus on gq_ttq: the issue that we should solve is the

ERROR! xsec from fortran (0.26050333309703716) and cpp (1.2757941949814184) differ by more than 2E-14 (3.8974198518457603)

in https://github.com/madgraph5/madgraph4gpu/blob/master/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
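
As a side note, the number in parentheses in that error message appears to be the relative difference between the two cross sections; a minimal Python check, assuming that interpretation:

    xsec_fortran = 0.26050333309703716
    xsec_cpp = 1.2757941949814184

    # Relative difference, reproducing the 3.8974198518457603 quoted in the error
    rel_diff = xsec_cpp / xsec_fortran - 1.0
    print(rel_diff)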

roiser commented 1 year ago

Thanks Andrea, I'm trying to debug this at a finer-grained level. Actually, when comparing the "fortran" and "cpp" subprocesses there are differences in the cross sections of most of them (but not all); I wonder if this is a valid way of tracking down the bug. I presume we need @oliviermattelaer for this. E.g. I run

SubProcesses (gpucpp)$ for d in `ls | grep P1`; do echo $d | tr -d \\n && cd $d && ./madevent < G1/input_app.txt | grep "Event xsec" && cd ..; done

and receive for fortran

P1_gg_ttxg Event xsec    7.9291683836808555E-002
P1_gq_ttxq Event xsec   0.52569008216473057     
P1_qq_ttxg Event xsec    9.5002520291213539 

and C++

P1_gg_ttxg Event xsec    8.0151570235421854E-002
P1_gu_ttxu Event xsec   0.21427934258181794     
P1_gux_ttxux Event xsec   0.89817471781675562     
P1_uux_ttxg Event xsec    4.6258210198299379    
valassi commented 1 year ago

Note also that the madX test runs only one of the P1 directories:

    elif [ "${gqttq}" == "1" ]; then 
      dir=$topdir/epochX/${bckend}/gq_ttq${suff}SubProcesses/P1_gu_ttxu # 1st of two (test only one for now)
      ###dir=$topdir/epochX/${bckend}/gq_ttq${suff}SubProcesses/P1_gux_ttxux # 2nd of two (test only one for now)
valassi commented 1 year ago

Thanks Andrea, I'm trying to debug this at a finer-grained level. Actually, when comparing the "fortran" and "cpp" subprocesses there are differences in the cross sections of most of them (but not all); I wonder if this is a valid way of tracking down the bug. I presume we need @oliviermattelaer for this. E.g. I run

Thanks Stefan. I think that what you are testing at another level is also useful - but for the purpose of debugging this issue #748, I would just focus on the way this bug was identified; no need to do it differently. IMHO you are overcomplicating it if you go your own way, but that's your choice ;-)

Note also: cross sections depend in a very subtle way on which events are used in the sampling, and if you use different settings you most likely end up with different events. This means you must run the tests at a statistical level, not bit by bit. I would stick to the madX script...

valassi commented 1 year ago

PS

I wonder if this is a valid way of tracing down the bug

Rephrasing: no, I think this is not a valid way of tracking down this specific bug #748 (then of course I may be wrong... and certainly you may also find interesting things in your tests, but maybe not the most relevant ones for this issue).

roiser commented 1 year ago

I started producing the cross sections for processes which only produce a single subprocess, and they show differences for gluon-quark initial states.

roiser commented 1 year ago

Thanks again, I think it's much better to produce those simpler processes. I can now reproduce the error using the mg5 plugin interface. Just to make sure, I also ran a few other processes, which look OK, always comparing the cpp vs the fortran version:

proc_uux_ttx_cpp      Cross-section :   18.16 +- 0.02996 pb
proc_uux_ttx_for      Cross-section :   18.22 +- 0.04451 pb

proc_gu_ttxu_cpp      Cross-section :   7058 +- 26.17 pb
proc_gu_ttxu_for      Cross-section :   35.61 +- 0.1087 pb

proc_gg_ttx_cpp      Cross-section :   440.6 +- 0.5252 pb
proc_gg_ttx_for      Cross-section :   440.9 +- 0.5376 pb
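
To compare such pairs at the statistical level (as suggested earlier in the thread), one can look at the pull, i.e. the difference divided by the combined uncertainty; a minimal Python sketch using the numbers quoted just above:

    import math

    # (cpp value, cpp error, fortran value, fortran error), from the comment above
    pairs = {
        "uux_ttx": (18.16, 0.02996, 18.22, 0.04451),
        "gu_ttxu": (7058.0, 26.17, 35.61, 0.1087),
        "gg_ttx": (440.6, 0.5252, 440.9, 0.5376),
    }

    for name, (cpp, dcpp, ftn, dftn) in pairs.items():
        pull = (cpp - ftn) / math.sqrt(dcpp**2 + dftn**2)
        print(f"{name}: pull = {pull:+.1f}")

    # uux_ttx and gg_ttx agree at the level of about one sigma,
    # while gu_ttxu is off by hundreds of sigma
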
roiser commented 1 year ago

Quick update on the status of things: as seen above, g u > t t~ u does produce wrong results. After a chat with Olivier, we concluded that the problem should be on the cudacpp side (alphaS?). In order to prove this, I want to compare the "standalone" versions of fortran and cudacpp. The problem now is that "output standalone" produces a compilation error (see below): the vector.inc file is not produced. I need to check the code-generating code.

    The compilation fails with the following output message:
        cd MODEL; make
        make[1]: Entering directory '/afs/cern.ch/work/r/roiser/sw/madgraph4gpu/MG5aMC/mg5amcnlo/guttu2.sa/Source/MODEL'
        /opt/rh/gcc-toolset-11/root/usr/bin/gfortran -w -fPIC  -ffixed-line-length-132  -c -o couplings.o couplings.f
        couplings.f:16: Error: Can't open included file '../vector.inc'
        make[1]: *** [<builtin>: couplings.o] Error 1
        make[1]: Leaving directory '/afs/cern.ch/work/r/roiser/sw/madgraph4gpu/MG5aMC/mg5amcnlo/guttu2.sa/Source/MODEL'
        make: *** [makefile:59: ../lib/libmodel.a] Error 2
roiser commented 1 year ago

A fix for the above issue of the missing vector.inc has been proposed in mg5amcnlo/mg5amcnlo#65

oliviermattelaer commented 1 year ago

I confirm that proc_gu_ttxu_for is 35.74 in the Long Term Stable version of MG5aMC and 35.71 (+-0.10) within gpucpp (with the cpp code created but not used; Andrea's patches are not yet applied).

zeniheisser commented 1 year ago

@roiser @valassi @oliviermattelaer I just did a quick standalone check of cudacpp vs upstream for the same phase space point

For native MGaMC I get:

 Phase space point:

 -----------------------------------------------------------------------------
 n        E             px             py              pz               m 
 1   0.7500000E+03  0.0000000E+00  0.0000000E+00  0.7500000E+03  0.0000000E+00
 2   0.7500000E+03  0.0000000E+00  0.0000000E+00 -0.7500000E+03  0.0000000E+00
 3   0.1354789E+03  0.1344191E+03 -0.1686881E+02 -0.1209253E+01  0.1364261E+00
 4   0.6441284E+03  0.2809898E+03 -0.3353039E+03  0.4727762E+03  0.3109229E+00
 5   0.7203927E+03 -0.4154090E+03  0.3521728E+03 -0.4715670E+03  0.2978233E+00
 -----------------------------------------------------------------------------
 Matrix element =    4.4143826846910579E-003  GeV^          -2
 -----------------------------------------------------------------------------

whereas for cudacpp I get

--------------------------------------------------------------------------------
Momenta:
   1  7.500000e+02  0.000000e+00  0.000000e+00  7.500000e+02
   2  7.500000e+02  0.000000e+00  0.000000e+00 -7.500000e+02
   3  1.354789e+02  1.344191e+02 -1.686881e+01 -1.209253e+00
   4  6.441284e+02  2.809898e+02 -3.353039e+02  4.727762e+02
   5  7.203927e+02 -4.154090e+02  3.521728e+02 -4.715670e+02
--------------------------------------------------------------------------------
 Matrix element = 0.00203141 GeV^-2
--------------------------------------------------------------------------------

which differ by roughly but not exactly a factor of 2. So there definitely seems to be an issue in the amplitude evaluation, but as said, this is only a single phase space point that I've checked.
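
For reference, the ratio of the two matrix element values quoted above (a quick Python check):

    me_fortran = 4.4143826846910579e-03
    me_cudacpp = 0.00203141
    print(me_fortran / me_cudacpp)  # ~2.17, i.e. roughly but not exactly a factor of 2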

oliviermattelaer commented 1 year ago

Can you check the value of the couplings?

olivier


roiser commented 1 year ago

Thanks @zeniheisser !! I did the same this morning and saw the same.

Another thing worries me a bit. I thought we were able to produce exactly the same values for Fortran and Cudacpp, e.g. for other processes. E.g. I tried "g g > t t~" and the values are close but not the same.

zeniheisser commented 1 year ago

Can you check the value of the couplings?

@roiser can you do this? I could take a quick look at it tomorrow otherwise, but I'm not exactly drowning in free time at the moment hehe

Another thing worries me a bit. I thought we were able to produce exactly the same values for Fortran and Cudacpp, e.g. for other processes. E.g. I tried "g g > t t~" and the values are close but not the same.

@roiser, I think you'll have agreement at the level of the momentum precision though, no? That's what I have seen when doing tests --- there is a slight difference, but roughly on the order of the precision of the momenta I've explicitly set for the processes.

roiser commented 1 year ago

Can you check the value of the couplings?

I checked now: the G value is the same in fortran and cudacpp.
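
For context, in the standard model UFO used by MG5aMC the two relevant QCD couplings are simple functions of G, which is itself derived from alpha_s; a minimal Python sketch, assuming the usual UFO definitions (G = 2*sqrt(pi*aS), GC_10 = -G, GC_11 = i*G) and the default aS = 0.118, rather than values extracted from the generated code:

    import math

    aS = 0.118  # alpha_s at the scale used (assumed default value)
    G = 2.0 * math.sqrt(math.pi * aS)  # ~1.2177, consistent with the gdb printouts below
    GC_10 = complex(-G, 0.0)  # e.g. the triple-gluon coupling, arg of VVV1_0
    GC_11 = complex(0.0, G)  # e.g. the gluon-quark coupling, arg of FFV1_0
    print(G, GC_10, GC_11)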

roiser commented 1 year ago

The G value is the same, but when running the debugger I see that between fortran and cudacpp the GC_10 and GC_11 couplings are swapped. E.g. I find in fortran

(gdb) print GC_10
$1 = ((-1.2177157847767197,0), ...)
(gdb) print GC_11
$2 = ((0,1.2177157847767197), ...)

while in cudacpp the values are, for GC_10 (the arg of VVV1_0):

(gdb) print COUP
$6 = {m_real = {0, 0, 0, 0}, m_imag = {1.2177157847767195, 1.2177157847767195, 1.2177157847767195, 1.2177157847767195}}

and for GC_11 (e.g. the arg of FFV1_0):

(gdb) print COUP
$5 = {m_real = {-1.2177157847767195, -1.2177157847767195, -1.2177157847767195, -1.2177157847767195}, m_imag = {0, 0, 0, 0}}

I now swapped the two COUP entries in the arguments of the ixxx/FFV functions, and this yields the same matrix element value!!! Now we need to find out why the couplings had been swapped only in this case...
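
To see why swapping the two couplings changes the result even though they have the same magnitude G, here is a toy illustration in Python; the coupling values are taken from the gdb printouts above, while the two partial amplitudes are hypothetical numbers chosen only to show the effect on the interference:

    G = 1.2177157847767195
    GC_10 = complex(-G, 0.0)  # coupling expected by VVV1_0
    GC_11 = complex(0.0, G)  # coupling expected by FFV1_0

    # Two hypothetical partial amplitudes that interfere in the sum over diagrams
    a_vvv = complex(0.3, -0.7)
    a_ffv = complex(-1.1, 0.4)

    me_correct = abs(GC_10 * a_vvv + GC_11 * a_ffv) ** 2
    me_swapped = abs(GC_11 * a_vvv + GC_10 * a_ffv) ** 2
    print(me_correct, me_swapped)  # different values: the interference term changes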

oliviermattelaer commented 1 year ago

Good job!! That's a huge finding (and so easy to miss). Thanks so much.

valassi commented 1 year ago

Thanks @roiser ! Indeed this is fixed by PR #757 . I am unpinning this.

valassi commented 1 year ago

Hi @roiser, as discussed in #780: I have the impression that this is broken again, can you please check? I am reopening just in case (close it if it works for you). Stefan, note: I am fairly sure this had been fixed by PR #757, but was it perhaps broken again by PR #761? Thanks

roiser commented 1 year ago

Hi @valassi, thanks, good catch. I quickly checked cpp vs fortran and indeed the couplings are swapped back into the wrong order for e.g. gu>ttxu. From a quick look at #761 I realise that the ordering of the couplings, which should follow the order in wanted_couplings, was lost in that patch, so your guess is correct. I'll fix that, it should be easy.

roiser commented 1 year ago

OK, I couldn't wait ;-). There is #782: the trick was to swap the two containers when filtering out the running couplings from wanted_couplings, which preserves the order of the filtered keys. NB: this is WIP, I tested a few processes but will do more checks.
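
Conceptually, the fix described above amounts to iterating over wanted_couplings and filtering against the set of running couplings, rather than the other way around, so that the original ordering survives the filter; a minimal Python sketch with hypothetical container names (not the actual exporter code in #782):

    # Hypothetical inputs: the ordered list of couplings the matrix element expects,
    # and the (unordered) set of couplings that depend on the running alpha_s
    wanted_couplings = ["GC_10", "GC_11"]
    running_couplings = {"GC_11", "GC_10"}

    # Iterating over the set may lose the wanted order (set order is arbitrary)
    maybe_swapped = [c for c in running_couplings if c in wanted_couplings]

    # Iterating over wanted_couplings preserves it
    ordered = [c for c in wanted_couplings if c in running_couplings]
    print(ordered)  # always ['GC_10', 'GC_11'], matching the argument order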

valassi commented 1 year ago

Thanks Stefan! I checked gqttq and it looks good; I merged PR #782.

roiser commented 1 year ago

I only checked gu>ttxu and dy+2j; let me re-open this one to remind me to do more checks, so we can be sure it's really fixed.

valassi commented 9 months ago

Hi Stefan, did you manage to run your tests here? Is this confirmed fixed? Thanks

valassi commented 3 months ago

@roiser, now that #826 and #862 (also related to the couplings order, in SUSY this time) are fixed, can this be closed?

roiser commented 3 months ago

Thanks, yes this is all fine, closing it.