I started looking into this. What I am doing is to create, via our new plugin interface, the directories for the fortran and C++ code with
generate p p > t t~ j
output madevent ppttj_for_sde1 --hel_recycling=False
and
generate p p > t t~ j
output madevent ppttj_cpp_sde1 --hel_recycling=False --vector_size=32 --me_exporter=standalone_cudacpp
respectively. My first observation is that there are different numbers of P1 directories produced i.e.
ls ppttj_for_sde1/SubProcesses/P1_*
P1_gg_ttxg/ P1_gq_ttxq/ P1_qq_ttxg/
and
ls ppttj_cpp_sde1/SubProcesses/P1_*
P1_gg_ttxg/ P1_gu_ttxu/ P1_gux_ttxux/ P1_uux_ttxg/
Is this related to the sde_strategy variable in the run_card? Please note that in both cases this is set to 1. I presume the fortran/P1_gq_ttxq corresponds to cpp/P1_gu_ttxu + cpp/P1_gux_ttxux.
Going a bit further, I then produce the cross sections for fortran/P1_gq_ttxq and for cpp/P1_gu_ttxu + cpp/P1_gux_ttxux, where I end up with numbers like:
fortran/P1_gq_ttxq : Event xsec 0.52569008216473057
and
cpp/P1_gu_ttxu : Event xsec 0.21427934258181794
cpp/P1_gux_ttxux : Event xsec 0.89817471781675562
Can I sum up the two cpp ones and compare to the fortran one? They don't match … there is roughly a factor 2 between them.
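(For reference, the arithmetic behind the factor-2 remark: 0.21427934 + 0.89817472 ≈ 1.11245, while the single fortran directory gives 0.52569, so the sum of the two cpp P1 contributions is about 2.1 times the fortran number rather than equal to it.)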
Before going further I wanted to see if I can reproduce the same numbers in fortran with the two P1 directories generated. Can this generation of the two P1 directories be forced?
Hi @roiser thanks for looking into this. (cc @oliviermattelaer @hageboeck @zeniheisser @Jooorgen for info on the tests)
Apologies, I should have posted here the details of where I see the errors. There is an already-made script that runs everything automatically. For instance, with current upstream/master:
cd epochX/cudacpp
./tmad/teeMadX.sh -gqttq +10x -makeclean
...
*** (1) EXECUTE MADEVENT_FORTRAN x1 (create events.lhe) ***
--------------------
CUDACPP_RUNTIME_FBRIDGEMODE = (not set)
CUDACPP_RUNTIME_VECSIZEUSED = 8192
--------------------
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
--------------------
Executing ' ./madevent_fortran < /tmp/avalassi/input_gqttq_x1_fortran > /tmp/avalassi/output_gqttq_x1_fortran'
[OPENMPTH] omp_get_max_threads/nproc = 1/4
[NGOODHEL] ngoodhel/ncomb = 16/32
[XSECTION] VECSIZE_USED = 8192
[XSECTION] MultiChannel = TRUE
[XSECTION] Configuration = 1
[XSECTION] ChannelId = 1
[XSECTION] Cross section = 0.2605 [0.26050333309703716] fbridge_mode=0
[UNWEIGHT] Wrote 81 events (found 540 events)
[COUNTERS] PROGRAM TOTAL : 0.3017s
[COUNTERS] Fortran Overhead ( 0 ) : 0.2234s
[COUNTERS] Fortran MEs ( 1 ) : 0.0782s for 8192 events => throughput is 1.05E+05 events/s
...
*** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
--------------------
CUDACPP_RUNTIME_FBRIDGEMODE = (not set)
CUDACPP_RUNTIME_VECSIZEUSED = 8192
--------------------
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
--------------------
Executing ' ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp'
[OPENMPTH] omp_get_max_threads/nproc = 1/4
[NGOODHEL] ngoodhel/ncomb = 16/32
[XSECTION] VECSIZE_USED = 8192
[XSECTION] MultiChannel = TRUE
[XSECTION] Configuration = 1
[XSECTION] ChannelId = 1
[XSECTION] Cross section = 1.276 [1.2757941949814184] fbridge_mode=1
[UNWEIGHT] Wrote 105 events (found 652 events)
[COUNTERS] PROGRAM TOTAL : 0.3808s
[COUNTERS] Fortran Overhead ( 0 ) : 0.3106s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.0702s for 8192 events => throughput is 1.17E+05 events/s
*** (2-none) Compare MADEVENT_CPP x1 xsec to MADEVENT_FORTRAN xsec ***
ERROR! xsec from fortran (0.26050333309703716) and cpp (1.2757941949814184) differ by more than 2E-14 (3.8974198518457603)
This is one of the tests I run every time I do a MR.
The log is always here https://github.com/madgraph5/madgraph4gpu/blob/master/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt, i.e. the latest (with errors) being https://github.com/madgraph5/madgraph4gpu/blob/bd255c01fb1cf5377de344c42089765756fd75e1/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
Specifically: the discrepancy is not a factor of two, so I am not sure your findings explain it, also because I do not think you are comparing the right numbers (see below).
About the number of directories produced in pp_ttj and gg_ttj: this is an interesting issue, but I think it is a known one (and unrelated to the gq_ttq cross section mismatch). There was one setting which was the default in fortran-without-second-exporter, which however was giving issues in cudacpp (something about mirroring? something about q and q~ being better off in separate P1s?... I will try to find the issue number). As a result, Olivier suggested to disable this setting for cudacpp, and this is (I think) why you see different numbers of P1 directories.
Anyway, this is what I mean when I say that you are not comparing the right numbers: above, you are comparing the fortran xsec with one setting for the P1s against the cpp xsec with a different setting for the P1s. So a factor two may come from that, and might even be normal (instead of one P1 with a xsec 2X, you have two P1s each with xsec X, the sum being 2X anyway?). What I am comparing in the tmad test is instead the cpp with its P1 setting against the fortran with the same P1 setting as cpp. In other words: I am generating the cudacpp code, and then I am using the fortran that we find inside that same cpp code directory. I have not checked that this is consistent with the fortran in the optimized fortran setting, but I assume that Olivier had done this right and there was no need to check.
I will try to find the issue for this doubling of directories now.
(By the way: I think that the difference is essentially using fbridge mode = 0 or 1... actually tmad probably uses two different executables, madevent_fortran and madevent_cpp, but I guess that you could also use madevent_cpp in both cases with the options fbridge=0 and 1. If you use option fbridge=-1, this will print out event-by-event comparisons of fortran and cpp MEs, which might help.)
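As a sketch of what that could look like (assuming, as the logs above suggest, that the mode is selected via the CUDACPP_RUNTIME_FBRIDGEMODE environment variable; I have not re-checked the exact mechanism), one would run the same cpp executable on the same input with the three modes:
CUDACPP_RUNTIME_FBRIDGEMODE=0 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp   # fortran MEs
CUDACPP_RUNTIME_FBRIDGEMODE=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp   # cudacpp MEs
CUDACPP_RUNTIME_FBRIDGEMODE=-1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp  # event-by-event fortran/cpp comparison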
I think (@oliviermattelaer should confirm) that what @roiser has just described above is related to "mirror processes" and to "nprocesses>1". This was discussed in issue #272 and MR #396.
See in particular this figure from issue #272 : https://github.com/madgraph5/madgraph4gpu/issues/272#issuecomment-1491615718 https://user-images.githubusercontent.com/3473550/229080358-ae23dccc-dcca-4b16-a9c4-c7cc4430bf65.png
So, to bring back the focus on gq_ttq: the issue that we should solve is the
ERROR! xsec from fortran (0.26050333309703716) and cpp (1.2757941949814184) differ by more than 2E-14 (3.8974198518457603)
Thanks Andrea, I'm trying to debug this at a finer-grained level. Actually, when comparing the two ("fortran" and "cpp"), there are differences in the cross sections for most of them (but not all); I wonder if this is a valid way of tracing down the bug. I presume we need @oliviermattelaer for this. E.g. I run
SubProcesses (gpucpp)$ for d in `ls | grep P1`; do echo $d | tr -d \\n && cd $d && ./madevent < G1/input_app.txt | grep "Event xsec" && cd ..; done
and receive for fortran
P1_gg_ttxg Event xsec 7.9291683836808555E-002
P1_gq_ttxq Event xsec 0.52569008216473057
P1_qq_ttxg Event xsec 9.5002520291213539
and C++
P1_gg_ttxg Event xsec 8.0151570235421854E-002
P1_gu_ttxu Event xsec 0.21427934258181794
P1_gux_ttxux Event xsec 0.89817471781675562
P1_uux_ttxg Event xsec 4.6258210198299379
Note also that the madX test runs only one of the P1 directories:
elif [ "${gqttq}" == "1" ]; then
dir=$topdir/epochX/${bckend}/gq_ttq${suff}SubProcesses/P1_gu_ttxu # 1st of two (test only one for now)
###dir=$topdir/epochX/${bckend}/gq_ttq${suff}SubProcesses/P1_gux_ttxux # 2nd of two (test only one for now)
Thanks Stefan. I think that what you are testing at another level is also useful - but for the purpose of debugging this issue #748, I would just focus on the way this bug was identified, no need to do it differently. IMHO you are overcomplicating it if you go your way, but then your choice ;-)
Note also: cross sections depend in a very subtle way on which events are used in the sampling, and if you use different settings you will most likely end up with different events. This means you must run the tests at a statistical level, not bit by bit. I would stick to the madX script...
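In other words, when the sampled events differ, a reasonable criterion (standard MC practice, nothing specific to these scripts) is to check that |xsec_fortran - xsec_cpp| is compatible with the quoted Monte Carlo errors, e.g. |difference| / sqrt(err_fortran^2 + err_cpp^2) of order one, rather than expecting bit-by-bit agreement.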
PS: "I wonder if this is a valid way of tracing down the bug". Rephrasing: no, I think this is not a valid way of tracing down this specific bug #748 (then of course I may be wrong... and certainly you may also find interesting things in your tests, but maybe not the most relevant ones for this one).
I started producing the cross sections for processes which only produce a single subprocess, and they show differences for gluon-quark initial states.
Thanks again, I think it's much better to produce those simpler processes. I can reproduce the error now using the mg5 plugin interface. Just to make sure, I also ran a few other processes which look ok, always comparing the cpp vs the fortran version:
proc_uux_ttx_cpp Cross-section : 18.16 +- 0.02996 pb
proc_uux_ttx_for Cross-section : 18.22 +- 0.04451 pb
proc_gu_ttxu_cpp Cross-section : 7058 +- 26.17 pb
proc_gu_ttxu_for Cross-section : 35.61 +- 0.1087 pb
proc_gg_ttx_cpp Cross-section : 440.6 +- 0.5252 pb
proc_gg_ttx_for Cross-section : 440.9 +- 0.5376 pb
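(For reference on the numbers above: 7058 / 35.61 ≈ 198 for gu_ttxu, so at the integrated level the discrepancy is far larger than a factor 2, while uux_ttx and gg_ttx agree within their quoted MC errors.)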
Quick update on the status of things: as seen above, g u > t t~ u does produce wrong results. After a chat with Olivier we concluded that the problem should be on the cudacpp side (alphaS?). In order to prove this I want to compare the "standalone" versions of fortran and cudacpp. The problem now is that "output standalone" produces a compilation error (see below): the vector.inc file is not produced. I need to check the code-generating code.
The compilation fails with the following output message:
cd MODEL; make
make[1]: Entering directory '/afs/cern.ch/work/r/roiser/sw/madgraph4gpu/MG5aMC/mg5amcnlo/guttu2.sa/Source/MODEL'
/opt/rh/gcc-toolset-11/root/usr/bin/gfortran -w -fPIC -ffixed-line-length-132 -c -o couplings.o couplings.f
couplings.f:16: Error: Can't open included file '../vector.inc'
make[1]: *** [<builtin>: couplings.o] Error 1
make[1]: Leaving directory '/afs/cern.ch/work/r/roiser/sw/madgraph4gpu/MG5aMC/mg5amcnlo/guttu2.sa/Source/MODEL'
make: *** [makefile:59: ../lib/libmodel.a] Error 2
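For reference, the standalone comparison mentioned above would be produced with something along these lines (the output directory names here are just placeholders, and I am assuming the plugin's standalone exporter is invoked as "output standalone_cudacpp", analogous to the madevent commands earlier in this thread):
generate g u > t t~ u
output standalone guttu_for
and
generate g u > t t~ u
output standalone_cudacpp guttu_cpp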
A fix for the above issue of the missing vector.inc has been proposed in mg5amcnlo/mg5amcnlo#65
I confirm that proc_gu_ttxu_for is 35.74 in the Long Term Stable version of MG5aMC and is 35.71 (+- 0.10) within gpucpp (with the cpp code created but not used -- Andrea's patches are not yet applied).
@roiser @valassi @oliviermattelaer I just did a quick standalone check of cudacpp vs upstream for the same phase space point
For native MGaMC I get:
Phase space point:
-----------------------------------------------------------------------------
n E px py pz m
1 0.7500000E+03 0.0000000E+00 0.0000000E+00 0.7500000E+03 0.0000000E+00
2 0.7500000E+03 0.0000000E+00 0.0000000E+00 -0.7500000E+03 0.0000000E+00
3 0.1354789E+03 0.1344191E+03 -0.1686881E+02 -0.1209253E+01 0.1364261E+00
4 0.6441284E+03 0.2809898E+03 -0.3353039E+03 0.4727762E+03 0.3109229E+00
5 0.7203927E+03 -0.4154090E+03 0.3521728E+03 -0.4715670E+03 0.2978233E+00
-----------------------------------------------------------------------------
Matrix element = 4.4143826846910579E-003 GeV^-2
-----------------------------------------------------------------------------
whereas for cudacpp I get
--------------------------------------------------------------------------------
Momenta:
1 7.500000e+02 0.000000e+00 0.000000e+00 7.500000e+02
2 7.500000e+02 0.000000e+00 0.000000e+00 -7.500000e+02
3 1.354789e+02 1.344191e+02 -1.686881e+01 -1.209253e+00
4 6.441284e+02 2.809898e+02 -3.353039e+02 4.727762e+02
5 7.203927e+02 -4.154090e+02 3.521728e+02 -4.715670e+02
--------------------------------------------------------------------------------
Matrix element = 0.00203141 GeV^-2
--------------------------------------------------------------------------------
which differ by roughly but not exactly a factor 2. So, there definitely seems to be an issue in the amplitude evaluation, but as said this is only a single phase space point I've checked.
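(For the record, the ratio of the two matrix elements quoted above is 4.4144E-3 / 2.0314E-3 ≈ 2.17, i.e. close to but clearly not exactly a factor 2.)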
Can you check the value of the couplings?
olivier
Thanks @zeniheisser !! I did the same this morning and saw the same.
Another thing worries me a bit. I thought we were able to produce exactly the same values for Fortran and Cudacpp, e.g. for other processes. E.g. I tried "g g > t t~": the values are close but not the same.
Can you check the value of the couplings?
@roiser can you do this? I could take a quick look at it tomorrow otherwise, but I'm not exactly drowning in free time at the moment hehe
Another thing worries me a bit. I thought we were able to produce exactly the same values for Fortran and Cudacpp, e.g. for other processes. E.g. I tried "g g > t t~": the values are close but not the same.
@roiser, I think you'll have agreement to the level of momentum precision though, no? That's what I have seen when doing tests --- there is a slight difference, but roughly on the order of precision of the momenta I've explicitly set for the processes.
Can you check the value of the couplings?
I checked now, the G value is the same in fortran and cudacpp
The G value is the same, but when running the debugger I see that GC_10 and GC_11 are swapped between fortran and cudacpp. E.g. I find in fortran
(gdb) print GC_10
$1 = ((-1.2177157847767197,0), ...)
(gdb) print GC_11
$2 = ((0,1.2177157847767197), ...)
while in cudacpp the values are for GC_10
(arg of VVV1_0)
(gdb) print COUP
$6 = {m_real = {0, 0, 0, 0}, m_imag = {1.2177157847767195, 1.2177157847767195, 1.2177157847767195, 1.2177157847767195}}
and GC_11
(e.g. arg of FFV1_0)
(gdb) print COUP
$5 = {m_real = {-1.2177157847767195, -1.2177157847767195, -1.2177157847767195, -1.2177157847767195}, m_imag = {0, 0, 0, 0}}
I now swapped the two COUP entries in the arguments of the ixxx/FFV functions and this yields the same matrix element value!!! Now we need to find out why only in this case the couplings had been swapped ...
Good job!! That's a huge finding (and so easy to miss). Thanks so much.
Thanks @roiser ! Indeed this is fixed by PR #757 . I am unpinning this.
Hi @roiser as discussed in #780: I have the impression that this is again broken, can you please check? I am reopening just in case (close it if it works for you). Stefan note: I am kind of sure this had been fixed by PR #757, but it was broken again by PR #761? Thanks
Hi @valassi thanks, good catch. I quickly checked the cpp vs fortran and indeed the couplings are swapped back into the wrong order for e.g. gu>ttxu. From quickly looking at #761 I realise that the ordering of the couplings (following the correct order in wanted_couplings) was lost in that patch, so your guess is correct. I'll fix that, it should be easy.
Ok, I couldn't wait ;-), there is #782. The trick was to swap the two containers when filtering out the running couplings from wanted_couplings, which preserves the order of the filtered keys. NB: this is WIP, I tested a few processes but will do more checks.
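To illustrate the idea, here is a minimal sketch of the order-preserving filter (placeholder names and values only, this is not the actual exporter code in #782):
# Iterate over the ordered wanted_couplings list and test membership in the
# running couplings, rather than the other way around, so that the relative
# order of e.g. GC_10 and GC_11 is preserved in the filtered result.
wanted_couplings = ['GC_10', 'GC_11', 'GC_12']  # ordered list (placeholder values)
running_couplings = {'GC_12'}                   # e.g. alphaS-dependent couplings (unordered)
fixed_couplings = [c for c in wanted_couplings if c not in running_couplings]
assert fixed_couplings == ['GC_10', 'GC_11']    # original relative order preserved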
Thanks Stefan! I checked gqttq it looks good, I merged PR #782
I only checked gu>ttxu and dy+2j, let me re-open this one to remind me to do more checks so we can be sure it's really fixed.
Hi Stefan, did you manage to run your tests here? Is this confirmed fixed? Thanks
@roiser now that #826 and #862 (also related to couplings order, in SUSY this time) are done, can this be closed?
Thanks, yes this is all fine, closing it.
This is a follow-up to #630, which had originally been filed about gq_ttq but eventually focused on gg_uu: xsec from fortran and cpp differ in gq_ttq tmad tests.
In #630, I had also found out that the xsecs from fortran and cpp also differed in the gg_uu tmad tests, but with Zenny's help this was identified as a problem in getGoodHel and fixed. This however fixed gg_uu, but not gq_ttq, where the xsecs are still different. For clarity, I move gq_ttq here, while I will soon close #630 about gg_uu.