madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

LHE file mismatch between fortran and cpp in heft_gg_bb for FPTYPE=f #833

Open valassi opened 3 months ago

valassi commented 3 months ago

I have finally run the first tmad test for a HEFT process, heft_gg_bb. This is in WIP PR #832.

All tests succeed for double and mixed precision, but there is a mismatch of LHE files in float precision:

*** (2-none) Compare MADEVENT_CPP x1 events.lhe to MADEVENT_FORTRAN events.lhe reference (including colors and helicities) ***
ERROR! events.lhe.cpp.1 and events.lhe.ref.1 differ!
diff /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/heft_gg_bb.mad/SubProcesses/P1_gg_bbx/events.lhe.cpp.1 /data/avalassi/GPU2023/madgraph4gpuX/epo>
6206,6207c6206,6207
<          21   -1    0    0  502  503 -0.00000000000E+00 -0.00000000000E+00 -0.59936081260E+01  0.59936081260E+01  0.00000000000E+00 0. -1.
<           5    1    1    2  501    0  0.45273385612E+02 -0.31131305296E+02  0.47763304676E+03  0.48080583916E+03  0.47000000000E+01 0.  1.
---
>          21   -1    0    0  502  503 -0.00000000000E+00 -0.00000000000E+00 -0.59936081260E+01  0.59936081260E+01  0.00000000000E+00 0.  1.
>           5    1    1    2  501    0  0.45273385612E+02 -0.31131305296E+02  0.47763304676E+03  0.48080583916E+03  0.47000000000E+01 0. -1.
8306,8307c8306,8307
<          21   -1    0    0  502  503 -0.00000000000E+00 -0.00000000000E+00 -0.23857997239E+02  0.23857997239E+02  0.00000000000E+00 0.  1.
<           5    1    1    2  501    0 -0.34843521722E+02  0.35239303629E+02  0.13219496682E+02  0.51504607743E+02  0.47000000000E+01 0. -1.
---
>          21   -1    0    0  502  503 -0.00000000000E+00 -0.00000000000E+00 -0.23857997239E+02  0.23857997239E+02  0.00000000000E+00 0. -1.
>           5    1    1    2  501    0 -0.34843521722E+02  0.35239303629E+02  0.13219496682E+02  0.51504607743E+02  0.47000000000E+01 0.  1.
9606,9619d9605
< 4 1 1E-03 0.1250139E+03 0.7546771E-02 0.1235066E+00
<          21   -1    0    0  503  502  0.00000000000E+00  0.00000000000E+00  0.94948250004E+03  0.94948250004E+03  0.00000000000E+00 0.  1.
<          21   -1    0    0  502  503 -0.00000000000E+00 -0.00000000000E+00 -0.41149990002E+01  0.41149990002E+01  0.00000000000E+00 0. -1.
<           5    1    1    2  501    0 -0.96459450317E+01 -0.34409175043E+02  0.83136584965E+02  0.90613560477E+02  0.47000000000E+01 0. -1.
<          -5    1    1    2    0  501  0.96459450317E+01  0.34409175043E+02  0.86223091608E+03  0.86298393857E+03  0.47000000000E+01 0.  1.
< <mgrwt>
< <rscale>  0 0.12501391E+03</rscale>
valassi commented 3 months ago

This is strange. It is not a systematic problem: most events are identical. There are just a few events where the helicities are mismatched, and a few events which appear in the c++ output but do not exist in the fortran one.
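For reference, a minimal sketch (hypothetical helper, not the actual tmad comparison machinery) of the check involved here: flag pairs of LHE particle lines that agree in every column except the trailing helicity one.

```python
def differs_only_in_helicity(line_a, line_b):
    """True if two LHE particle lines are identical except for the last
    (helicity) column. Hypothetical helper, not part of the repository."""
    cols_a, cols_b = line_a.split(), line_b.split()
    if len(cols_a) != len(cols_b):
        return False
    return cols_a[:-1] == cols_b[:-1] and cols_a[-1] != cols_b[-1]

# Example pair modelled on the gluon line in the diff above (helicity flipped)
a = "21 -1 0 0 502 503 -0.0 -0.0 -0.59936081260E+01 0.59936081260E+01 0.0 0. -1."
b = "21 -1 0 0 502 503 -0.0 -0.0 -0.59936081260E+01 0.59936081260E+01 0.0 0.  1."
print(differs_only_in_helicity(a, b))  # True
```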

valassi commented 3 months ago

Just to be sure, I commented out the hack from #831 that flushes small jamps to zero. I get the same issues (and no FPEs).

valassi commented 3 months ago

A quick idea about how to investigate this: the fortran is difficult to debug directly, but I know that cudacpp in double precision agrees with it. It may be best to add debug printouts to cudacpp and then run the same test in double and single precision to understand the differences.
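A schematic of that double-vs-float comparison (illustrative Python only; the real check would print intermediate amplitudes from cudacpp, not evaluate a Python closure):

```python
import numpy as np

def compare_precisions(label, fn, x):
    """Evaluate fn(x) in double and single precision and report the
    relative difference. Schematic stand-in for side-by-side debug
    printouts from the double and float builds."""
    d = float(fn(np.float64(x)))
    f = float(fn(np.float32(x)))
    rel = abs(d - f) / max(abs(d), 1e-300)
    print(f"{label}: double={d:.15g} float={f:.7g} rel.diff={rel:.2e}")
    return rel

# A smooth quantity drifts only at the float32 rounding level ...
compare_precisions("smooth", lambda t: t * t + t, 1.5)
# ... while a cancelling one can lose the result entirely
compare_precisions("cancel", lambda t: (t + type(t)(1.0)) - t, 1.0e8)
```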

Note also: from debugging #831 it is obvious that there are huge cancellations all over the place in this HEFT gg_bb MIW<=1 process. Perhaps it is not so surprising that float gives different results.
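A minimal illustration of why such cancellations hurt in single precision (the magnitudes are made up, not taken from the actual HEFT amplitudes):

```python
import numpy as np

def cancel_sum(dtype):
    # Three partial contributions whose large pieces cancel, leaving a
    # small remainder: schematic of the jamp sums, illustrative numbers
    terms = np.array([1.0e8, 1.0, -1.0e8], dtype=dtype)
    return float(terms[0] + terms[1] + terms[2])

print(cancel_sum(np.float64))  # 1.0: double resolves the remainder
print(cancel_sum(np.float32))  # 0.0: float32 (~7 digits) swallows it
```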

Maybe we should just declare that some processes like this one require double precision (not even mixed). But it would be best to understand how to decide which processes need double precision...