valassi closed this issue 1 week ago
In https://github.com/madgraph5/madgraph4gpu/pull/852#issuecomment-2143014502 Olivier suggested "you/we should compile with the C equivalent of -fbounds-check which is super usefull to spot segfault who by definition are hardware specific". I had a look but I am not sure there is an equivalent.
Instead I ran valgrind, and the results are interesting. Here is a reproducer which mimics the tmad test above, but without using the tmad scripts:
cd gg_ttgg.mad/SubProcesses/P1_gg_ttxgg
make cleanall
make -j BACKEND=cppnone -f cudacpp.mk debug
make -j BACKEND=cppnone
cat > input_cudacpp_104 << EOF
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
104 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
EOF
./madevent_cpp < input_cudacpp_104
valgrind ./madevent_cpp < input_cudacpp_104
The valgrind output includes entries like the following:
...
==794089== Conditional jump or move depends on uninitialised value(s)
==794089== at 0x426F03: setclscales_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x429569: update_scale_coupling_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x438857: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089==
==794089== Warning: client switching stacks? SP change: 0x1ffeffeeb8 --> 0x1ffec3eb80
==794089== to suppress, use: --max-stackframe=3932984 or greater
==794089== Invalid write of size 8
==794089== at 0x4366D4: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== Address 0x1ffec3eba8 is on thread 1's stack
==794089== in frame #0, created by dsig1_vec_ (???:)
==794089==
==794089== Invalid write of size 8
==794089== at 0x4366D9: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== Address 0x1ffec3ebb0 is on thread 1's stack
==794089== in frame #0, created by dsig1_vec_ (???:)
...
==794089== Invalid read of size 4
==794089== at 0x436AE5: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== Address 0x1ffec3ebcc is on thread 1's stack
==794089== in frame #0, created by dsig1_vec_ (???:)
...
==794089== Invalid read of size 8
==794089== at 0x6E032EF: memmove (vg_replace_strmem.c:1385)
==794089== by 0x6E6D811: mg5amcCpu::Bridge<double>::cpu_sequence(double const*, double const*, double const*, double const*, unsigned int, double*, int*, int*, bool) (Bridge.h:376)
==794089== by 0x6E6F37B: fbridgesequence_ (fbridge.cc:106)
==794089== by 0x6E6F3F2: fbridgesequence_nomultichannel_ (fbridge.cc:132)
==794089== by 0x4358D9: smatrix1_multi_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x436C74: dsig1_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x437C97: dsigproc_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x4388A7: dsig_vec_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x45CC7A: sample_full_ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x434269: MAIN__ (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== by 0x40371E: main (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/madevent_cpp)
==794089== Address 0x1ffec7eec8 is on thread 1's stack
==794089== in frame #5, created by dsig1_vec_ (???:)
...
Also I have rebuilt with -O3 -g in make_opts:
epochX/cudacpp/gg_ttgg.mad/Source/make_opts /tmp/git-blob-ieuRtt/make_opts e4b87ee6ad40ecb97ecbb40ae1811714ce5f1b46 100644 epochX/cudacpp/gg_ttgg.mad/Source/make_opts 0000000000000000000000000000000000000000 100644
4c4,5
< GLOBAL_FLAG=-O3 -ffast-math -fbounds-check
---
> ###GLOBAL_FLAG=-O3 -ffast-math -fbounds-check
> GLOBAL_FLAG=-O3 -g -ffast-math -fbounds-check
The crash now prints out where it happens: it is in rotxxx.
Setting grid 1 0.17709E-03 1
Setting grid 2 0.17709E-03 1
Setting grid 3 0.22041E-03 1
Transforming s_hat 1/s 9 8.8163313609467475E-004 119716.00000000000 168999999.99999997
Error opening symfact.dat. No permutations used.
Using random seed offsets 104 : 1
with seed 21
Ranmar initialization seeds 27505 9395
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7f6471c23860 in ???
#1 0x7f6471c22a05 in ???
#2 0x7f6471854def in ???
#3 0x44b5ff in rotxxx_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f:1247
#4 0x4087df in gentcms_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1480
#5 0x409848 in one_tree_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:1167
#6 0x40bb83 in gen_mom_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:68
#7 0x40d1a9 in x_to_f_arg_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/genps.f:60
#8 0x45c804 in sample_full_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/Source/dsample.f:172
#9 0x434269 in driver
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:256
#10 0x40371e in main
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/driver.f:301
Floating point exception (core dumped)
Note, rotxxx is what I had already found also in the SUSY tests https://github.com/madgraph5/madgraph4gpu/issues/826#issuecomment-2139578630
As discussed in #826 this is again a weird optimization issue: gdb gives
Program received signal SIGFPE, Arithmetic exception.
rotxxx (p=..., q=..., prot=...) at aloha_functions.f:1247
1247 prot(1) = q(1)*q(3)/qq/qt*p1 -q(2)/qt*p(2) +q(1)/qq*p(3)
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.x86_64 libgcc-11.3.1-4.3.el9.alma.x86_64 libgfortran-11.3.1-4.3.el9.alma.x86_64 libgomp-11.3.1-4.3.el9.alma.x86_64 libquadmath-11.3.1-4.3.el9.alma.x86_64 libstdc++-11.3.1-4.3.el9.alma.x86_64
(gdb) p qq qt p1
A syntax error in expression, near `qt p1'.
(gdb) p qq
$1 = <optimized out>
(gdb) p qt
$2 = <optimized out>
(gdb) p p1
$3 = <optimized out>
This was with -O3 -g. If I use lower optimization levels, the issue disappears.
As I have done with many SIGFPEs in cudacpp, I tried adding volatile:
--- a/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f
+++ b/epochX/cudacpp/gg_ttgg.mad/Source/DHELAS/aloha_functions.f
@@ -1201,7 +1201,7 @@ c real prot(0:3) : four-momentum p in the rotated frame
c
implicit none
double precision p(0:3),q(0:3),prot(0:3),qt2,qt,psgn,qq,p1
-
+ volatile qt, p1, qq
double precision rZero, rOne
parameter( rZero = 0.0d0, rOne = 1.0d0 )
Strangely enough, this prevents the SIGFPE. But now the code seems to be stuck in an infinite loop?
I tried CUDA to make it faster.
Again something strange: the code crashes without valgrind but does not crash with valgrind... (NB this is WITHOUT volatile)
cd gg_ttgg.mad/SubProcesses/P1_gg_ttxgg
make cleanall
make -j BACKEND=cuda -f cudacpp.mk debug
make -j BACKEND=cuda
cat > input_cudacpp_104 << EOF
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
104 ! Channel number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
EOF
./madevent_cuda < input_cudacpp_104
valgrind ./madevent_cuda < input_cudacpp_104
OK. In the CUDA version, adding volatile in the Fortran removes the SIGFPE and allows the program to reach the end.
So IS THIS A POSSIBLE FIX?
With C++, maybe I just needed to wait longer? Or is it now running slower? I will rerun more tests and leave them running.
(In the meantime I will also try the susy_gg_t1t1 channel which in the past seemed problematic with SIGFPE).
This is fixed by PR #857, which adds volatile, as I had done for similar SIGFPEs in cudacpp.
I completed my tests in PR #857 and I confirm that it fixes this issue; closing.
Reopening until PR #857 is merged - or until this is otherwise clarified
I changed the name of this issue to indicate that it is ONLY about rotxxx crashes. These can be fixed using 'volatile' in PR #857 and https://github.com/mg5amcnlo/mg5amcnlo/pull/113
Conversely, I removed "channel/iconfig mapping issues" from the name of this issue. Those "channel/iconfig mapping issues" are behind the LHE mismatch #856 and possibly the intermittent sigmakin crash #845.
Note, there is a crash #885 in master_june40 that I thought was related to this, but it is most likely unrelated (and is instead specific to master_june40).
"tmad test crashes for some iconfig (channel/iconfig mapping issues and SIGFPE erroneous arithmetic operation)"
Hi @oliviermattelaer this is a follow up to the discussions in #826 and PR #853.
I prefer to open this as a clean issue and investigate it independently of SUSY, and in any case independently of the zero cross section issue #826.
In these discussions around your patch #853 I realised that we risk having a MAJOR problem not only for BSM but also for SM, namely: all of my 'tmad' tests test only iconfig=1. These were OK so far (in some cases maybe by luck), but they may fail for a different iconfig (i.e. if we put a number different from 1 in the input_app.txt piped to madevent).
Indeed I found a crash on the first test I executed, ggttgg with iconfig=104.
This uses a slightly modified script; I will put it in a PR.
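For illustration, such a per-channel scan could look like the sketch below (a hypothetical wrapper, not the actual tmad/tlau scripts; the CHANNELS list and the echoed madevent command line are my assumptions):

```shell
#!/bin/sh
# Sketch of a per-channel scan (hypothetical, not the real tmad scripts):
# generate one madevent input per channel so that iconfig values other
# than 1 are exercised. Adjust CHANNELS to the process at hand.
CHANNELS="1 104"
for ch in ${CHANNELS}; do
  cat > input_cudacpp_${ch} << EOF
8192 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
${ch} ! Channel number (1-N) for single-diagram enhancement multi-channel
EOF
  echo "would run: ./madevent_cpp < input_cudacpp_${ch}"
done
```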
I guess that the solution goes through what you proposed in #852 and the additional modifications you and I discussed there.
(Note: the 'tlau' tests that I proposed in July last year, just before my absence, were supposed to test exactly this (see #711), i.e. test all possible iconfig values at the same time in a user-like environment, for all processes, within a short manageable time. I continue to think that allowing the possibility to run shorter generate_events tests is necessary for better testing. There was disagreement last year; I hope we can come back and agree on this.)