madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package
29 stars 33 forks source link

master_june24: hack in counters.cc hides a real issue in auto_dsig1.f #891

Closed valassi closed 1 month ago

valassi commented 1 month ago

Another issue introduced in #830 and being reviewed in #882.

There was a HACK introduced that was not necessary in master. This should be understood and a proper fix introduced.

Before master_june24 there was no need to implement this 'hack'. Are counters being properly taken care of or this hack hiding underlying issues?

git show fe467b559053fd6caece5494a3c2a2eccf2ebaf8 CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc
commit fe467b559053fd6caece5494a3c2a2eccf2ebaf8
Author: Stefan Roiser <stefan.roiser@cern.ch>
Date:   Wed May 29 10:35:12 2024 +0200

    HACK, avoid division by 0

diff --git a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACP>
index 742575a6a..a445f0a02 100644
--- a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc
+++ b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc
@@ -92,7 +92,7 @@ extern "C"
                 iimplC + 1,
                 smatrix1multi_totaltime[iimplC],
                 smatrix1multi_counter[iimplC],
-                smatrix1multi_counter[iimplC] / smatrix1multi_totaltime[iimplC] );
+                smatrix1multi_counter[iimplC] / ( smatrix1multi_totaltime[iimplC] + .1 ));
     return;
   }
 }
oliviermattelaer commented 1 month ago

Hi,

@valassi, I would say that this is low priority. Obviously a deeper understanding of such issue is always good and clear our mind, so I would say that this is not a good reason to block the PR. But if you have time to have a look (keeping in mind that this is issue might be hardware specific and therefore might actually also be needed in master/...)

So up to you.

Olivier

valassi commented 1 month ago

Hi,

@valassi, I would say that this is low priority. Obviously a deeper understanding of such issue is always good and clear our mind, so I would say that this is not a good reason to block the PR. But if you have time to have a look (keeping in mind that this is issue might be hardware specific and therefore might actually also be needed in master/...)

So up to you.

Olivier

Hi Olivier thanks :-)

Yes I agree this I will keep for after the merge of PR #882.

I just tried to remov ethe hack and it crashes again, so there is something to understand, see https://github.com/valassi/madgraph4gpu/commit/2b57ee5e22e55ee0933b241142c5bf63df5f3422

Keeping this open

valassi commented 1 month ago

I guess this is related

- [XSECTION] Cross section = 7.667e-07 [7.6668139178203571E-007] fbridge_mode=1
- [UNWEIGHT] Wrote 1700 events (found 1705 events)
- [COUNTERS] PROGRAM TOTAL          :    2.6549s
- [COUNTERS] Fortran Overhead ( 0 ) :    1.2980s
- [COUNTERS] CudaCpp MEs      ( 2 ) :    1.3569s for    90112 events => throughput is 6.64E+04 events/s
+ [XSECTION] Cross section = 7.616e-07 [7.6161383186590445E-007] fbridge_mode=1
+ [UNWEIGHT] Wrote 1833 events (found 1838 events)
+ [COUNTERS] PROGRAM TOTAL          :    2.6924s
+ [COUNTERS] Fortran Overhead ( 0 ) :    1.3124s
+ [COUNTERS] Fortran MEs      ( 1 ) :    0.0000s for    90112 events => throughput is 9.01E+05 events/s
+ [COUNTERS] CudaCpp MEs      ( 2 ) :    1.3799s for    90112 events => throughput is 6.09E+04 events/s

The Fortran MEs line with 0s should not appear...

valassi commented 1 month ago

This is obviously related. I guess the generated code was not cross-checked, there are two calls to start counters

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx> git log --oneline -n1
23585b908 (HEAD -> june24, origin/june24) [june24] regenerate all processes - this propagates new reference logs to all .sa 
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx> git diff --no-ext-diff upstream/master auto_dsig1.f
...
-      
-      IF( FBRIDGE_MODE .LE. 0 ) THEN ! (FortranOnly=0 or BothQuiet=-1 or BothDebug=-2)
+#else
+      INTEGER FBRIDGE_MODE
 #endif
-        call counters_smatrix1multi_start( -1, VECSIZE_USED ) ! fortran=-1
-!$OMP PARALLEL
-!$OMP DO
+      CALL COUNTERS_SMATRIX1MULTI_START( -1, VECSIZE_USED )  ! fortran=-1
+
+
+
+      IF( FBRIDGE_MODE .LE. 0 ) THEN  ! (FortranOnly=0 or BothQuiet=-1 or BothDebug=-2)
+        CALL COUNTERS_SMATRIX1MULTI_START( -1, VECSIZE_USED )  ! fortran=-1
+
...
valassi commented 1 month ago

Since I noticed the wrong printout, it did not take me long to fix the real issue.

The problem was in auto_dsig1.f, there were two calls to starting counters. I fixed this.

This was most likely realted to the attempts to simplify patch.P1. I have restructured a bit the generated code and output.py, now auto_dsig1.f has disappeared from patch.P1 (which I guess is what was the intention of introducing CODEGEN smatrix_multi.f as a templated).

Will be fixed in #882. Closing