madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package
30 stars 33 forks source link

Fixes for iconfig-channel mappings in coloramps.h (obsolete, will be closed) #873

Closed valassi closed 3 months ago

valassi commented 3 months ago

This is a clean WIP MR for iconfig-color mapping issues in coloramps.h. It starts from the current upstream/master, which already includes all relevant changes for rotxxx crashes.

It includes all of Olivier's commits in PR #852, as well as the relevant commits from my PR #853. It also supersedes the tests in WIP PR #870.

It is meant to address #856 (lhe color mismatches in ggttgg) and related issues, like possibly #845.

It is NOT meant to address #826 (no cross section in susy) and has most likely nothing to do with 826 (or susy), even if it includes fragments from 'fix_826" and 'gpucpp_826' and 'susy' branches.

valassi commented 3 months ago

I have slightly modified the formatting (sorry Olivier, I went back to my notation as otherwise I don't understand all these numbers), but kept the functional implementation by Olivier.

The test for #856 in ggttgg still fails (not surprisingly, as I have not toched anything).

Now looking at the code to debug the functional issue.

There is one point that is puzzling me, the false/true sequence is different in the fortran and c icolamp.

In cudacpp coloramps.h https://github.com/valassi/madgraph4gpu/blob/0481effb663d349ddd1bb7ade151e8c025d37fe5/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/coloramps.h#L146

  __device__ constexpr bool icolamp[105][24] = { // note: a trailing comma in the initializer list is allowed
    {  true,  true, false, false, false, false,  true,  true, false, false, false, false,  true, false,  true, false,  true,  true,  true, false,  true, false,  true,  true }, // ICONFIG=1   <-- CHANNEL_ID=2
    {  true, false, false, false, false, false,  true, false, false, false, false, false,  true, false,  true, false, false, false,  true, false,  true, false,  true,  true }, // ICONFIG=2   <-- CHANNEL_ID=3

In fortran coloramps.inc https://github.com/valassi/madgraph4gpu/blob/0481effb663d349ddd1bb7ade151e8c025d37fe5/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/coloramps.inc#L1

      LOGICAL ICOLAMP(24,105,1)
      DATA(ICOLAMP(I,1,1),I=1,24)/.TRUE.,.FALSE.,.FALSE.,.FALSE.
     $ ,.FALSE.,.FALSE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.
     $ ,.TRUE.,.FALSE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.TRUE.,.FALSE.
     $ ,.TRUE.,.FALSE.,.TRUE.,.TRUE./
      DATA(ICOLAMP(I,2,1),I=1,24)/.FALSE.,.TRUE.,.FALSE.,.FALSE.
     $ ,.FALSE.,.FALSE.,.FALSE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.
     $ ,.TRUE.,.FALSE.,.TRUE.,.FALSE.,.TRUE.,.TRUE.,.TRUE.,.FALSE.
     $ ,.TRUE.,.FALSE.,.FALSE.,.FALSE./

In particular:

@oliviermattelaer any comment?

In the meantime I will check better the code (and recover old issues, as we had pretty much similar problems in the past)

valassi commented 3 months ago

Note, this is all very much related to

THE QUESTION IS, SHOULD THE TWO ARRAYS BE EXACTLY THE SAME IN GENERATED FORTRAN AND GENERATED C? (I actually very much hope they do, we should most definitely not start having different arrays IMO...)

If the answer is that they should, then definitely this is a bug. And then somehow my sed script to produce coloramps.h from the .inc was at least producing the right order.

In retrospective, I guess this is what happened?

valassi commented 3 months ago

Ouf.

Why is mapconfigs a different python type of object, and contains different values, in cpp and f77?... The coloramps.h algorithm then is almost identical...

This is for gg_ttgg

DEBUG: 'ICOLAMP CPP',mapconfigs = ICOLAMP CPP {1: [1], 2: [2], 3: [3], 4: [4], 5: [5], 6: [6], 7: [7], 8: [8], 9: [9], 10: [10], 11: [11], 12: [12], 13: [13], 14: [14], 15: [15], 16: [16], 17: [17], 18: [18], 19: [19], 20: [20], 21: [21], 22: [22], 23: [23], 24: [24], 25: [25], 26: [26], 27: [27], 28: [28], 29: [29], 30: [30], 31: [32], 32: [33], 33: [34], 34: [35], 35: [36], 36: [37], 37: [38], 38: [39], 39: [40], 40: [41], 41: [42], 42: [43], 43: [44], 44: [45], 45: [46], 46: [48], 47: [49], 48: [50], 49: [51], 50: [52], 51: [53], 52: [54], 53: [55], 54: [56], 55: [58], 56: [59], 57: [60], 58: [61], 59: [62], 60: [63], 61: [64], 62: [65], 63: [66], 64: [67], 65: [68], 66: [69], 67: [70], 68: [71], 69: [72], 70: [74], 71: [75], 72: [76], 73: [77], 74: [78], 75: [79], 76: [80], 77: [81], 78: [82], 79: [83], 80: [84], 81: [85], 82: [86], 83: [87], 84: [88], 85: [89], 86: [90], 87: [91], 88: [93], 89: [94], 90: [95], 91: [96], 92: [97], 93: [98], 94: [100], 95: [101], 96: [102], 97: [103], 98: [104], 99: [105], 100: [107], 101: [108], 102: [109], 103: [110], 104: [111], 105: [112]} [export_cpp.py at line 1767] 300a302

DEBUG: 'ICOLAMP F77',mapconfigs = ICOLAMP F77 [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 94, 95, 96, 97, 98, 99, 101, 102, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113] [export_v4.py at line 1184]

valassi commented 3 months ago

With a very wild leap of speculative imagination, I guess that the problem is... Fortran vs C indexing?

The mapconfigs in cpp is a dictionary with 105 values. The first value is 1, the last value is 112.

The mapconfigs in f77 is an array with 105 values (I guess). The first value is 2, the last value is 113.

I will wildly guess that I just ned to add +1 to the values in the cpp mapconfigs.

valassi commented 3 months ago

I will wildly guess that I just ned to add +1 to the values in the cpp mapconfigs.

That, and not only. There is ALSO the problem that the get_icolamp_lines version in cpp uses the dictionary key instead of the dictionary value... In practice, it is using a dummy mapping 1->1 etc... (except for a +1 offset at most).

I will prepare a patch...

valassi commented 3 months ago

I will prepare a patch...

Ok the MG5AMC patch is submitted to https://github.com/mg5amcnlo/mg5amcnlo/pull/115

I am now going to integrate here in madgraph4gpu too

valassi commented 3 months ago

Why is mapconfigs a different python type of object, and contains different values, in cpp and f77?... The coloramps.h algorithm then is almost identical...

The icolamp arrays are now identical

CPP https://github.com/valassi/madgraph4gpu/blob/8d5338286910e37950e470840fd698fd3ce48acd/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/coloramps.h#L145

device constexpr bool icolamp[105][24] = { // note: a trailing comma in the initializer list is allowed { true, false, false, false, false, false, true, false, false, false, false, false, true, false, true, false, false, false, true, false, true, false, true, true }, // ICONFIG=1 <-- CHANNEL_ID=2 { false, true, false, false, false, false, false, true, false, false, false, false, true, false, true, false, true, true, true, false, true, false, false, false }, // ICONFIG=2 <-- CHANNEL_ID=3 { true, true, false, false, false, false, true, true, false, false, false, false, false, false, false, false, true, true, false, false, false, false, true, true }, // ICONFIG=3 <-- CHANNEL_ID=4

F77 https://github.com/valassi/madgraph4gpu/blob/8d5338286910e37950e470840fd698fd3ce48acd/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/coloramps.inc#L1

 LOGICAL ICOLAMP(24,105,1)
DATA(ICOLAMP(I,1,1),I=1,24)/.TRUE.,.FALSE.,.FALSE.,.FALSE.
$ ,.FALSE.,.FALSE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.
$ ,.TRUE.,.FALSE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.TRUE.,.FALSE.
$ ,.TRUE.,.FALSE.,.TRUE.,.TRUE./
 DATA(ICOLAMP(I,2,1),I=1,24)/.FALSE.,.TRUE.,.FALSE.,.FALSE.
$ ,.FALSE.,.FALSE.,.FALSE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.
$ ,.TRUE.,.FALSE.,.TRUE.,.FALSE.,.TRUE.,.TRUE.,.TRUE.,.FALSE.
$ ,.TRUE.,.FALSE.,.FALSE.,.FALSE./
 DATA(ICOLAMP(I,3,1),I=1,24)/.TRUE.,.TRUE.,.FALSE.,.FALSE.
$ ,.FALSE.,.FALSE.,.TRUE.,.TRUE.,.FALSE.,.FALSE.,.FALSE.,.FALSE.
$ ,.FALSE.,.FALSE.,.FALSE.,.FALSE.,.TRUE.,.TRUE.,.FALSE.,.FALSE.
$ ,.FALSE.,.FALSE.,.TRUE.,.TRUE./
valassi commented 3 months ago

There are 23 failing checks, including many build errors eg for fptype=m. But issue #856 seems gone.

valassi commented 3 months ago

I have fixed the issue for FPTYE=m and relaunched the CI

valassi commented 3 months ago

I confirm that issue #856 has disappeared from the CI. There are only six errors remaining, 3 fptypes for each of

valassi commented 3 months ago

Unfortunately, I instead confirm that the intermittent crash #845 is NOT fixed by this PR. I am investigating, but this will maybe be a different PR. For the moment I will rerun large scale tests.

valassi commented 3 months ago

Hi @oliviermattelaer I think that this is ready for review. Can you please have a look? Thanks!

(I still need to run some manual tests before merging, but the PR is ready for review as nothing will really change functionally).

roiser commented 3 months ago
* no cross section in susy_gg_t1t1 (the infamous [No cross section in SUSY gg_t1t1 log file #826](https://github.com/madgraph5/madgraph4gpu/issues/826), stilll there)

pls note I'm looking into this and I think its understood, will reply later in #826

valassi commented 3 months ago

pls note I'm looking into this and I think its understood, will reply later in #826

Thanks @roiser very good. If the issue (no cross section in susy_gg_t1t1) is actually caused by couplings, maybe discuss the technicalities in #862 (or as you want, I will add a note there if this is the case).

valassi commented 3 months ago

I have completed also all manual tests - this is ready to be merged

oliviermattelaer commented 3 months ago

Hi Andrea,

Changing the base since I need to see what you did change compare to my fix_826 branch. Otherwise this is impossible to review since it includes too many things

valassi commented 3 months ago

Changing the base since I need to see what you did change compare to my fix_826 branch.

Hi Olivier, this is causing conflicts. And fix_826 itself has conflicts to the latest master now I guess.

I spent quite some effort fixing conflicts to make sure this can be merged into master. I will not do this again, sorry.

But if it helps you to review, ok. However when/if we decide to merge, this must merge into master, not into fix_826

Thanks Andrea

valassi commented 3 months ago

PS Please FIRST review https://github.com/mg5amcnlo/mg5amcnlo/pull/115. This #873 depends on that.

valassi commented 3 months ago

Anyway, apart from logs, test scripts etc, the only relevant changes are the MG5AMC upgrade and this

git diff --no-ext-diff origin/color upstream/fix_826 CODEGEN/ > diff
...
diff --git a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
index b90ef84b4..fc56de664 100644
--- a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
+++ b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
@@ -157,7 +157,7 @@ index 4fbb8e6ba..f9e2335de 100644
        END

 diff --git b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
-index 1124a9164..27a6e4674 100644
+index 71fbf2b25..0f1d199fc 100644
 --- b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
 +++ a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
 @@ -74,13 +74,77 @@ c      common/to_colstats/ncols,ncolflow,ncolalt,ic
@@ -239,7 +239,7 @@ index 1124a9164..27a6e4674 100644
  c
  c     Read process number
  c
-@@ -208,8 +272,33 @@ c      call sample_result(xsec,xerr)
+@@ -207,8 +271,33 @@ c      call sample_result(xsec,xerr)
  c      write(*,*) 'Final xsec: ',xsec

        rewind(lun)
@@ -274,7 +274,7 @@ index 1124a9164..27a6e4674 100644
        end

  c     $B$ get_user_params $B$ ! tag for MadWeight
-@@ -387,7 +476,7 @@ c
+@@ -386,7 +475,7 @@ c
        fopened=.false.
        tempname=filename    
        fine=index(tempname,' ')     
diff --git a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/coloramps.h b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/coloramps.h
index c8657394c..3d03a6063 100644
--- a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/coloramps.h
+++ b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/coloramps.h
@@ -1,38 +1,41 @@
 // Copyright (C) 2020-2024 CERN and UCLouvain.
 // Licensed under the GNU Lesser General Public License (version 3 or later).
 // Created by: A. Valassi (Dec 2022) for the MG5aMC CUDACPP plugin.
-// Further modified by: O. Mattelaer, A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.
+// Further modified by: A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.

 #ifndef COLORAMPS_H
 #define COLORAMPS_H 1

-namespace mgOnGpu /* clang-format off */
+#include <map>
+
+namespace mgOnGpu
 {
   // Summary of numbering and indexing conventions for the relevant concepts (see issue #826 and PR #852)
   // - Diagram number (no variable) in [1, N_diagrams]: all values are allowed (N_diagrams distinct values)
   //   => this number is displayed for information before each block of code in CPPProcess.cc
-  // - Channel number ("channelId" in C, CHANNEL_ID in F) in [1, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
-  //   => this number (with F indexing as in ps/pdf output) is passed around as an API argument between cudacpp functions
-  //   Note: the old API passes around a single CHANNEL_ID (and uses CHANNEL_ID=0 to indicate no-multichannel mode, but this is not used in coloramps.h),
-  //   while the new API passes around an array of CHANNEL_ID's (and uses a NULL array pointer to indicate no-multichannel mode)
-  // - Channel number in C indexing: "channelIdC" = channelID - 1
-  //   => this number (with C indexing) is used as the index of the channelIdC_to_iconfig array below
-  // - Config number ("iconfig" in C, ICONFIG in F) in [1, N_config]: all values are allowed (N_config <= N_diagrams distinct values)
-  // - Config number in C indexing: "iconfigC" = iconfig - 1
-  //   => this number (with C indexing) is used as the index of the icolamp array below
-
-  // Map channelIdC (in C indexing, i.e. channelId-1) to iconfig (in F indexing)
-  // Note: iconfig=0 indicates invalid values, i.e. channels/diagrams with no single-diagram enhancement in the MadEvent sampling algorithm (presence of 4-point interaction?)
-  // This array has N_diagrams elements, but only N_config <= N_diagrams valid (non-zero) values
-  __device__ constexpr int channelIdC_to_iconfig[%(nb_diag)i] = { // note: a trailing comma in the initializer list is allowed
-%(channelc2iconfig_lines)s
+  // - Channel number (CHANNEL_ID) in [0, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
+  //   => this number (with indexing like ps/pdf output) is passed around as an API argument between cudacpp functions
+  //   0 is allowed to fallback to no multi-channel mode.
+  // - Channel number in C indexing: "IconfiC", this is the equivalent of the Fortran iconfig
+  //   iconfigC = iconfig -1
+  //   provides a continuous index [0, N_config-1] for array
+  //  iconfigC = ChannelId_to_iconfigC[channelId]
+  //NOTE: All those ordering are event by event specific (with the intent to have those fix within a vector size/wrap   
+  
+  // Map channelId to iconfigC
+  // This array has N_diagrams+1 elements, but only N_config <= N_diagrams valid values
+  // unvalid values are set to -1
+  // The 0 entry is a fall back to still write events even if no multi-channel is setup (wrong color selected in that mode) 
+    __device__ constexpr int channelId_to_iconfigC[%(nb_diag_plus_one)i] = {
+     0, // channelId=0: This value means not multi-channel, color will be wrong anyway -> pick the first
+%(diag_to_channel)s
   };

   // Map iconfigC (in C indexing, i.e. iconfig-1) to the set of allowed colors
-  // This array has N_config <= N_diagrams elements
-  __device__ constexpr bool icolamp[%(nb_channel)s][%(nb_color)s] = { // note: a trailing comma in the initializer list is allowed
-%(icolamp_lines)s
-  }; /* clang-format on */
+  // This array has N_config <= N_diagrams elements    
+    __device__ constexpr bool icolamp[%(nb_channel)s][%(nb_color)s] = {
+%(is_LC)s
+  };

 }
 #endif // COLORAMPS_H
diff --git a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/process_sigmaKin_function.inc b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/process_sigmaKin_function.inc
index c3c7fd466..2ec052e3a 100644
--- a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/process_sigmaKin_function.inc
+++ b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/process_sigmaKin_function.inc
@@ -4,7 +4,7 @@
 ! Copyright (C) 2020-2024 CERN and UCLouvain.
 ! Licensed under the GNU Lesser General Public License (version 3 or later).
 ! Modified by: A. Valassi (Sep 2021) for the MG5aMC CUDACPP plugin.
-! Further modified by: O. Mattelaer, J. Teig, A. Valassi (2021-2024) for the MG5aMC CUDACPP plugin.
+! Further modified by: J. Teig, A. Valassi (2021-2024) for the MG5aMC CUDACPP plugin.
 !==========================================================================

 #include "GpuAbstraction.h"
@@ -72,9 +72,7 @@
     // Event-by-event random choice of color #402
     if( channelId != 0 ) // no event-by-event choice of color if channelId == 0 (fix FPE #783)
     {
-      const unsigned int channelIdC = channelId - 1;                           // channelIdC_to_iconfig in coloramps.h uses the C array indexing starting at 0
-      const unsigned int iconfig = mgOnGpu::channelIdC_to_iconfig[channelIdC]; // map N_diagrams to N_config <= N_diagrams configs (fix LHE color mismatch #856: see also #826, #852, #853)
-      const unsigned int iconfigC = iconfig - 1;                               // icolamp in coloramps.h uses the C array indexing starting at 0
+      const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[channelId]; // coloramps.h uses a channel ordering not the diagram id
       fptype targetamp[ncolor] = { 0 };
       for( int icolC = 0; icolC < ncolor; icolC++ )
       {
@@ -117,7 +115,7 @@
     // - firstprivate: give each thread its own copy, and initialise with value from outside
 #define _OMPLIST0 allcouplings, allMEs, allmomenta, allrndcol, allrndhel, allselcol, allselhel, cGoodHel, cNGoodHel, npagV2
 #ifdef MGONGPU_SUPPORTS_MULTICHANNEL
-#define _OMPLIST1 , allDenominators, allNumerators, channelId, mgOnGpu::icolamp, mgOnGpu::channelIdC_to_iconfig
+#define _OMPLIST1 , allDenominators, allNumerators, channelId, mgOnGpu::icolamp, mgOnGpu::channelId_to_iconfigC
 #else
 #define _OMPLIST1
 #endif
@@ -189,9 +187,7 @@
       // Event-by-event random choice of color #402
       if( channelId != 0 ) // no event-by-event choice of color if channelId == 0 (fix FPE #783)
       {
-        const unsigned int channelIdC = channelId - 1;                           // channelIdC_to_iconfig in coloramps.h uses the C array indexing starting at 0
-        const unsigned int iconfig = mgOnGpu::channelIdC_to_iconfig[channelIdC]; // map N_diagrams to N_config <= N_diagrams configs (fix LHE color mismatch #856: see also #826, #852, #853)
-        const unsigned int iconfigC = iconfig - 1;                               // icolamp in coloramps.h uses the C array indexing starting at 0
+        const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[channelId]; // coloramps.h uses a channel ordering not the diagram id
         fptype_sv targetamp[ncolor] = { 0 };
         for( int icolC = 0; icolC < ncolor; icolC++ )
         {
@@ -202,6 +198,7 @@
           if( mgOnGpu::icolamp[iconfigC][icolC] ) targetamp[icolC] += jamp2_sv[icolC];
         }
 #if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
+        const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[channelId]; // coloramps.h uses a channel ordering not the diagram id
         fptype_sv targetamp2[ncolor] = { 0 };
         for( int icolC = 0; icolC < ncolor; icolC++ )
         {
diff --git a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/model_handling.py b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/model_handling.py
index 5a275b566..a735a80dc 100644
--- a/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/model_handling.py
+++ b/epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/model_handling.py
@@ -1548,28 +1548,27 @@ class PLUGIN_OneProcessExporter(PLUGIN_export_cpp.OneProcessExporterGPU):
         # will be smaller than the true number of diagram. This is fine for color
         # but maybe not for something else.
         nb_diag = max(config[0] for config in config_subproc_map)
-        import math
-        ndigits = str(int(math.log10(nb_diag))+1)
         # Output which diagrams correspond ot a channel to get information for valid color
         lines = []
+        # Note: line for index 0 (no multi-channel is hardcoded in the template, so here we fill the array from index 1)
         for diag in range(1, nb_diag+1):
-            channelidf = diag
-            channelidc = channelidf - 1 # C convention 
             if diag in diag_to_iconfig:
                 iconfigf = diag_to_iconfig[diag]
-                iconfigftxt = '%i'%iconfigf
+                iconfigc = iconfigf - 1 # C convention 
+                text = "     %(iconfigc)i, // channelId=%(diag)i (diagram=%(diag)i) --> iconfig=%(iconfigf)i (f77 conv) and iconfigC=%(iconfigc)i (c conv)" 
             else:
-                iconfigf = 0
-                iconfigftxt = 'None'
-            text = '    %(iconfigf)NNNNi, // CHANNEL_ID=%(channelidf)-NNNNi i.e. DIAGRAM=%(diag)-NNNNi --> ICONFIG=%(iconfigftxt)s'.replace('NNNN',ndigits)
-            lines.append(text % {'diag':diag, 'channelidf':channelidf, 'iconfigf':iconfigf, 'iconfigftxt':iconfigftxt})
-        replace_dict['channelc2iconfig_lines'] = '\n'.join(lines)
+                iconfigc = -1
+                iconfigf = -1
+                text = "     -1, // channelId=%(diag)i (diagram=%(diag)i): Not consider as a channel of integration (presence of 4 point interaction?)" 
+            lines.append(text % {'diag': diag, 'iconfigc':  iconfigc, 'iconfigf':iconfigf})  
+
+        replace_dict['diag_to_channel'] = '\n'.join(lines)

         if self.include_multi_channel: # NB unnecessary as edit_coloramps is not called otherwise...
             multi_channel = self.get_multi_channel_dictionary(self.matrix_elements[0].get('diagrams'), self.include_multi_channel)
-            replace_dict['icolamp_lines'] = self.get_icolamp_lines(multi_channel, self.matrix_elements[0], 1)
+            replace_dict['is_LC'] = self.get_icolamp_lines(multi_channel, self.matrix_elements[0], 1)
             replace_dict['nb_channel'] = len(multi_channel)
-            replace_dict['nb_diag'] = max(config[0] for config in config_subproc_map)
+            replace_dict['nb_diag_plus_one'] = max(config[0] for config in config_subproc_map)+1
             replace_dict['nb_color'] = max(1,len(self.matrix_elements[0].get('color_basis')))

             misc.sprint(multi_channel)
@@ -1577,14 +1576,18 @@ class PLUGIN_OneProcessExporter(PLUGIN_export_cpp.OneProcessExporterGPU):
             #raise Exception

             # AV extra formatting (e.g. gg_tt was "{{true,true};,{true,false};,{false,true};};")
-            ###misc.sprint(replace_dict['icolamp_lines'])
-            split = replace_dict['icolamp_lines'].replace('{{','{').replace('};};','}').split(';,')
-            text=', // ICONFIG=%-NNNNi <-- CHANNEL_ID=%i'.replace('NNNN',ndigits)
-            for iconfigc in range(len(split)): 
-                ###misc.sprint(split[iconfigc])
-                split[iconfigc] = '    ' + split[iconfigc].replace(',',', ').replace('true',' true').replace('{','{ ').replace('}',' }')
-                split[iconfigc] += text % (iconfigc+1, iconfig_to_diag[iconfigc+1])
-            replace_dict['icolamp_lines'] = '\n'.join(split)
+            split = replace_dict['is_LC'].split(';,') 
+            misc.sprint(replace_dict['is_LC'])
+            for i in range(len(split)): 
+                misc.sprint(split[i])
+                split[i] = '    ' + split[i].replace(',',', ').replace('{{', '{')
+                misc.sprint(split[i])
+                if '};};' in split[i]:
+                   split[i] = split[i][:-4] + '}, // iconfigC=%i, diag=%i' % (i, iconfig_to_diag[i+1]) 
+                elif 'false' in split[i] or 'true' in split[i]:
+                    split[i] += ', // iconfigC=%i, diag=%i' % (i, iconfig_to_diag[i+1])
+                misc.sprint(split[i])
+            replace_dict['is_LC'] = '\n'.join(split)
             ff.write(template % replace_dict)
         ff.close()

diff --git a/epochX/cudacpp/CODEGEN/checkFormatting.sh b/epochX/cudacpp/CODEGEN/checkFormatting.sh
index a6db39255..a016a7958 100755
--- a/epochX/cudacpp/CODEGEN/checkFormatting.sh
+++ b/epochX/cudacpp/CODEGEN/checkFormatting.sh
@@ -37,7 +37,7 @@ function checkProcdir()
   cd $TOPDIR
   if [ ! -d $procdir ]; then echo "ERROR! Directory not found $TOPDIR/$procdir"; exit 1; fi
   # Define the list of files to be checked
-  files=$(\ls $procdir/src/*.cc $procdir/src/*.h $procdir/SubProcesses/*.cc $procdir/SubProcesses/*.h $procdir/SubProcesses/P*/check_sa.cc $procdir/SubProcesses/P*/CPPProcess.cc $procdir/SubProcesses/P*/CPPProcess.h $procdir/SubProcesses/P*/epoch_process_id.h $procdir/SubProcesses/P*/coloramps.h)
+  files=$(\ls $procdir/src/*.cc $procdir/src/*.h $procdir/SubProcesses/*.cc $procdir/SubProcesses/*.h $procdir/SubProcesses/P*/check_sa.cc $procdir/SubProcesses/P*/CPPProcess.cc $procdir/SubProcesses/P*/CPPProcess.h $procdir/SubProcesses/P*/epoch_process_id.h)
   if [ "$files" == "" ]; then echo "ERROR! No files to check found in directory $TOPDIR/$procdir"; exit 1; fi
   # Check each file
   status=0
valassi commented 3 months ago

Essentially I went back to my three step mapping instead of your one step mapping. Given how complex this is (see the fact that even export_cpp and export_v4 disagree due to fortran/cpp indexing in https://github.com/mg5amcnlo/mg5amcnlo/pull/115), I really think this makes it clearer.

And the generated code diffs are these

git diff --no-ext-diff 22328317716dee0b12e99547d4405d7d55a33e21..fc1b5cd75824f44e99253bc82a0dfecf14ab0f81 susy_gg_t1t1.mad/SubProcesses/ > diff
...
diff --git a/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/CPPProcess.cc b/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/CPPProcess.cc
index 4f46a8c4f..e5f1721f3 100644
--- a/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/CPPProcess.cc
+++ b/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/CPPProcess.cc
@@ -1009,7 +1009,9 @@ namespace mg5amcCpu
     // Event-by-event random choice of color #402
     if( channelId != 0 ) // no event-by-event choice of color if channelId == 0 (fix FPE #783)
     {
-      const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[channelId]; // coloramps.h uses a channel ordering not the diagram id
+      const unsigned int channelIdC = channelId - 1;                           // channelIdC_to_iconfig in coloramps.h uses the C array indexing starting at 0
+      const unsigned int iconfig = mgOnGpu::channelIdC_to_iconfig[channelIdC]; // map N_diagrams to N_config <= N_diagrams configs (fix LHE color mismatch #856: see also #826, #852, #853)
+      const unsigned int iconfigC = iconfig - 1;                               // icolamp in coloramps.h uses the C array indexing starting at 0
       fptype targetamp[ncolor] = { 0 };
       for( int icolC = 0; icolC < ncolor; icolC++ )
       {
@@ -1052,7 +1054,7 @@ namespace mg5amcCpu
     // - firstprivate: give each thread its own copy, and initialise with value from outside
 #define _OMPLIST0 allcouplings, allMEs, allmomenta, allrndcol, allrndhel, allselcol, allselhel, cGoodHel, cNGoodHel, npagV2
 #ifdef MGONGPU_SUPPORTS_MULTICHANNEL
-#define _OMPLIST1 , allDenominators, allNumerators, channelId, mgOnGpu::icolamp, mgOnGpu::channelId_to_iconfigC
+#define _OMPLIST1 , allDenominators, allNumerators, channelId, mgOnGpu::icolamp, mgOnGpu::channelIdC_to_iconfig
 #else
 #define _OMPLIST1
 #endif
@@ -1124,7 +1126,9 @@ namespace mg5amcCpu
       // Event-by-event random choice of color #402
       if( channelId != 0 ) // no event-by-event choice of color if channelId == 0 (fix FPE #783)
       {
-        const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[channelId]; // coloramps.h uses a channel ordering not the diagram id
+        const unsigned int channelIdC = channelId - 1;                           // channelIdC_to_iconfig in coloramps.h uses the C array indexing starting at 0
+        const unsigned int iconfig = mgOnGpu::channelIdC_to_iconfig[channelIdC]; // map N_diagrams to N_config <= N_diagrams configs (fix LHE color mismatch #856: see also #826, #852, #853)
+        const unsigned int iconfigC = iconfig - 1;                               // icolamp in coloramps.h uses the C array indexing starting at 0
         fptype_sv targetamp[ncolor] = { 0 };
         for( int icolC = 0; icolC < ncolor; icolC++ )
         {
@@ -1135,7 +1139,6 @@ namespace mg5amcCpu
           if( mgOnGpu::icolamp[iconfigC][icolC] ) targetamp[icolC] += jamp2_sv[icolC];
         }
 #if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
-        const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[channelId]; // coloramps.h uses a channel ordering not the diagram id
         fptype_sv targetamp2[ncolor] = { 0 };
         for( int icolC = 0; icolC < ncolor; icolC++ )
         {
diff --git a/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/coloramps.h b/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/coloramps.h
index 1a931324b..28a4d6d32 100644
--- a/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/coloramps.h
+++ b/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/coloramps.h
@@ -1,50 +1,47 @@
 // Copyright (C) 2020-2024 CERN and UCLouvain.
 // Licensed under the GNU Lesser General Public License (version 3 or later).
 // Created by: A. Valassi (Dec 2022) for the MG5aMC CUDACPP plugin.
-// Further modified by: A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.
+// Further modified by: O. Mattelaer, A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.

 #ifndef COLORAMPS_H
 #define COLORAMPS_H 1

-#include <map>
-
-namespace mgOnGpu
+namespace mgOnGpu /* clang-format off */
 {
   // Summary of numbering and indexing conventions for the relevant concepts (see issue #826 and PR #852)
   // - Diagram number (no variable) in [1, N_diagrams]: all values are allowed (N_diagrams distinct values)
   //   => this number is displayed for information before each block of code in CPPProcess.cc
-  // - Channel number (CHANNEL_ID) in [0, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
-  //   => this number (with indexing like ps/pdf output) is passed around as an API argument between cudacpp functions
-  //   0 is allowed to fallback to no multi-channel mode.
-  // - Channel number in C indexing: "IconfiC", this is the equivalent of the Fortran iconfig
-  //   iconfigC = iconfig -1
-  //   provides a continuous index [0, N_config-1] for array
-  //  iconfigC = ChannelId_to_iconfigC[channelId]
-  //NOTE: All those ordering are event by event specific (with the intent to have those fix within a vector size/wrap   
-  
-  // Map channelId to iconfigC
-  // This array has N_diagrams+1 elements, but only N_config <= N_diagrams valid values
-  // unvalid values are set to -1
-  // The 0 entry is a fall back to still write events even if no multi-channel is setup (wrong color selected in that mode) 
-    __device__ constexpr int channelId_to_iconfigC[7] = {
-     0, // channelId=0: This value means not multi-channel, color will be wrong anyway -> pick the first
-     -1, // channelId=1 (diagram=1): Not consider as a channel of integration (presence of 4 point interaction?)
-     0, // channelId=2 (diagram=2) --> iconfig=1 (f77 conv) and iconfigC=0 (c conv)
-     1, // channelId=3 (diagram=3) --> iconfig=2 (f77 conv) and iconfigC=1 (c conv)
-     2, // channelId=4 (diagram=4) --> iconfig=3 (f77 conv) and iconfigC=2 (c conv)
-     3, // channelId=5 (diagram=5) --> iconfig=4 (f77 conv) and iconfigC=3 (c conv)
-     4, // channelId=6 (diagram=6) --> iconfig=5 (f77 conv) and iconfigC=4 (c conv)
+  // - Channel number ("channelId" in C, CHANNEL_ID in F) in [1, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
+  //   => this number (with F indexing as in ps/pdf output) is passed around as an API argument between cudacpp functions
+  //   Note: the old API passes around a single CHANNEL_ID (and uses CHANNEL_ID=0 to indicate no-multichannel mode, but this is not used in coloramps.h),
+  //   while the new API passes around an array of CHANNEL_ID's (and uses a NULL array pointer to indicate no-multichannel mode)
+  // - Channel number in C indexing: "channelIdC" = channelID - 1
+  //   => this number (with C indexing) is used as the index of the channelIdC_to_iconfig array below
+  // - Config number ("iconfig" in C, ICONFIG in F) in [1, N_config]: all values are allowed (N_config <= N_diagrams distinct values)
+  // - Config number in C indexing: "iconfigC" = iconfig - 1
+  //   => this number (with C indexing) is used as the index of the icolamp array below
+
+  // Map channelIdC (in C indexing, i.e. channelId-1) to iconfig (in F indexing)
+  // Note: iconfig=0 indicates invalid values, i.e. channels/diagrams with no single-diagram enhancement in the MadEvent sampling algorithm (presence of 4-point interaction?)
+  // This array has N_diagrams elements, but only N_config <= N_diagrams valid (non-zero) values
+  __device__ constexpr int channelIdC_to_iconfig[6] = { // note: a trailing comma in the initializer list is allowed
+    0, // CHANNEL_ID=1 i.e. DIAGRAM=1 --> ICONFIG=None
+    1, // CHANNEL_ID=2 i.e. DIAGRAM=2 --> ICONFIG=1
+    2, // CHANNEL_ID=3 i.e. DIAGRAM=3 --> ICONFIG=2
+    3, // CHANNEL_ID=4 i.e. DIAGRAM=4 --> ICONFIG=3
+    4, // CHANNEL_ID=5 i.e. DIAGRAM=5 --> ICONFIG=4
+    5, // CHANNEL_ID=6 i.e. DIAGRAM=6 --> ICONFIG=5
   };

   // Map iconfigC (in C indexing, i.e. iconfig-1) to the set of allowed colors
-  // This array has N_config <= N_diagrams elements    
-    __device__ constexpr bool icolamp[5][2] = {
-    {true, true}, // iconfigC=0, diag=2
-    {true, true}, // iconfigC=1, diag=3
-    {true, false}, // iconfigC=2, diag=4
-    {true, false}, // iconfigC=3, diag=5
-    {false, true}, // iconfigC=4, diag=6
-  };
+  // This array has N_config <= N_diagrams elements
+  __device__ constexpr bool icolamp[5][2] = { // note: a trailing comma in the initializer list is allowed
+    {  true,  true }, // ICONFIG=1 <-- CHANNEL_ID=2
+    {  true, false }, // ICONFIG=2 <-- CHANNEL_ID=3
+    {  true, false }, // ICONFIG=3 <-- CHANNEL_ID=4
+    { false,  true }, // ICONFIG=4 <-- CHANNEL_ID=5
+    { false,  true }, // ICONFIG=5 <-- CHANNEL_ID=6
+  }; /* clang-format on */

 }
 #endif // COLORAMPS_H
valassi commented 3 months ago

Essentially I went back to my three step mapping instead of your one step mapping

And this means I remove the first element of the channel-to-config array, to just keep the fortran mappings: essentially this array does the fortran-to-fortran mapping, then some variables handle the -1 translation to C indexing. There are Ndiag+1 (7 here) entries in Olivier's code and Ndiag in mine (6 here), note that only Ndiag mappings are needed, not Ndiag+1.

-  // Map channelId to iconfigC
-  // This array has N_diagrams+1 elements, but only N_config <= N_diagrams valid values
-  // unvalid values are set to -1
-  // The 0 entry is a fall back to still write events even if no multi-channel is setup (wrong color selected in that mode) 
-    __device__ constexpr int channelId_to_iconfigC[7] = {
-     0, // channelId=0: This value means not multi-channel, color will be wrong anyway -> pick the first
-     -1, // channelId=1 (diagram=1): Not consider as a channel of integration (presence of 4 point interaction?)
-     0, // channelId=2 (diagram=2) --> iconfig=1 (f77 conv) and iconfigC=0 (c conv)
-     1, // channelId=3 (diagram=3) --> iconfig=2 (f77 conv) and iconfigC=1 (c conv)
-     2, // channelId=4 (diagram=4) --> iconfig=3 (f77 conv) and iconfigC=2 (c conv)
-     3, // channelId=5 (diagram=5) --> iconfig=4 (f77 conv) and iconfigC=3 (c conv)
-     4, // channelId=6 (diagram=6) --> iconfig=5 (f77 conv) and iconfigC=4 (c conv)
...
+  // Map channelIdC (in C indexing, i.e. channelId-1) to iconfig (in F indexing)
+  // Note: iconfig=0 indicates invalid values, i.e. channels/diagrams with no single-diagram enhancement in the MadEvent sampling algorithm (presence of 4-point interaction?)
+  // This array has N_diagrams elements, but only N_config <= N_diagrams valid (non-zero) values
+  __device__ constexpr int channelIdC_to_iconfig[6] = { // note: a trailing comma in the initializer list is allowed
+    0, // CHANNEL_ID=1 i.e. DIAGRAM=1 --> ICONFIG=None
+    1, // CHANNEL_ID=2 i.e. DIAGRAM=2 --> ICONFIG=1
+    2, // CHANNEL_ID=3 i.e. DIAGRAM=3 --> ICONFIG=2
+    3, // CHANNEL_ID=4 i.e. DIAGRAM=4 --> ICONFIG=3
+    4, // CHANNEL_ID=5 i.e. DIAGRAM=5 --> ICONFIG=4
+    5, // CHANNEL_ID=6 i.e. DIAGRAM=6 --> ICONFIG=5
   };
valassi commented 3 months ago

This has conflicts in branch color (because it was modified to be agains fix_826). I opened a new PR #877 with no conflicts, based on branch color2 (which can be both against fix_826 or against master)

Eventually this will be closed and #877 merged, I guess

valassi commented 3 months ago

This is obsolete because it has merge conflicts. It has been replaced by #877 which is functionally equivalent (and includes improvements suggested by Olivier). Closing as unmerged, at commit https://github.com/madgraph5/madgraph4gpu/pull/873/commits/93a547f2a8f05449b4329430a68edfe15eb2159e - I will probably delete/change the color branch.

valassi commented 3 months ago

For the record I did delete the obsolete color branch