madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package
29 stars 33 forks source link

Out-of-bounds array warning in SUSY gq_ttllq #823

Closed valassi closed 4 months ago

valassi commented 4 months ago

I have essentially fixed all of the SUSY issues in #820 and am now systematically testing a large spectrum of SUSY processes. Since this is one process that I had used long ago, I am testing gq_ttllq. The build succeeds and the tests seem to succeed. however in one of the four subprocesses I get this warning, which looks worrying

/data/avalassi/GPU2023/madgraph4gpuBis/epochX/cudacpp/susy_gq_ttllq.sa/SubProcesses/P1_Sigma_MSSM_SLHA2_gu_ttxemepu> make cleanall; make HRDCOD=0 -j
...
ccache g++  -O3  -std=c++17 -I. -I../../src -Wall -Wshadow -Wextra -ffast-math  -fopenmp  -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -fPIC -c CPPProcess.cc -o CPPProcess.o
In file included from CPPProcess.cc:22:
MemoryAccessCouplingsFixed.h: In function ‘void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int)’:
MemoryAccessCouplingsFixed.h:41:44: warning: array subscript 12 is outside array bounds of ‘mgOnGpu::fptype [10]’ {aka ‘double [10]’} [-Warray-bounds]
   41 |       return &( buffer[iicoup * nx2 + ix2] ); // STRUCT[idcoup][ix2]
      |                                            ^
CPPProcess.cc:91:17: note: while referencing ‘mg5amcCpu::cIPC’
   91 |   static fptype cIPC[10];
      |                 ^~~~
In file included from CPPProcess.cc:22:
MemoryAccessCouplingsFixed.h:41:44: warning: array subscript 12 is outside array bounds of ‘mgOnGpu::fptype [10]’ {aka ‘double [10]’} [-Warray-bounds]
   41 |       return &( buffer[iicoup * nx2 + ix2] ); // STRUCT[idcoup][ix2]
      |                                            ^
CPPProcess.cc:91:17: note: while referencing ‘mg5amcCpu::cIPC’
   91 |   static fptype cIPC[10];
      |                 ^~~~

This is using gcc11.3. Also gcc12.1 gives the same warning. Strangely, clang14 does not complain.

Looking at the code is complex because this is deep inside some templated methods.

However the following looks wrong, even if it is just a comment.

In Param.h:

  namespace Parameters_MSSM_SLHA2_dependentCouplings
  {
    constexpr size_t ndcoup = 2; // #couplings that vary event by event because they depend on the running alphas QCD
    constexpr size_t idcoup_GC_51 = 0;
    constexpr size_t idcoup_GC_6 = 1;
    ...
  }
...
  namespace Parameters_MSSM_SLHA2_independentCouplings
  {
    constexpr size_t nicoup = 7; // #couplings that are fixed for all events because they do not depend on the running alphas QCD
    //constexpr size_t ixcoup_GC_1 = 0 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
    //constexpr size_t ixcoup_GC_2 = 1 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
    //constexpr size_t ixcoup_GC_3 = 2 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
    //constexpr size_t ixcoup_GC_246 = 3 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
    //constexpr size_t ixcoup_GC_248 = 4 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
    //constexpr size_t ixcoup_GC_249 = 5 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
    //constexpr size_t ixcoup_GC_250 = 6 + Parameters_MSSM_SLHA2_dependentCouplings::ndcoup; // out of ndcoup+nicoup
  }

In Param.cc:

void
Parameters_MSSM_SLHA2::setIndependentCouplings()
{
  GC_1 = -( mdl_ee * mdl_complexi ) / 3.;
  GC_2 = ( 2. * mdl_ee * mdl_complexi ) / 3.;
  GC_3 = -( mdl_ee * mdl_complexi );
  GC_246 = -( mdl_cw * mdl_ee * mdl_complexi ) / ( 2. * ( -1. + mdl_sw ) * mdl_sw * ( 1. + mdl_sw ) );
  GC_248 = -( mdl_cw * mdl_ee * mdl_complexi * mdl_sw ) / ( 3. * ( -1. + mdl_sw ) * ( 1. + mdl_sw ) );
  GC_249 = ( 2. * mdl_cw * mdl_ee * mdl_complexi * mdl_sw ) / ( 3. * ( -1. + mdl_sw ) * ( 1. + mdl_sw ) );
  GC_250 = -( ( mdl_cw * mdl_ee * mdl_complexi * mdl_sw ) / ( ( -1. + mdl_sw ) * ( 1. + mdl_sw ) ) );
}

In CPPProcess.cc:

../../SubProcesses/P1_Sigma_MSSM_SLHA2_gux_ttxemepux/CPPProcess.cc:  __device__ const fptype cIPC[14] = { (fptype)Parameters_MSSM_SLHA2::GC_3.real(), (fptype)Parameters_MSSM_SLHA2::GC_3.imag(), (fptype)Parameters_MSSM_SLHA2::GC_2.real(), (fptype)Parameters_MSSM_SLHA2::GC_2.imag(), (fptype)Parameters_MSSM_SLHA2::GC_246.real(), (fptype)Parameters_MSSM_SLHA2::GC_246.imag(), (fptype)Parameters_MSSM_SLHA2::GC_250.real(), (fptype)Parameters_MSSM_SLHA2::GC_250.imag(), (fptype)Parameters_MSSM_SLHA2::GC_249.real(), (fptype)Parameters_MSSM_SLHA2::GC_249.imag(), (fptype)Parameters_MSSM_SLHA2::GC_1.real(), (fptype)Parameters_MSSM_SLHA2::GC_1.imag(), (fptype)Parameters_MSSM_SLHA2::GC_248.real(), (fptype)Parameters_MSSM_SLHA2::GC_248.imag() };
...
../../SubProcesses/P1_Sigma_MSSM_SLHA2_gu_ttxemepu/CPPProcess.cc:  __device__ const fptype cIPC[10] = { (fptype)Parameters_MSSM_SLHA2::GC_3.real(), (fptype)Parameters_MSSM_SLHA2::GC_3.imag(), (fptype)Parameters_MSSM_SLHA2::GC_2.real(), (fptype)Parameters_MSSM_SLHA2::GC_2.imag(), (fptype)Parameters_MSSM_SLHA2::GC_246.real(), (fptype)Parameters_MSSM_SLHA2::GC_246.imag(), (fptype)Parameters_MSSM_SLHA2::GC_250.real(), (fptype)Parameters_MSSM_SLHA2::GC_250.imag(), (fptype)Parameters_MSSM_SLHA2::GC_249.real(), (fptype)Parameters_MSSM_SLHA2::GC_249.imag() };

Rephrasing: why does cIPC have different dimensions in P1_Sigma_MSSM_SLHA2_gu_ttxemepu (10, gives a warning) and in P1_Sigma_MSSM_SLHA2_gux_ttxemepux (14, no warning)??

valassi commented 4 months ago

Rephrasing: why does cIPC have different dimensions in P1_Sigma_MSSM_SLHA2_gu_ttxemepu (10, gives a warning) and in P1_Sigma_MSSM_SLHA2_gux_ttxemepux (14, no warning)??

@oliviermattelaer does this look right to you? I guess they should be the same... the two CPPProcess.cc are extremely similar

Thnaks

valassi commented 4 months ago

I have generated separately the four subprocesses susy_gu_ttllu, susy_gux_ttllux, susy_gd_ttlld, susy_gdx_ttlldx

It is quite clear that there is a problem in the generation of multi-P directories:

I guess there are largely speaking two options:

valassi commented 4 months ago

Ok I guess I understand better how this works now.

PLUGIN_GPUFOHelasCallWriter.format_coupling is called many times. This class is effectively a singleton and maintains effectively a singleton for couplings2order

    def format_coupling(self, call):
        """Format the coupling so any minus signs are put in front"""
        import re
        ###print(call) # FOR DEBUGGING
        model = self.get('model')
        newcoup = False
        if not hasattr(self, 'couplings2order'):
            self.couplings2order = {}
            self.params2order = {}

Specifically, the same instance of the helas writer is called in sequence by the different P1 subdirectories

Now the problems here, for the warning, might well be here in CPPProcess.cc

      for( size_t iicoup = 0; iicoup < nicoup; iicoup++ )
        allCOUPs[ndcoup + iicoup] = CI_ACCESS::iicoupAccessBufferConst( cIPC, iicoup ); // independent couplings, fixed for all events

The problem in this line of code is that it assumes that cIPC has a number of elements that is related to nicoup, but this is not true! Specifically, nicoup is 7, and some CPPProcess.cc have cIPC[14], but for the CPPProcess.cc which has cIPC[10], the loop should run on iicoup < 5 and not on iicoup < 7. I will try a fix.

If this makes the warning disappear, I would be reasonably confident that the rest of the code is correct. In any case allCOUP later on should normally not be used beyond. For good measure, also allCOUP should be dimensioned similar to cIPC.

Essentially, two variables are needed, nicoup (for Param.h, maximum across all P1 directories) and then some nicoup_used or similar (the one appropriate for this P1 directory, i.e. the one used for cIPC dimensions)

valassi commented 4 months ago

This is fixed in PR #824, closing for simplicity