madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

Cuda time profiles for DY+4j have very high 'HEL' component for helicity filtering? #999

Open valassi opened 3 weeks ago

valassi commented 3 weeks ago

Documenting and further analysing the results of the DY+4jet tests in #948.

The CUDA time profiles for DY+4j have a very high 'HEL' component for helicity filtering, see the timing summaries below.

pp_dy4j.mad/fortran/output.txt (#events: 81)
[GridPackCmd.launch] OVERALL TOTAL    21707.6095 seconds
[madevent COUNTERS]  PROGRAM TOTAL    21546.1
[madevent COUNTERS]  Fortran Overhead 1579.09
[madevent COUNTERS]  Fortran MEs      19967
--------------------------------------------------------------------------------
pp_dy4j.mad/cppnone/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    26745.1639 seconds
[madevent COUNTERS]  PROGRAM TOTAL    26584.9
[madevent COUNTERS]  Fortran Overhead 1608.51
[madevent COUNTERS]  CudaCpp MEs      24910.4
[madevent COUNTERS]  CudaCpp HEL      66.0341
--------------------------------------------------------------------------------
pp_dy4j.mad/cppsse4/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    14398.4664 seconds
[madevent COUNTERS]  PROGRAM TOTAL    14231.3
[madevent COUNTERS]  Fortran Overhead 1647.03
[madevent COUNTERS]  CudaCpp MEs      12550.6
[madevent COUNTERS]  CudaCpp HEL      33.7035
--------------------------------------------------------------------------------
pp_dy4j.mad/cppavx2/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    7335.2356 seconds
[madevent COUNTERS]  PROGRAM TOTAL    7114.43
[madevent COUNTERS]  Fortran Overhead 1683.7
[madevent COUNTERS]  CudaCpp MEs      5415.48
[madevent COUNTERS]  CudaCpp HEL      15.2596
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512y/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    6831.8971 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6649.98
[madevent COUNTERS]  Fortran Overhead 1669.94
[madevent COUNTERS]  CudaCpp MEs      4966.24
[madevent COUNTERS]  CudaCpp HEL      13.8066
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512z/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    7136.2962 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6958.96
[madevent COUNTERS]  Fortran Overhead 1636.28
[madevent COUNTERS]  CudaCpp MEs      5305.14
[madevent COUNTERS]  CudaCpp HEL      17.5447
--------------------------------------------------------------------------------
pp_dy4j.mad/cuda/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    2523.7488 seconds
[madevent COUNTERS]  PROGRAM TOTAL    2234.93
[madevent COUNTERS]  Fortran Overhead 1820.36
[madevent COUNTERS]  CudaCpp MEs      97.9622
[madevent COUNTERS]  CudaCpp HEL      316.613
--------------------------------------------------------------------------------

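Specifically, in the cuda run the 'HEL' counter (316.6s) is more than 3 times the 'MEs' counter (98.0s), whereas in all C++ runs 'HEL' is only about 0.3% of 'MEs'. This does not make any sense...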
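To make the comparison explicit, here is a minimal sketch that computes the HEL/MEs ratio per backend from the counter lines quoted above. The function name `hel_over_mes` and the regular expressions are my own assumptions for illustration, not part of madgraph4gpu; only the numbers are copied from the output.txt excerpts.

```python
import re

# Counter lines copied verbatim from the output.txt excerpts in this issue.
LOG = """\
pp_dy4j.mad/cppnone/output.txt (#events: 195)
[madevent COUNTERS]  CudaCpp MEs      24910.4
[madevent COUNTERS]  CudaCpp HEL      66.0341
pp_dy4j.mad/cppsse4/output.txt (#events: 195)
[madevent COUNTERS]  CudaCpp MEs      12550.6
[madevent COUNTERS]  CudaCpp HEL      33.7035
pp_dy4j.mad/cppavx2/output.txt (#events: 195)
[madevent COUNTERS]  CudaCpp MEs      5415.48
[madevent COUNTERS]  CudaCpp HEL      15.2596
pp_dy4j.mad/cpp512y/output.txt (#events: 195)
[madevent COUNTERS]  CudaCpp MEs      4966.24
[madevent COUNTERS]  CudaCpp HEL      13.8066
pp_dy4j.mad/cpp512z/output.txt (#events: 195)
[madevent COUNTERS]  CudaCpp MEs      5305.14
[madevent COUNTERS]  CudaCpp HEL      17.5447
pp_dy4j.mad/cuda/output.txt (#events: 195)
[madevent COUNTERS]  CudaCpp MEs      97.9622
[madevent COUNTERS]  CudaCpp HEL      316.613
"""

# Hypothetical helper (not a madgraph4gpu function): parse the excerpt above.
def hel_over_mes(log_text):
    """Return {backend: HEL seconds / MEs seconds} for each backend block."""
    ratios = {}
    backend = None
    counters = {}
    for line in log_text.splitlines():
        header = re.match(r"pp_dy4j\.mad/(\w+)/output\.txt", line)
        if header:
            # A new backend block starts: reset the per-backend counters.
            backend = header.group(1)
            counters = {}
            continue
        counter = re.match(
            r"\[madevent COUNTERS\]\s+CudaCpp (MEs|HEL)\s+([\d.]+)", line
        )
        if counter and backend is not None:
            counters[counter.group(1)] = float(counter.group(2))
            if "MEs" in counters and "HEL" in counters:
                ratios[backend] = counters["HEL"] / counters["MEs"]
    return ratios

if __name__ == "__main__":
    for backend, ratio in hel_over_mes(LOG).items():
        print(f"{backend:8s} HEL/MEs = {ratio:.4f}")
```

With these inputs the ratio stays below 0.004 for every C++ build and exceeds 3 only for cuda, which is the anomaly reported here.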

There was an issue in #958, but that should have been fixed by now?