madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

Move latest eemumu developments from epoch1 to epoch2 ("merge" epoch2 into epoch1) #139

Closed valassi closed 3 years ago

valassi commented 3 years ago

Hi @roiser @oliviermattelaer @hageboeck

All my latest developments (master, klas/vectorization, heterogeneous, unweighting etc) are in epoch1/eemumu. It was my mistake not to base these on epoch2/eemumu.

Luckily, the changes that have been made in epoch2/eemumu since its initial creation are relatively few.

As agreed with @roiser on Friday, I will essentially merge those epoch2 changes into epoch1. I proposed to also upgrade epoch2 to the level of epoch1, but we agreed that it is better to use epoch1/eemumu as the basis, and eventually use this to backport to MG and create epoch3.

We also agreed that I will self-merge essentially all of this stuff, as there is quite a bit to be done. I will try to document here what I am doing and why.

valassi commented 3 years ago

There are essentially two things to be ported

  1. The initial differences between epoch2 and epoch1 when epoch2 was created.
  2. The new changes on top of epoch2 over time.

For reference, the latter are the following (there are quite few):

[avalassi@itscrd70 bash] ~/GPU2020/madgraph4gpuTer/epoch2/cuda/ee_mumu> git diff upstream/master
[avalassi@itscrd70 bash] ~/GPU2020/madgraph4gpuTer/epoch2/cuda/ee_mumu> git log .

commit ff021bdfafa5068b841fbd1754052524270301b2
Author: Stephan Hageboeck <stephan.hageboeck@cern.ch>
Date:   Wed Dec 16 18:22:26 2020 +0100

    [ep2 cuda eemm] Port fixes in Makefile to epoch2.

commit 57497a79292cf0616e04ab9b866bba305eb93f54
Author: Stephan Hageboeck <stephan.hageboeck@cern.ch>
Date:   Wed Dec 16 17:05:24 2020 +0100

    [ep2 cuda eemm] Port CUDA tests to epoch2.

commit 05d3a6f8de663db276b4755785d0713b27b043bd
Author: Olivier Mattelaer <olivier.mattelaer@uclouvain.be>
Date:   Wed Dec 2 13:52:59 2020 +0100

    port into MG5aMC the change from https://github.com/madgraph5/madgraph4gpu/pull/78

commit a6c18e2715bfa5b39727ba6407031f6c7633ab78
Author: Olivier Mattelaer <olivier.mattelaer@uclouvain.be>
Date:   Mon Nov 30 23:26:50 2020 +0100

    cpp compilation is working

commit a683a247e13d3aedc88f62c7d3f20aefde6943d5
Author: Olivier Mattelaer <olivier.mattelaer@uclouvain.be>
Date:   Sun Nov 29 21:38:08 2020 +0100

    fix issue with ixxxx

commit 389aaaa72343ad05f83168bc4fbd390ccee013e0
Author: Olivier Mattelaer <olivier.mattelaer@uclouvain.be>
Date:   Fri Nov 27 10:28:45 2020 +0100

    adding json info/ more plot from PR#61

commit c092e9a053e7f037791f25462fcf8598232fab49
Author: Olivier Mattelaer <olivier.mattelaer@uclouvain.be>
Date:   Thu Nov 26 20:59:28 2020 +0100

    first version of ee_mumu coming from madgraph --some PR still need to be included here

valassi commented 3 years ago

I merged the first batch of changes from PR #140: clean up and rename files in epoch2/eemumu.

These changes are all in epoch2, essentially:

Running EVENTUALLY-TODO:

valassi commented 3 years ago

I have decided to split the remaining tasks further into two PRs. I have done everything except CPPProcess, but this is the most complex part (and I actually even see minor performance differences). I will split that out in a third PR.

Recap about issue #139

More in detail about this PR #149 below, copied from the text of the PR.


In src:

1) Parameters_sm.h Remove "using namespace std;" in epoch2. Otherwise almost identical. Copy epoch2 to epoch1.

2) Parameters_sm.cc Add explicit std:: in epoch2. Otherwise almost identical. Copy epoch2 to epoch1.

3) read_slha.h Identical but for indentation: fix the indentation manually to make the two files equal. (clang-format would bring too many changes)

4) read_slha.cc Identical but for a default parameter value in implementation in epoch2. Fix by copying epoch1 to epoch2.

5) rambo.h/cc Identical in epoch2 and epoch1, nothing to do

6) mgOnGpuConfig.h Identical, except for a comment (did the percent sign disturb the metacode?). Fix by copying epoch1 to epoch2.

7) mgOnGpuTypes.h Identical.

8) Makefile Almost identical, but ep1 has OMP, fastmath, Wextra. Fix by copying epoch1 to epoch2.

9) HelAmps.h/cc MISSING IN EPOCH1! Do this later...
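
As an illustration of items 1 and 2 above (dropping "using namespace std;" and qualifying names explicitly), here is a minimal sketch. The struct and function names are hypothetical and are not the actual contents of Parameters_sm.h or Parameters_sm.cc; the point is only that a header without a using-directive does not leak std names into every file that includes it.

```cpp
// Hypothetical header-style sketch; ParametersSketch and describe() are
// illustrative names, not the real Parameters_sm interface.
#include <complex>
#include <string>

// No "using namespace std;" at header scope: such a directive would be
// injected into every translation unit that includes this header.
struct ParametersSketch
{
  std::complex<double> coupling{ 0., 0. }; // explicit std:: qualification
  std::string name{ "sm" };
};

// The implementation also spells out std:: on every standard name it uses.
inline std::string describe( const ParametersSketch& p )
{
  return p.name + ": Re=" + std::to_string( p.coupling.real() );
}
```

With explicit qualification, the two epochs can share the same header text without depending on which using-directives happen to be active in the including file.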


In SubProcesses and below:

1) timer.h Identical

2) Makefile Almost identical, but epoch1 has much more. After cosmetic fixes, copy ep1 to ep2. Now ep2 also has, as in epoch1: OMP, fastmath, Wextra, clang patch, host info.

Note: at this stage, epoch1 is slightly faster than epoch2 in c++, but the inverse in CUDA.

3) Memory.h, nvtx.h, perf.py Identical (but a symlink is missing, to be added in epoch1)

4) timermap.h Copy epoch1 to epoch2 to add missing gcc pragmas for nvtx warnings

5) perf/data Only in epoch1 - one json file, keep it there

6) profile.sh Only in epoch1 - should bring it forward eventually (anyway the basis will be epoch1)

7) runTest.cc Initially identical, but the tests had different names (e.g. EP1_CUDA_GPU vs EP2_CUDA_GPU). This is fixed by adding epoch_process_id.h, where a different macro is defined per epoch, so that runTest.cc is now identical.

8) check.cc

First batch of changes

Minimal changes in epoch1:

Port to epoch2 many changes from epoch1:

7bis) runTest.cc

8bis) check.cc

A large batch of additional changes (mainly in PR #144) came from fixing epoch2 check.cc to use fptype for random numbers as in epoch1. This triggered many additional checks about single precision, included in PR #144, which also includes a better treatment of NaNs.
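
To make the fptype point concrete, here is a minimal sketch of generating random numbers directly in the configured floating-point type and of averaging results while skipping NaNs. The macro SKETCH_FPTYPE_FLOAT and both function names are assumptions made for illustration; they are not taken from check.cc, and the actual NaN treatment in PR #144 may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical precision switch (the macro name is an assumption).
#ifdef SKETCH_FPTYPE_FLOAT
typedef float fptype;
#else
typedef double fptype;
#endif

// Fill a buffer with uniform random numbers generated directly as fptype,
// so that single- and double-precision builds follow the same code path.
inline std::vector<fptype> makeRandoms( std::size_t n, unsigned seed )
{
  std::mt19937 gen( seed );
  std::uniform_real_distribution<fptype> dist( 0, 1 );
  std::vector<fptype> out( n );
  for( auto& r : out ) r = dist( gen );
  return out;
}

// Average the values while skipping NaNs; nNaN returns how many were skipped.
inline fptype meanSkippingNaN( const std::vector<fptype>& v, std::size_t& nNaN )
{
  nNaN = 0;
  double sum = 0;
  std::size_t nOk = 0;
  for( fptype x : v )
  {
    if( std::isnan( x ) ) { ++nNaN; continue; }
    sum += x;
    ++nOk;
  }
  return nOk > 0 ? static_cast<fptype>( sum / nOk )
                 : std::numeric_limits<fptype>::quiet_NaN();
}
```

One caveat for this sketch: with aggressive floating-point flags such as -ffast-math, compilers are allowed to assume that NaNs never occur, so a plain std::isnan check may not behave as written and a different detection strategy may be needed.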


This is all at the time of this PR (after some previous ones). Then the rest will be about CPPProcess.
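
The per-epoch macro approach mentioned for runTest.cc can be sketched as follows. This is only an illustration under stated assumptions: the macro name EPOCH_PROCESS_ID, the helper macros and the function gpuTestName() are hypothetical, and the real epoch_process_id.h may be organised differently. The test names EP1_CUDA_GPU and EP2_CUDA_GPU are the ones quoted above.

```cpp
#include <cstring>

// Hypothetical contents of epoch_process_id.h: each epoch directory has its
// own copy, and this one definition is the only line that differs.
#define EPOCH_PROCESS_ID EP1 // the epoch2 copy would define EP2 instead

// Two-level stringification, so that EPOCH_PROCESS_ID is expanded to EP1
// before it is turned into a string literal.
#define SKETCH_STR_IMPL( x ) #x
#define SKETCH_STR( x ) SKETCH_STR_IMPL( x )

// Shared code, identical in both epochs: the test name is derived from the
// macro instead of being hard-coded.
inline const char* gpuTestName()
{
  return SKETCH_STR( EPOCH_PROCESS_ID ) "_CUDA_GPU"; // adjacent literals concatenate
}

// Example of a check that works unchanged in either epoch.
inline bool isGpuTestOfEpoch( const char* epoch )
{
  const std::size_t n = std::strlen( epoch );
  return std::strncmp( gpuTestName(), epoch, n ) == 0 && gpuTestName()[n] == '_';
}
```

A test framework that needs an identifier rather than a string would typically use token pasting (##) on the same macro, but the principle is the same: the file that differs between epochs shrinks to a single definition.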

valassi commented 3 years ago

I have FINALLY completed also the third big part, PR #151. Now epoch1 and epoch2 (before vectorization) are strictly identical. I will then go on and develop on top of epoch1 (vectorization and more), while keeping epoch2 as a pre-vectorization reference.

This is a summary of what this contains:

In general

Changes in file directory structure

Changes in file formatting, cosmetics, minor content issues

Code cleanup potentially affecting performance

Changes in XXX functions

Changes in FFV functions

Changes in sigmakin or calculate_wavefunction

Changes in CPPProcess other than XXX, FFV or formatting

Printouts and performance tools

TODO EVENTUALLY (after vectorization: add to the running list from previous PRs)

I will now self-merge that.

valassi commented 3 years ago

I have merged #151.

I keep this issue #139 open because I'd like to do some cleanup after merging vectorization. From bits and pieces of my previous TODO EVENTUALLY:

My CURRENT BASELINE PERFORMANCE (before vectorization) is described in https://github.com/madgraph5/madgraph4gpu/commit/1c25007f59488e86c87f3e4d46043f09a140aafd

-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.133317e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     8.050711 sec
real    0m8.079s
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.852279e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     1.233023 sec
real    0m1.552s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CPP
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MatrixElems] (3) = ( 1.132827e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     8.059035 sec
real    0m8.086s
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CUDA
EvtsPerSec[MatrixElems] (3) = ( 6.870531e+08                 )  sec^-1
MeanMatrixElemValue         = ( 1.372152e-02 +- 3.269516e-06 )  GeV^0
TOTAL       :     1.177079 sec
real    0m1.485s
==PROF== Profiling "_ZN5gProc8sigmaKinEPKdPd": launch__registers_per_thread 164
-------------------------------------------------------------------------
valassi commented 3 years ago

The bulk of the tasks described here were completed long ago.

The pending items were also essentially all completed in one way or another in epochX (issue #244).

Answering point by point on my own latest comment:

This issue can now be closed.