madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

Alpaka port #67

Closed (roiser closed 1 year ago)

smithdh commented 3 years ago

I've merged the epoch1/alpaka code, based on eemumu_AV alpaka_hw-abstraction-base-v0. However, I think there is still some work to do: currently the Makefile is only set up to build a binary using OpenMP as the backend. I think a CUDA build should also be easy to set up, possibly by replacing the Makefiles with CMake and adding automatic discovery of CUDA, so that the CUDA version is built if CUDA is available.

Currently the OpenMP version appears to produce a mean matrix element value consistent with the original C++ version (the random numbers are different), although performance appears to be low:

CPU, 32 hw threads with hyperthreading, CPU E5-2630 v3 @ 2.40GHz

bash-4.2$ ./check.exe -p 32 1 100000
***************************************
NumIterations             = 100000
NumThreadsPerBlock        = 1
NumBlocksPerGrid          = 32
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = ALSIMPLE::COMPLEX
RanNumb memory layout     = AOSOA[1] == AOS
Momenta memory layout     = AOSOA[1] == AOS
Wavefunction GPU memory   = LOCAL
Rand type               = ALSIMPLE
Rand generation         = HOST
---------------------------------------
NumberOfEntries           = 100000
TotalTimeInWaveFuncs      = 2.753682e+00 sec
MeanTimeInWaveFuncs       = 2.753682e-05 sec
StdDevTimeInWaveFuncs     = 1.891695e-05 sec
MinTimeInWaveFuncs        = 2.376400e-05 sec
MaxTimeInWaveFuncs        = 5.748493e-03 sec
---------------------------------------
TotalEventsComputed       = 3200000
RamboEventsPerSec         = 6.576978e+05 sec^-1
MatrixElemEventsPerSec    = 1.162080e+06 sec^-1
***************************************
NumMatrixElements(notNan) = 3200000
MeanMatrixElemValue       = 1.371250e-02 GeV^0
StdErrMatrixElemValue     = 4.581855e-06 GeV^0
StdDevMatrixElemValue     = 8.196271e-03 GeV^0
MinMatrixElemValue        = 6.071582e-03 GeV^0
MaxMatrixElemValue        = 3.374923e-02 GeV^0
***************************************
00 CudaFree : 0.000014 sec
0a ProcInit : 0.000531 sec
0b MemAlloc : 0.029286 sec
0c GenCreat : 0.000039 sec
1a GenSeed  : 1.585944 sec
1b GenRnGen : 0.974015 sec
1c CpHTDrnd : 0.316440 sec
2a RamboIni : 2.087273 sec
2b RamboFin : 2.118214 sec
2c CpDTHwgt : 0.315820 sec
2d CpDTHmom : 0.344167 sec
3a SGoodHel : 0.005096 sec
3b SigmaKin : 2.438134 sec
3c CpDTHmes : 0.315539 sec
4a DumpLoop : 0.189560 sec
9a DumpAll  : 0.016396 sec
9b GenDestr : 0.000004 sec
9c MemFree  : 0.002393 sec
9d CudReset : 0.000012 sec
TOTAL       : 10.738873 sec
TOTAL(123)  : 10.500641 sec
TOTAL(23)   : 7.624243 sec
TOTAL(3)    : 2.758769 sec
***************************************
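For reading the timer table above, here is a minimal Python sketch. It assumes (inferred from the numbers, not confirmed by the code) that each TOTAL(n...) line is the sum of the steps whose label starts with one of the listed digits. The dictionary keys and the helper `total_for` are illustrative names only; the values are copied from the log.

```python
# Step timers copied from the Alpaka/OpenMP log above (seconds).
timers = {
    "1a GenSeed": 1.585944,
    "1b GenRnGen": 0.974015,
    "1c CpHTDrnd": 0.316440,
    "2a RamboIni": 2.087273,
    "2b RamboFin": 2.118214,
    "2c CpDTHwgt": 0.315820,
    "2d CpDTHmom": 0.344167,
    "3a SGoodHel": 0.005096,
    "3b SigmaKin": 2.438134,
    "3c CpDTHmes": 0.315539,
}


def total_for(groups):
    """Sum all timers whose label starts with one of the given digits.

    Assumption: this is how TOTAL(123), TOTAL(23) and TOTAL(3) are built.
    """
    return sum(t for label, t in timers.items() if label[0] in groups)


total_3 = total_for("3")
total_23 = total_for("23")
total_123 = total_for("123")

# Fraction of the 1+2+3 time spent in the matrix element kernel itself
# and in the two Rambo phase-space steps.
sigma_share = timers["3b SigmaKin"] / total_123
rambo_share = (timers["2a RamboIni"] + timers["2b RamboFin"]) / total_123

print(f"TOTAL(3)   = {total_3:.6f} sec")
print(f"TOTAL(23)  = {total_23:.6f} sec")
print(f"TOTAL(123) = {total_123:.6f} sec")
print(f"SigmaKin share of TOTAL(123): {sigma_share:.1%}")
print(f"Rambo (2a+2b) share of TOTAL(123): {rambo_share:.1%}")
```

Under that assumption the sums reproduce the logged totals to within rounding, and SigmaKin accounts for roughly 23% of TOTAL(123), with the two Rambo steps taking roughly 40%.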

standalone_cpp:
bash-4.2$ ./check.exe 3200000
***********************************
NumberOfEntries       = 3200000
TotalTimeInWaveFuncs  = 8.518257e+00 sec
MeanTimeInWaveFuncs   = 2.661955e-06 sec
StdDevTimeInWaveFuncs = 6.033766e-07 sec
MinTimeInWaveFuncs    = 2.579000e-06 sec
MaxTimeInWaveFuncs    = 9.373900e-04 sec
-----------------------------------
NumMatrixElements     = 3200000
MatrixElementsPerSec  = 3.756637e+05 sec^-1
***********************************
NumMatrixElements     = 3200000
MeanMatrixElemValue   = 1.371640e-02 GeV^0
StdErrMatrixElemValue = 4.582625e-06 GeV^0
StdDevMatrixElemValue = 8.197649e-03 GeV^0
MinMatrixElemValue    = 6.071582e-03 GeV^0
MaxMatrixElemValue    = 3.374922e-02 GeV^0
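
The two claims above (consistent mean value, low performance) can be checked directly from the logged numbers. The following is a rough Python sketch, not part of the repository. It assumes the two runs are statistically independent samples and that the standalone_cpp run is single-threaded (the log does not say); the 32 comes from the hardware thread count quoted above. All variable and function names are illustrative.

```python
import math

N = 3_200_000  # matrix elements computed in each run

# Values copied from the two logs above.
alpaka = {"mean": 1.371250e-02, "stddev": 8.196271e-03, "time": 2.753682e+00}
cpp = {"mean": 1.371640e-02, "stddev": 8.197649e-03, "time": 8.518257e+00}


def std_err(run):
    """Standard error of the mean: stddev / sqrt(N)."""
    return run["stddev"] / math.sqrt(N)


def throughput(run):
    """Matrix elements per second: N / TotalTimeInWaveFuncs."""
    return N / run["time"]


# Consistency: difference of the means in units of the combined standard error
# (assumes independent samples, since the random numbers differ).
diff = abs(alpaka["mean"] - cpp["mean"])
combined_err = math.hypot(std_err(alpaka), std_err(cpp))
z = diff / combined_err

# Performance: speedup of the 32-thread OpenMP run over standalone_cpp,
# and the resulting per-thread efficiency (assumes standalone_cpp uses 1 thread).
speedup = throughput(alpaka) / throughput(cpp)
efficiency = speedup / 32

print(f"difference of means: {z:.2f} combined standard errors")
print(f"speedup: {speedup:.2f}x, per-thread efficiency: {efficiency:.1%}")
```

With these inputs the means differ by less than one combined standard error, while 32 hardware threads give only about a 3x gain in matrix elements per second over standalone_cpp, which is one way to quantify the "low" performance.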
smithdh commented 3 years ago

(So I will keep this ticket open for now, and make further PRs to address the proposals in the comment above.)

valassi commented 1 year ago

As suggested by @roiser this can now be closed - we have an initial Alpaka implementation, then we'll see later how to make this evolve. Thanks @smithdh !