`gpu-elegant` results not matching `elegant`

jeinstei commented 2 years ago

After some experimentation, we're finding that gpu-elegant produces incorrect output for certain configurations. In particular, the particles don't seem to make it through the initial beamline elements, though the details of what is going wrong are subtle. We're not sure yet where it is failing, but @cchall has some examples (see attached). TL;DR: gpu-elegant is losing particles.

@cchall built some test files using a csbend component that highlight some of this.

I've rebuilt the newest version of elegant on gpu-jupyter (2021.4) and it shows the same behavior; the default install is 2021.1. I'm currently awaiting account activation for the elegant users' forum to post about the bug.

There's also a local build issue with elegant under CUDA 11 with GCC 11. It is known from other projects and is likely due to a C++ standard version mismatch (C++17 vs. C++14). I'm debugging it so that I can build gpu-elegant locally on another system and test whether the problem lies in the version in our container or in the code itself. I still need to try building with C++17, which requires changing some flags.

The (draft) build process generally follows the procedure in this Google doc, which is currently available only to RadiaSoft personnel: elegant build notes

All commands were run as `$ <binary> tracking.ele > <logfile>.log 2>&1`

2021.1 non-GPU elegant (ele.log):

tracking 4000 particles
3 May 22 16:42:38: This step establishes energy profile vs s (fiducial beam).
3 May 22 16:42:38: Rf phases/references reset.
4000 particles present after pass 0
...
Adding OCT_K after (null)
Adding OCT_K after (null)
Adding OCT_K after (null)
Adding OCT_K after (null)
...
4000 particles present after pass 4        
4000 particles transmitted, total effort of 16000 particle-turns
33776880 multipole kicks done

2021.1 gpu-elegant (gpu-standard-ele.log):

tracking 4000 particles
3 May 22 16:41:48: This step establishes energy profile vs s (fiducial beam).
3 May 22 16:41:48: Rf phases/references reset.
4000 particles present after pass 0 
...
0 particles present after pass 4        
0 particles transmitted, total effort of 0 particle-turns
18720 multipole kicks done

2021.4 gpu-elegant:

tracking 4000 particles
3 May 22 16:40:46: This step establishes energy profile vs s (fiducial beam).
3 May 22 16:40:46: Rf phases/references reset.
4000 particles present after pass 0
...
0 particles present after pass 4        
Post-tracking output completed.
Tracking step completed   ET:     00:00:01 CP:    1.43 BIO:0 DIO:0 PF:0 MEM:4838467
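
A quick way to compare runs like these is to pull the per-pass particle counts out of each log and diff them. The following is a minimal sketch, assuming the `N particles present after pass M` line format seen in the excerpts above. The function name `particles_per_pass` and the inline sample strings are illustrative only and are not part of elegant.

```python
import re

# Matches lines such as "4000 particles present after pass 0".
PASS_LINE = re.compile(r"^\s*(\d+) particles present after pass (\d+)", re.MULTILINE)


def particles_per_pass(log_text):
    """Return {pass_number: particle_count} parsed from elegant log text."""
    return {int(p): int(n) for n, p in PASS_LINE.findall(log_text)}


# Trimmed excerpts from the logs above (stand-ins for the full files).
cpu_log = """4000 particles present after pass 0
4000 particles present after pass 4
"""
gpu_log = """4000 particles present after pass 0
0 particles present after pass 4
"""

cpu = particles_per_pass(cpu_log)
gpu = particles_per_pass(gpu_log)

# Passes where the two runs disagree on the surviving particle count.
mismatches = {
    p: (cpu[p], gpu[p])
    for p in sorted(cpu.keys() & gpu.keys())
    if cpu[p] != gpu[p]
}
print(mismatches)
```

In practice the two strings would come from reading `ele.log` and `gpu-standard-ele.log` rather than from inline literals.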

And an additional fun compilation message:

/home/vagrant/jupyter/oag/apps/src/epics/extensions/include/mdb.h:599: warning: "PI" redefined
  599 | #define PI 3.141592653589793
      | 
In file included from gpu_lsc.cu:1:
/home/vagrant/jupyter/oag/apps/src/epics/extensions/include/constants.h:35: note: this is the location of the previous definition
   35 | #define PI   3.141592653589793238462643
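
As far as the values go, this redefinition looks harmless: both literals round to the same double-precision number. Here is a small check, assuming both macros end up being used as IEEE 754 doubles (CPython floats use the same format on common platforms). It says nothing about whether the duplicate definition causes other problems in the build.

```python
import math

# The two PI literals from mdb.h and constants.h, copied from the warning above.
pi_mdb = float("3.141592653589793")
pi_constants = float("3.141592653589793238462643")

# Compare the two parsed values with each other and with math.pi.
print(pi_mdb == pi_constants)
print(pi_mdb == math.pi)
```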

jeinstei commented 2 years ago

After I posted on the elegant forums, it sounds like running gpu-elegant in a production environment is relatively high risk, and it probably shouldn't be in production images because of issues with CUDA, mismatched simulation results, and other problems.

Issue: https://www3.aps.anl.gov/forums/elegant/viewtopic.php?p=5180

The reply I received stated that gpu-elegant has issues with CUDA 11, and that using it requires first completing a successful non-GPU simulation. I'm going to check around with the elegant users and see if anyone has an urgent need for it.

jeinstei commented 2 years ago

No response from the forums yet, but I was just thinking about possible automated offloading using newer compilers, similar to how NERSC does things: https://docs-dev.nersc.gov/cgpu/software/compilers/

It should be readily possible to write the standard codebase in a way that supports such optimizations, instead of needing a separate build toolchain with custom CUDA kernels.

radiasoft / container-jupyter-nvidia

`gpu-elegant` results not matching `elegant` #13