[WIP] Expose OpenCL optimization pass

SteveBronder commented 10 months ago

Submission Checklist

[ ] Run unit tests
Documentation
- [ ] If a user-facing change was made, the documentation PR is here:
- [ ] OR, no user-facing changes were made

Summary

Creates a user-exposed flag, -fopencl, that attempts to promote log_prob for reverse mode so that the entire function can run on a GPU via OpenCL.

This is about halfway there, with a few things to figure out:

- The lower-level C++ MIR needs to be able to promote vectors and scalars so that they get moved over to the GPU. I think this means that we need to add a Mem_pattern.t tag to arrays and scalars in the MIR. @WardBrian can you think of another way to do this that wouldn't require that? It would be annoying since we would have to touch a lot of code.
- The optimization pass is all or none: either we are able to move the entire log_prob over to the GPU, or we go back to running only on the CPU. If we keep this scheme, then I think I also need to add a logger so that when a parameter fails we can tell the user why their model could not be moved over to the GPU. (UPDATE: done, but it's just a writer to stderr.)
- The current scheme is:
  a. Do the exact same pass as the SoA optimization, using the monotone framework.
  b. At the end, the compiler checks whether all the parameters declared in the parameters, transformed parameters, and model blocks are able to go on the GPU. If so, we are good and continue; otherwise we stop here and throw an error.
  c. If (b) passes, do another pass over those blocks, collecting the names of all of the data used in the model.
  d. Take the data names from (c) and, in the data section, add declarations and assignments that copy that data over to the GPU using to_matrix_cl, as in the code below.
```
matrix_cl<{TYPE}> {DATA_NAME}_opencl__ = to_matrix_cl({DATA_NAME});
```
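Steps (b) through (d) above can be sketched roughly as follows. This is a C++ illustration only, not the OCaml code in this PR: `ParamResult`, `DataVar`, `all_params_promotable`, `opencl_decl`, and `promote_data` are hypothetical names, and the sketch assumes the earlier pass has already decided, per parameter, whether it can go on the GPU.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of what the pass decided for one parameter.
struct ParamResult {
  std::string name;
  bool gpu_ok;         // true if the parameter can live on the GPU
  std::string reason;  // explanation used when gpu_ok is false
};

// Hypothetical record of one data variable used by the model.
struct DataVar {
  std::string name;
  std::string scalar_type;  // e.g. "double"
};

// Step (b): all-or-none check. Logs a reason for every failing parameter.
bool all_params_promotable(const std::vector<ParamResult>& params,
                           std::ostream& log) {
  bool ok = true;
  for (const auto& p : params) {
    if (!p.gpu_ok) {
      log << "cannot move '" << p.name << "' to the GPU: " << p.reason << '\n';
      ok = false;
    }
  }
  return ok;
}

// Step (d): text of the GPU copy declared for one data variable.
std::string opencl_decl(const DataVar& d) {
  assert(!d.name.empty() && !d.scalar_type.empty());
  std::ostringstream out;
  out << "matrix_cl<" << d.scalar_type << "> " << d.name
      << "_opencl__ = to_matrix_cl(" << d.name << ");";
  return out.str();
}

// Steps (b) to (d): returns false and leaves decls empty if any parameter
// fails; otherwise fills decls with one declaration per data variable.
bool promote_data(const std::vector<ParamResult>& params,
                  const std::vector<DataVar>& data, std::ostream& log,
                  std::vector<std::string>& decls) {
  decls.clear();
  if (!all_params_promotable(params, log)) return false;
  for (const auto& d : data) decls.push_back(opencl_decl(d));
  return true;
}
```

Returning `false` rather than throwing leaves it to the caller to decide whether a failed promotion is a hard error or a fallback to the CPU path; passing `std::cerr` as `log` would match the stderr writer mentioned above.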

I ripped out all the previous OpenCL code and my goal right now is just to get all of those tests compiling and working correctly.

I think it would be a good idea, for now, to leave the target on the CPU: writing the scalar from the GPU back to the CPU acts as a stopping point, so the asynchronous OpenCL code knows it needs to finish before that scalar is passed back.
Mem_pattern.t now has an OpenCL type that indicates a statement or expression can be used on the GPU. For functions, every function in the math library that supports OpenCL also supports the new matrix type, but not every function that supports the new matrix type supports OpenCL. So, in the table of available function signatures, if something is tagged OpenCL we assume it supports both SoA and OpenCL. For now I have just tagged everything, but before we merge and start testing I need to go through the math library and see which functions actually support OpenCL.
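The tagging rule for function signatures can be illustrated with a small sketch. It is written in C++ for illustration and is not the OCaml definition of Mem_pattern.t; `MemPattern`, `supports`, `SignatureTable`, and `can_use` are hypothetical names. It also assumes that every function supports the default AoS layout and that functions missing from the table are treated as AoS-only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Ordered so that a larger value implies support for every smaller one:
// a signature tagged OpenCL is assumed to support SoA as well, but a
// signature tagged SoA is not assumed to support OpenCL.
enum class MemPattern { AoS = 0, SoA = 1, OpenCL = 2 };

// True if a signature carrying `tagged` may be used where `requested` is needed.
bool supports(MemPattern tagged, MemPattern requested) {
  return static_cast<int>(tagged) >= static_cast<int>(requested);
}

// Hypothetical signature table: function name -> strongest supported pattern.
using SignatureTable = std::map<std::string, MemPattern>;

// Looks up `fn`; functions missing from the table are treated as AoS-only
// (an assumption of this sketch, chosen to be conservative).
bool can_use(const SignatureTable& table, const std::string& fn,
             MemPattern requested) {
  assert(!fn.empty());
  auto it = table.find(fn);
  MemPattern tagged = (it == table.end()) ? MemPattern::AoS : it->second;
  return supports(tagged, requested);
}
```

Encoding the patterns as an ordered enum keeps the "OpenCL implies SoA" assumption in one comparison, so correcting an over-optimistic tag later only means editing the table entry.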

Release notes

Add a -fopencl flag that performs a pass over log_prob to attempt to promote the model to run on the GPU via OpenCL.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

rok-cesnovar commented 10 months ago

Awesome stuff!! Let me know if I can help in any way.

SteveBronder commented 10 months ago

Ty! At this point it's mostly just brainstorming nice patterns for all this. If you can, take a look at https://github.com/stan-dev/stan/pull/3219; that is a PR we are waiting on before we can merge this.

andrjohns commented 10 months ago

This is great! Looking forward to this!

codecov[bot] commented 10 months ago

Codecov Report

Merging #1353 (51b3ba9) into master (743d0dd) will decrease coverage by 0.44%. Report is 2 commits behind head on master. The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #1353      +/-   ##
==========================================
- Coverage   89.39%   88.95%   -0.44%     
==========================================
  Files          65       65              
  Lines       10607    10814     +207     
==========================================
+ Hits         9482     9620     +138     
- Misses       1125     1194      +69

| Files Changed | Coverage Δ | |
|---|---|---|
| src/frontend/Pretty_printing.ml | `91.08% <0.00%> (ø)` | |
| src/middle/Mem_pattern.ml | `40.00% <16.66%> (-26.67%)` | :arrow_down: |
| src/middle/Stmt.ml | `79.47% <66.66%> (ø)` | |
| src/analysis_and_optimization/Mir_utils.ml | `77.38% <75.00%> (ø)` | |
| src/frontend/Ast_to_Mir.ml | `94.19% <75.00%> (+0.01%)` | :arrow_up: |
| src/middle/SizedType.ml | `79.77% <77.27%> (-4.75%)` | :arrow_down: |
| src/analysis_and_optimization/Memory_patterns.ml | `84.17% <77.88%> (-6.41%)` | :arrow_down: |
| src/middle/Index.ml | `82.35% <80.00%> (-0.41%)` | :arrow_down: |
| src/stan_math_backend/Transform_Mir.ml | `95.16% <87.50%> (-0.58%)` | :arrow_down: |
| src/stan_math_backend/Cpp.ml | `85.78% <91.66%> (-0.19%)` | :arrow_down: |

... and 14 more

... and 1 file with indirect coverage changes
