Open SteveBronder opened 10 months ago
Awesome stuff!! Let me know if I can help in any way.
Ty! At this point it's mostly just brainstorming nice patterns for all this. If you can look at https://github.com/stan-dev/stan/pull/3219, that is a PR we are waiting on before we can merge this.
This is great! Looking forward to this!
Merging #1353 (51b3ba9) into master (743d0dd) will decrease coverage by 0.44%. Report is 2 commits behind head on master. The diff coverage is 87.50%.
@@ Coverage Diff @@
## master #1353 +/- ##
==========================================
- Coverage 89.39% 88.95% -0.44%
==========================================
Files 65 65
Lines 10607 10814 +207
==========================================
+ Hits 9482 9620 +138
- Misses 1125 1194 +69
Files Changed | Coverage Δ | |
---|---|---|
src/frontend/Pretty_printing.ml | 91.08% <0.00%> (ø) | |
src/middle/Mem_pattern.ml | 40.00% <16.66%> (-26.67%) | :arrow_down: |
src/middle/Stmt.ml | 79.47% <66.66%> (ø) | |
src/analysis_and_optimization/Mir_utils.ml | 77.38% <75.00%> (ø) | |
src/frontend/Ast_to_Mir.ml | 94.19% <75.00%> (+0.01%) | :arrow_up: |
src/middle/SizedType.ml | 79.77% <77.27%> (-4.75%) | :arrow_down: |
src/analysis_and_optimization/Memory_patterns.ml | 84.17% <77.88%> (-6.41%) | :arrow_down: |
src/middle/Index.ml | 82.35% <80.00%> (-0.41%) | :arrow_down: |
src/stan_math_backend/Transform_Mir.ml | 95.16% <87.50%> (-0.58%) | :arrow_down: |
src/stan_math_backend/Cpp.ml | 85.78% <91.66%> (-0.19%) | :arrow_down: |
... and 14 more | | |
Submission Checklist
Summary
Creates a user-exposed flag `-fopencl` that will attempt to promote `log_prob` for reverse mode so that we can run the entire thing on a GPU via OpenCL.

This is about halfway there, with a few things to figure out:
- The lower-level C++ MIR needs to be able to promote vectors and scalars so that they get moved over to the GPU. I think this means we need to add a `Mem_pattern.t` tag to arrays and scalars in the MIR. @WardBrian can you think of another way to do this that wouldn't require that? It would just be annoying, since we would have to touch a lot of code.
- The optimization pass is all or none: either we are able to move the entire `log_prob` over to the GPU, or we go back to running only on the CPU. With this scheme, I think I also need to add a logger so that when a parameter fails we can give the user a reason why their model could not be moved over to the GPU. (UPDATE: done, but it's just a writer to stderr.)
- The current scheme is:
  - (a) Do the exact same pass as the SoA optimization using the monotone framework.
  - (b) At the end, the compiler checks whether all the parameters declared in the parameters, transformed parameters, and model blocks are able to go on the GPU. If so, we are good and continue; otherwise we stop here and throw an error.
  - (c) If (b) passes, do another pass over those blocks, collecting the names of all of the data used in the model.
  - (d) Take the data names from (c) and, in the data section, add declarations and assignments that move that data over to the GPU using `to_matrix_cl`.

I ripped out all the previous OpenCL code, and my goal right now is just to get all of those tests compiling and working correctly.
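The all-or-none check in step (b), the stderr logger, and the data promotion in step (d) can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `mem_pattern` type is a simplified stand-in for `Mem_pattern.t`, and the `param` record, the function names, and the emitted `_cl` naming are all hypothetical. Only `to_matrix_cl` comes from the description above.

```ocaml
(* Simplified stand-in for Mem_pattern.t; the real type may differ. *)
type mem_pattern = AoS | SoA | OpenCL

(* Hypothetical record: a parameter, the pattern the analysis pass settled
   on, and an optional reason recorded when promotion failed. *)
type param = { name : string; pattern : mem_pattern; reason : string option }

(* Step (b): all-or-none check. Returns Ok () when every parameter can go
   on the GPU, otherwise the (name, reason) pairs that blocked promotion. *)
let check_all_opencl (params : param list) :
    (unit, (string * string) list) result =
  let failures =
    List.filter_map
      (fun p ->
        match p.pattern with
        | OpenCL -> None
        | AoS | SoA ->
            Some (p.name, Option.value p.reason ~default:"no reason recorded"))
      params
  in
  match failures with [] -> Ok () | fs -> Error fs

(* Step (d): for each data name collected in step (c), produce the text of
   an assignment that copies it to the GPU. The "_cl" suffix and the use of
   "auto" are illustrative assumptions only. *)
let data_to_gpu_decls (data_names : string list) : string list =
  List.map
    (fun d -> Printf.sprintf "auto %s_cl = to_matrix_cl(%s);" d d)
    data_names

(* Glue: run the check, report failures to stderr, and only promote the
   data when every parameter passed. *)
let promote (params : param list) (data_names : string list) :
    (string list, (string * string) list) result =
  match check_all_opencl params with
  | Ok () -> Ok (data_to_gpu_decls data_names)
  | Error fs ->
      List.iter
        (fun (n, why) ->
          Printf.eprintf "cannot move %s to the GPU: %s\n" n why)
        fs;
      Error fs
```

Calling `promote` with a parameter list in which every entry is tagged `OpenCL` yields the generated assignments. A single `AoS` or `SoA` entry produces an `Error` carrying the offending names, which the caller could use either to raise an error or to fall back to the CPU-only path.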
I think it would be a good idea for now to leave the `target` on the CPU: writing the scalar from the GPU to the CPU is a stopping point that tells the async OpenCL code it needs to finish before passing that scalar back.

`Mem_pattern.t` now has an `OpenCL` type that indicates a statement or expression can be used on the GPU. For functions, every function in the math library that supports OpenCL also supports the new matrix type, but not all functions that support the new matrix type support OpenCL. So for the table of available function signatures, if something is tagged `OpenCL` we assume it supports both `SoA` and `OpenCL`. For now I just tagged everything, but before we merge and start testing I need to go through the math library and see which functions actually support OpenCL.

Release notes
Allow `-fopencl`, which performs a pass on `log_prob` to attempt to promote the model to run on the GPU via OpenCL.

Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)