stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
138 stars 44 forks source link

[Refactor] Add a lower-level IR for C++ code generation #1233

Closed WardBrian closed 1 year ago

WardBrian commented 1 year ago

Closes #850, #354.

This PR introduces a relatively simple[^1] C++ representation for the code generation backend.

Rather than code generating by creating strings directly, we first lower the MIR to this representation and then print it out.

Benefits:

  1. Code generation is now a mostly-pure computation. With the exception of a few things like functor generation and tracking map-rect calls, there are no side effects in the code gen. Each function takes in some chunk of input and returns some portion of the C++ datatype.
  2. Separation of the code-generation logic and the pretty-printing. This means formatting is more consistent, and makes it impossible for some part of the code gen to miss a semicolon where other parts wouldn't. As an added plus, we don't need to pass printing callbacks around to get reasonable formatting of things like method bodies.
  3. Ability to re-use code and write expressively. For example, we can write OCaml code like
    let open Expression_syntax in
    let open Types in
    let vector = Constructor (row_vector Double, [Literal "3"]) in
    let values = [Literal "1"; Var "a"; Literal "3"] in
    (vector << values).@!("finished")

    which gets turned into the C++

    (Eigen::Matrix<double,1,-1>(3) << 1, a, 3).finished()

    I tried not to go over board on making syntax for this, it is more or less limited to doing method calls (the alternative would be (MethodCall (Parens (StreamInsertion (Constructor (row_vector Double, [Literal "3"]), [Literal "1"; Var "a"; Literal "3"]), "finished", [], []), which moves all the least-important information to the front of the code).

There is a lot of potential to add further processing of this generated C++ before it is stringified and output, but that is not part of this PR.

The code generated by this PR is intended to be semantically identical; whitespace, the ordering of some functions, and other superficial changes are common however. I recommend viewing the diffs with whitespace changes hidden. I have eyeballed all of them, but this PR is the first time some of it will actually be compiled to executable.

[^1]: "Relatively simple" in that it 1) does not try to capture all of C++, just the portion of C++ we currently generate and 2) is a basic recursive data type, no two-level-type reasoning is needed at this level

Submission Checklist

Release notes

Refactored how C++ is code-generated in the compiler.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

codecov[bot] commented 1 year ago

Codecov Report

Merging #1233 (8aac0dd) into master (2044426) will decrease coverage by 1.28%. The diff coverage is 88.10%.

:exclamation: Current head 8aac0dd differs from pull request most recent head 55eb5dd. Consider uploading reports for the commit 55eb5dd to get more accurate results

@@            Coverage Diff             @@
##           master    #1233      +/-   ##
==========================================
- Coverage   88.49%   87.20%   -1.29%     
==========================================
  Files          63       64       +1     
  Lines        9262     9687     +425     
==========================================
+ Hits         8196     8448     +252     
- Misses       1066     1239     +173     
Impacted Files Coverage Δ
src/stan_math_backend/Transform_Mir.ml 96.61% <ø> (+1.29%) :arrow_up:
src/stan_math_backend/Cpp.ml 66.48% <66.48%> (ø)
src/stan_math_backend/Lower_expr.ml 90.00% <90.00%> (ø)
src/stan_math_backend/Lower_stmt.ml 95.31% <95.31%> (ø)
src/stan_math_backend/Lower_program.ml 98.69% <98.69%> (ø)
src/stan_math_backend/Lower_functions.ml 99.49% <99.49%> (ø)
src/stan_math_backend/Cpp_Json.ml 100.00% <100.00%> (ø)
src/stan_math_backend/Locations.ml 100.00% <100.00%> (ø)
src/stanc/stanc.ml 84.16% <100.00%> (ø)
src/middle/SizedType.ml 65.67% <0.00%> (-13.61%) :arrow_down:
... and 16 more
WardBrian commented 1 year ago

@nhuurre - I talked to @SteveBronder and he doesn't have the bandwidth to review this soon. Do you feel comfortable reviewing a PR in this area, and would you mind?

WardBrian commented 1 year ago

You say that this closes https://github.com/stan-dev/stanc3/issues/354 but that issue is about MIR dump format, something this PR does not touch at all.

My bad, I thought that was about the (nearly identical) issue we currently have in the C++ code gen for printing a lot of empty cuts.

map_rect

Giving that suggestion a try now

WardBrian commented 1 year ago

@nhuurre I quite like the result of moving map_rect into a pre-processing phase. The end result is that we shouldn't have any global state left in the code gen, it should all be within the scope of one function/module

WardBrian commented 1 year ago

Thanks for the thorough review @nhuurre!

@SteveBronder - I believe merging will still be blocked so long as you have 'requested changes'. Do you mind clearing that?