This repo contains a new compiler for Stan, stanc3, written in OCaml. Since version 2.26, this has been the default compiler for Stan. See this wiki for a list of minor differences between this compiler and the previous Stan compiler.
To read more about why we built this, see this introductory blog post. For some discussion as to how we chose OCaml, see this accidental flamewar. We're testing these models (listed under Test Results) on every pull request.
Documentation for users of stanc3 is in the Stan Users' Guide.
The Stanc3 Developer documentation is available here: https://mc-stan.org/stanc3/stanc
Want to contribute? See Getting Started for setup instructions and some useful commands.
Stanc3 has 4 main src packages: frontend, middle, analysis_and_optimization, and stan_math_backend.
flowchart
Stanc --> Frontend & Analysis & Backend <-.-> Middle
The goal is to confine as many details as possible about how Stan is implemented by the core C++ implementation to the Stan Math backend library.
The Middle library contains the MIR and currently any types or functions used by the two ends.
The entrypoint for the compiler is in src/stanc/stanc.ml, which sequences the various components together.
The phases of stanc are summarized in the following information flowchart and list.
flowchart TB
subgraph frontend[Frontend]
direction TB
infile>Source file]
lexer(frontend/lexer.mll)
parser(frontend/parser.mly)
typecheck(frontend/Typechecker.ml)
lower(frontend/Ast_to_Mir.ml)
infile --> lexer -->|Tokens| parser
parser -->|Untyped AST| typecheck -->|Typed AST| lower
end
subgraph middle[Middle Representation]
data{{MIR Data Structures}}
end
subgraph analysis[Static Analysis and Optimization]
optimize(analysis_and_optimization/Optimize.ml)
end
subgraph backend[Backend]
codegen(*_backend/*_code_gen.ml)
transform(*_backend/Transform_Mir.ml)
transform -.->|MIR with backend specific code| optimize
transform --> codegen
optimize -->|Optimized MIR| codegen
end
outfile>Output File, e.g. a .hpp]
middle --- analysis
frontend ==> middle =====> backend ==> outfile
click lexer "https://github.com/stan-dev/stanc3/blob/master/src/frontend/lexer.mll"
click parser "https://github.com/stan-dev/stanc3/blob/master/src/frontend/parser.mly"
click typecheck "https://github.com/stan-dev/stanc3/blob/master/src/frontend/Typechecker.ml"
click lower "https://github.com/stan-dev/stanc3/blob/master/src/frontend/Ast_to_Mir.ml"
click optimize "https://github.com/stan-dev/stanc3/blob/master/src/analysis_and_optimization/Optimize.ml"
click data "https://github.com/stan-dev/stanc3/tree/master/src/middle"
click codegen "https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Stan_math_code_gen.ml"
click transform "https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Transform_Mir.ml"
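Several stanc flags print the intermediate result of a phase:
stanc --debug-ast prints the untyped AST produced by the parser.
stanc --debug-decorated-ast prints the typed AST produced by the typechecker.
stanc --debug-mir (or --debug-mir-pretty) prints the MIR produced by lowering.
stanc --debug-transformed-mir prints the MIR after the backend-specific transform.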
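The phase ordering in the flowchart can be read as a chain of functions, each consuming the previous phase's output. The following is only a toy sketch of that sequencing: every type and stage function in it is a hypothetical stand-in, and it is not the code in src/stanc/stanc.ml.

```ocaml
(* Toy pipeline. All types and stages are hypothetical stand-ins,
   chosen only to show how the phases feed into each other. *)
type tokens = string list
type untyped_ast = Untyped of tokens
type typed_ast = Typed of tokens
type mir = Mir of tokens

(* Frontend stages may fail, so they return a result. *)
let lex (src : string) : (tokens, string) result =
  match
    String.split_on_char ' ' (String.trim src)
    |> List.filter (fun t -> t <> "")
  with
  | [] -> Error "empty program"
  | toks -> Ok toks

let parse (toks : tokens) : (untyped_ast, string) result = Ok (Untyped toks)

let typecheck (Untyped toks) : (typed_ast, string) result =
  if List.mem "??" toks then Error "ill-typed token" else Ok (Typed toks)

(* Later stages are modelled as total functions on the MIR. *)
let lower (Typed toks) : mir = Mir toks
let transform (m : mir) : mir = m
let optimize (m : mir) : mir = m

let codegen (Mir toks) : string =
  "// generated from " ^ string_of_int (List.length toks) ^ " tokens"

(* Sequence the phases in the order shown in the flowchart. *)
let compile (src : string) : (string, string) result =
  let ( >>= ) = Result.bind in
  lex src >>= parse >>= typecheck
  |> Result.map (fun typed ->
         typed |> lower |> transform |> optimize |> codegen)
```

In this sketch an error in any frontend stage short-circuits the rest of the chain, and the MIR-to-MIR stages can be reordered or skipped without changing any types.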
src/frontend/Ast.ml defines the AST. The AST is intended to map 1-1 onto the syntax, so syntactic details such as parentheses are kept around.
The pretty-printer in the frontend uses the AST and attempts to keep user syntax the same while just adjusting whitespace.
The AST (like the other tree types) uses a functional programming trick, sometimes called the "two-level types" pattern, to attach metadata. Essentially, many of the tree variant types are parameterized by a type variable that stands in not just for metadata but for the recursive type including metadata, sometimes called the fixed point. So instead of recursively referencing expression, a constructor references the type parameter 'e, which is later filled in with something like type expr_with_meta = metadata expression.
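A minimal sketch of the two-level types pattern may help make this concrete. The type and field names below are hypothetical and much smaller than the real definitions in src/frontend/Ast.ml; they only illustrate how one tree shape can be reused with different metadata.

```ocaml
(* Hypothetical, simplified types for illustration only. *)

(* Level one: the tree shape. 'e stands in for
   "a whole expression, including its metadata". *)
type 'e expression =
  | IntLit of int
  | Var of string
  | Add of 'e * 'e
  | Paren of 'e (* parentheses are kept, mirroring the syntax *)

(* Level two: tie the recursive knot and attach metadata of type 'm. *)
type 'm with_meta = { expr : 'm with_meta expression; meta : 'm }

(* Two kinds of metadata for the same tree shape. *)
type located = { line : int }
type typed = { loc : located; ty : string }

type untyped_expr = located with_meta
type typed_expr = typed with_meta

(* Rewrite the metadata at every node, leaving the shape unchanged. *)
let rec map_meta (f : 'a -> 'b) (e : 'a with_meta) : 'b with_meta =
  let expr =
    match e.expr with
    | IntLit n -> IntLit n
    | Var v -> Var v
    | Add (l, r) -> Add (map_meta f l, map_meta f r)
    | Paren inner -> Paren (map_meta f inner)
  in
  { expr; meta = f e.meta }

(* The expression "(1 + x)" with location metadata only. *)
let example : untyped_expr =
  let at_line_1 expr = { expr; meta = { line = 1 } } in
  at_line_1 (Paren (at_line_1 (Add (at_line_1 (IntLit 1), at_line_1 (Var "x")))))

(* A toy "typechecker" that labels every node as int. *)
let typed_example : typed_expr =
  map_meta (fun loc -> { loc; ty = "int" }) example
```

Because the recursion goes through the type parameter, a pass that only changes metadata never has to redefine the constructors of the tree.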
The AST intends to keep very close to Stan-level semantics and syntax in every way.
src/middle/Program.ml contains the MIR (Middle Intermediate Representation). src/frontend/Ast_to_Mir.ml performs the lowering and attempts to strip out as much Stan-specific semantics and syntax as possible, though this is still something of a work in progress.
The MIR uses the same two-level types idea to add metadata, notably expression types and autodiff levels as well as locations on many things. The MIR is used as the output data type from the frontend and the input for dataflow analysis, optimization (which also outputs MIR), and code generation.
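As a rough illustration of what lowering can look like, the sketch below drops purely syntactic nodes (parentheses) and rewrites operators as function applications, while carrying the type, autodiff level, and location metadata across unchanged. All type, constructor, and field names here are assumptions made for the example; they are not the definitions in Ast.ml or the Middle library.

```ocaml
(* Hypothetical, simplified AST and MIR types for illustration only. *)
type 'e ast_expr =
  | AInt of int
  | AVar of string
  | ABinOp of 'e * string * 'e
  | AParen of 'e (* syntax-only node, present in the AST *)

(* Metadata shared by both trees: type, autodiff level, location. *)
type typed_meta = { ty : string; ad_level : string; line : int }

type typed_ast = { aexpr : typed_ast ast_expr; ameta : typed_meta }

type 'e mir_expr =
  | Lit of int
  | MVar of string
  | FunApp of string * 'e list (* operators become function applications *)

type mir = { pattern : mir mir_expr; meta : typed_meta }

(* Lower a typed AST expression to the MIR sketch. *)
let rec lower (e : typed_ast) : mir =
  match e.aexpr with
  | AParen inner -> lower inner (* parentheses carry no meaning here *)
  | AInt n -> { pattern = Lit n; meta = e.ameta }
  | AVar v -> { pattern = MVar v; meta = e.ameta }
  | ABinOp (l, op, r) ->
      { pattern = FunApp (op, [ lower l; lower r ]); meta = e.ameta }

(* Helper building metadata for a data-only int at a given line. *)
let int_at line = { ty = "int"; ad_level = "data"; line }

(* The expression "(1 + x)". *)
let sample : typed_ast =
  let node aexpr = { aexpr; ameta = int_at 1 } in
  node (AParen (node (ABinOp (node (AInt 1), "+", node (AVar "x")))))

let lowered : mir = lower sample
```

After lowering, the parenthesis node is gone and the addition appears as a FunApp, which gives later analyses fewer cases to handle.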
src/stan_math_backend/Cpp.ml defines a minimal representation of C++ used in code generation.
This is intentionally simpler than both of the structures above and than a true C++ AST, and it is tailored specifically to the C++ generated in our model class.
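To give a feel for how small such a representation can be, here is a hypothetical miniature with a string printer. It is not the set of types defined in Cpp.ml; the constructors, field names, and the generated function are all invented for the example.

```ocaml
(* Hypothetical miniature of a "just enough C++" representation. *)
type expr =
  | Literal of string
  | Var of string
  | Call of string * expr list

type stmt =
  | Return of expr
  | Expression of expr

type fun_defn = {
  return_type : string;
  name : string;
  args : (string * string) list; (* (type, name) pairs *)
  body : stmt list;
}

(* Print an expression as C++ source text. *)
let rec pp_expr = function
  | Literal s | Var s -> s
  | Call (f, args) ->
      f ^ "(" ^ String.concat ", " (List.map pp_expr args) ^ ")"

let pp_stmt = function
  | Return e -> "return " ^ pp_expr e ^ ";"
  | Expression e -> pp_expr e ^ ";"

(* Print a whole function definition with a two-space indented body. *)
let pp_fun (f : fun_defn) : string =
  let args =
    String.concat ", " (List.map (fun (ty, n) -> ty ^ " " ^ n) f.args)
  in
  let body =
    String.concat "\n" (List.map (fun s -> "  " ^ pp_stmt s) f.body)
  in
  Printf.sprintf "%s %s(%s) {\n%s\n}" f.return_type f.name args body

(* An invented function, used only to exercise the printer. *)
let add_one : fun_defn =
  {
    return_type = "double";
    name = "add_one";
    args = [ ("double", "x") ];
    body = [ Return (Call ("add", [ Var "x"; Literal "1.0" ])) ];
  }

let rendered : string = pp_fun add_one
```

Keeping the representation this narrow means the printer stays trivial, at the cost of being unable to express C++ constructs the generator never emits.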