Code generation streamlining

ulysseB / telamon

A framework to find good combinations of optimizations for computational kernels on GPUs.

Apache License 2.0


Code generation currently works in two steps: first, we generate a loop tree from the CSP solution; then, we walk over this loop tree, printing code on the fly. This keeps the design conceptually simple, but it has two notable issues: it leads to a high level of code duplication between backends (because the printing structure is not always exactly identical), and it makes it painful to materialise additional code such as that required for convolutions or predicated loads.

To solve these issues, we propose to introduce a low-level IR that sits just above the backends' printed code, adding an extra step to the code generation process. The loop tree should first be converted to this new IR, which contains lowered instructions and implementation details (for instance, the IR should be explicit about loop indices and induction variables; in fact, the resulting code should be mostly independent of the high-level Telamon concepts). The IR can then be converted to actual backend code.
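The proposed three-stage pipeline (loop tree, then low-level IR, then backend code) can be sketched roughly as follows. Everything here is hypothetical: `LoopTree`, `LowInst`, `lower` and `print_c99` are illustrative names that do not exist in Telamon, and the real IR would carry far more information. The sketch only shows how making induction variables explicit in an intermediate form lets the final printing step become a flat, backend-specific walk.

```rust
/// Hypothetical, heavily simplified loop tree (not Telamon's actual type).
pub enum LoopTree {
    Loop { size: u32, body: Vec<LoopTree> },
    Inst(String),
}

/// Hypothetical low-level instruction: loops carry an explicit induction variable.
#[derive(Debug, PartialEq)]
pub enum LowInst {
    LoopBegin { index: String, size: u32 },
    LoopEnd,
    Op(String),
}

/// Step 1: lower the loop tree to a flat list, naming each induction variable.
pub fn lower(tree: &[LoopTree], next_index: &mut usize, out: &mut Vec<LowInst>) {
    for node in tree {
        match node {
            LoopTree::Loop { size, body } => {
                let index = format!("i{}", *next_index);
                *next_index += 1;
                out.push(LowInst::LoopBegin { index, size: *size });
                lower(body, next_index, out);
                out.push(LowInst::LoopEnd);
            }
            LoopTree::Inst(text) => out.push(LowInst::Op(text.clone())),
        }
    }
}

/// Step 2: a backend only needs to know how to print each low-level instruction.
/// Assumes the list is balanced, as produced by `lower`.
pub fn print_c99(insts: &[LowInst]) -> String {
    let mut code = String::new();
    let mut depth = 0usize;
    for inst in insts {
        match inst {
            LowInst::LoopBegin { index, size } => {
                code.push_str(&"  ".repeat(depth));
                code.push_str(&format!(
                    "for (int {0} = 0; {0} < {1}; ++{0}) {{\n",
                    index, size
                ));
                depth += 1;
            }
            LowInst::LoopEnd => {
                depth -= 1;
                code.push_str(&"  ".repeat(depth));
                code.push_str("}\n");
            }
            LowInst::Op(text) => {
                code.push_str(&"  ".repeat(depth));
                code.push_str(text);
                code.push_str(";\n");
            }
        }
    }
    code
}
```

A second backend would supply its own printing function over the same `LowInst` list, which is where the reduction in duplicated printing structure would come from.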

This patch is a first step in that direction. It introduces a (very) bare-bones version of the concept in the `codegen::llir` module (low-level IR, in contrast to the existing higher-level `ir` module), which is meant to provide a more traditional IR defined for fixed candidates. Currently, the `llir` module defines abstract types to represent registers (named variables which can be written to) and instruction operands; registers and operands are passed as arguments to the `InstPrinter` methods instead of raw strings. Finally, it provides a `ScalarOrVector` type to represent vectors of either registers or operands.
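As a rough illustration of the kind of types described above, here is a minimal sketch. The field names, variants and helper methods are assumptions made for the example (in particular, the `Scalar`/`Vector` split is only inferred from the type's name); the actual definitions live in `codegen::llir` and may differ.

```rust
/// Hypothetical stand-in for `llir::Register`: a named variable that can be written to.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Register<'a> {
    name: &'a str,
}

impl<'a> Register<'a> {
    pub fn new(name: &'a str) -> Self {
        Register { name }
    }

    pub fn name(&self) -> &'a str {
        self.name
    }
}

/// Hypothetical instruction operand: a register or a literal (assumed variants).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operand<'a> {
    Register(Register<'a>),
    IntLiteral(i64),
    FloatLiteral(f64),
}

/// Hypothetical `ScalarOrVector`: one value, or several lanes of the same kind.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarOrVector<T> {
    Scalar(T),
    Vector(Vec<T>),
}

impl<T> ScalarOrVector<T> {
    /// Number of lanes: 1 for a scalar, the vector length otherwise.
    pub fn lanes(&self) -> usize {
        match self {
            ScalarOrVector::Scalar(_) => 1,
            ScalarOrVector::Vector(items) => items.len(),
        }
    }

    /// Apply `f` to every lane, preserving the scalar/vector shape.
    pub fn map<U>(self, mut f: impl FnMut(T) -> U) -> ScalarOrVector<U> {
        match self {
            ScalarOrVector::Scalar(x) => ScalarOrVector::Scalar(f(x)),
            ScalarOrVector::Vector(items) => {
                ScalarOrVector::Vector(items.into_iter().map(f).collect())
            }
        }
    }
}
```

With types like these, a printer method can take `ScalarOrVector<Operand>` instead of a pre-formatted string, so the decision of how to render a register or literal stays in the backend.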

The introduction of these types allows decoupling some parts of the code, notably:

* The `NameMap` no longer needs to know about the target's literal formatting in order to generate strings; instead, it can generate registers or operands as appropriate. This makes it possible to completely remove the `get_const_float` and `get_const_int` methods from the `ValuePrinter` trait; that responsibility now falls to the `InstPrinter` when it sees the corresponding operands.
* In a similar vein, a trip through the `InstPrinter` is no longer needed for vectorization: the `NameMap` can directly return a vector of registers (or operands), which is then given to the `InstPrinter` and printed through the regular printing path.
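The division of labour described in these two points can be sketched as follows. This is an illustration under assumptions, not the patch's code: the `Operand` variants, the `operand`/`operands` method names, the `%` register prefix and the brace-delimited vector syntax are all chosen for the example. The PTX single-precision literal form (`0f` followed by the eight hex digits of the bit pattern) is standard PTX syntax.

```rust
/// Hypothetical operand type; the real `llir` definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Operand {
    Register(String),
    FloatLiteral(f32),
    IntLiteral(i32),
}

/// Hypothetical printer interface: each backend decides how literals look,
/// so the name map never has to format a constant itself.
pub trait InstPrinter {
    fn operand(&self, op: &Operand) -> String;

    /// Vectors reuse the per-operand path; no backend-specific vector hook needed.
    fn operands(&self, ops: &[Operand]) -> String {
        let parts: Vec<String> = ops.iter().map(|op| self.operand(op)).collect();
        if parts.len() == 1 {
            parts[0].clone()
        } else {
            format!("{{{}}}", parts.join(", "))
        }
    }
}

pub struct PtxPrinter;
pub struct C99Printer;

impl InstPrinter for PtxPrinter {
    fn operand(&self, op: &Operand) -> String {
        match op {
            // The `%` prefix is an illustrative convention, not a requirement.
            Operand::Register(name) => format!("%{}", name),
            // PTX f32 literal: `0f` + hexadecimal bit pattern.
            Operand::FloatLiteral(x) => format!("0f{:08X}", x.to_bits()),
            Operand::IntLiteral(x) => x.to_string(),
        }
    }
}

impl InstPrinter for C99Printer {
    fn operand(&self, op: &Operand) -> String {
        match op {
            Operand::Register(name) => name.clone(),
            // Finite values only; NaN and infinities would need special handling.
            Operand::FloatLiteral(x) => format!("{:?}f", x),
            Operand::IntLiteral(x) => x.to_string(),
        }
    }
}
```

The same `Operand::FloatLiteral(1.0)` value yields `0f3F800000` from the PTX printer and `1.0f` from the C99 printer, without the code that produced the operand knowing which backend is in use.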

To ease the transition, this patch also makes some additional changes; notably:

* The `ValuePrinter` is renamed to `NameGenerator`, since its only responsibility is now to generate names for variables and parameters. The `NameGenerator` currently still goes through a trait and is implemented separately by each backend; however, the restructuring should make it easy to use a unified `NameGenerator` type in a subsequent patch. This is made possible by the indirection through the `llir::Register` type for printing, which allows backends that need it to add (type-based) prefixes or suffixes at printing time.
* The `NameMap` now uses an arena instead of raw strings. This allows returning long-lived references to register names, which greatly simplifies usage of the `NameMap` by eliminating most of the mutable lifetime conflicts.
* The printing of registers and operands goes through the newly introduced `PTXDisplay` and `C99Display` traits, which are similar in spirit to the `fmt::Display` trait but format values according to PTX (resp. C99) syntax. They are used by the `InstPrinter` implementations.
* Finally, in preparation for future patches, the helper methods in the `InstPrinter` trait have been extracted into helper structures; this is a first step towards re-unifying most of the actual code generation between the different backends, to end up with a single list of (yet-to-be-introduced) `llir::Instruction`s to be given to the backend instead.
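To make the display-trait point concrete, here is one way such traits could be shaped. The trait names match the ones introduced by the patch, but the method names (`fmt_ptx`, `ptx`, `fmt_c99`, `c99`), the wrapper structs, the `Type` enum and the prefix scheme are assumptions for this sketch. It shows how a type-based prefix can be added at printing time, which is what lets the name generator itself stay backend-agnostic.

```rust
use std::fmt;

/// Hypothetical value types, just enough to drive a type-based prefix.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    I32,
    F32,
}

/// Hypothetical register carrying its type alongside its name.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Register<'a> {
    pub name: &'a str,
    pub ty: Type,
}

/// Like `fmt::Display`, but for PTX syntax (method names are assumptions).
pub trait PTXDisplay {
    fn fmt_ptx(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;

    /// Adapter so the value can be used with `format!("{}", x.ptx())`.
    fn ptx(&self) -> DisplayPTX<'_, Self>
    where
        Self: Sized,
    {
        DisplayPTX(self)
    }
}

/// Like `fmt::Display`, but for C99 syntax (method names are assumptions).
pub trait C99Display {
    fn fmt_c99(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;

    fn c99(&self) -> DisplayC99<'_, Self>
    where
        Self: Sized,
    {
        DisplayC99(self)
    }
}

pub struct DisplayPTX<'a, T>(&'a T);
pub struct DisplayC99<'a, T>(&'a T);

impl<T: PTXDisplay> fmt::Display for DisplayPTX<'_, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_ptx(f)
    }
}

impl<T: C99Display> fmt::Display for DisplayC99<'_, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_c99(f)
    }
}

impl PTXDisplay for Register<'_> {
    fn fmt_ptx(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Illustrative prefix scheme: the backend, not the name generator, adds it.
        let prefix = match self.ty {
            Type::I32 => "r",
            Type::F32 => "f",
        };
        write!(f, "%{}_{}", prefix, self.name)
    }
}

impl C99Display for Register<'_> {
    fn fmt_c99(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.name)
    }
}
```

Using two separate traits rather than `fmt::Display` directly means one register type can be printed in either syntax, and the choice is explicit at each call site: `format!("{}", reg.ptx())` versus `format!("{}", reg.c99())`.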

Conceptually, this patch does not do much except change various type representations. The reader is therefore encouraged to first look through the `codegen::llir` module to read about the new types being introduced; the changes to `codegen::name_map` and `codegen::printer` should then be fairly straightforward (but pay attention to `vector_operand` and `vector_inst`, which have been moved out of the `InstPrinter`). Finally, the changes to the backend printers should also be mostly straightforward, merely updating to the new API; the only real change is the introduction of the `C99Display` and `PTXDisplay` traits used to display registers and operands.


Code generation streamlining #297