ulysseB / telamon

A framework to find good combinations of optimizations for computational kernels on GPUs.
https://ulysseb.github.io/telamon/telamon
Apache License 2.0

Code generation streamlining #297

Closed Elarnon closed 5 years ago

Elarnon commented 5 years ago

The code generation currently works in two steps. First, we generate a loop tree from the CSP solution. Then, we walk over this loop tree, printing code on the fly. This keeps the conceptual complexity low, but it has some issues. Notably, it leads to a high level of code duplication between backends (because the printing structure is not always exactly identical), and it makes it painful to materialise additional code such as that required for convolutions or predicated loads.

To solve those issues, we propose to introduce a low-level IR that sits on top of the backends' printed code, adding an extra step to the code generation process. The loop tree should first be converted to this new IR, which contains lowered instructions and implementation details. For instance, this IR should be explicit about loop indices and induction variables; in fact, the resulting code should be mostly independent from the high-level Telamon concepts. The IR can then be converted to actual backend code.
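As a rough illustration of the proposed extra step (not the actual Telamon design), the sketch below lowers a toy loop tree into a flat list of instructions with explicit induction variables, then prints that list in a C-like syntax. Every name here (`LoopTree`, `Lowered`, `lower`, `print_c_like`) is hypothetical and invented for this example.

```rust
/// Hypothetical high-level loop tree: a loop over a dimension, or a statement.
enum LoopTree {
    Loop { dim: usize, size: u32, body: Vec<LoopTree> },
    Stmt(String),
}

/// Hypothetical lowered instructions: induction variables are explicit.
#[derive(Debug, PartialEq)]
enum Lowered {
    InitIndex { reg: String },
    Label(String),
    Op(String),
    IncrIndex { reg: String },
    BranchIfLess { reg: String, bound: u32, target: String },
}

/// Step 1: convert the loop tree to the low-level list (backend independent).
fn lower(tree: &LoopTree, out: &mut Vec<Lowered>) {
    match tree {
        LoopTree::Stmt(text) => out.push(Lowered::Op(text.clone())),
        LoopTree::Loop { dim, size, body } => {
            let reg = format!("idx{}", dim);
            let label = format!("loop{}", dim);
            out.push(Lowered::InitIndex { reg: reg.clone() });
            out.push(Lowered::Label(label.clone()));
            for child in body {
                lower(child, out);
            }
            out.push(Lowered::IncrIndex { reg: reg.clone() });
            out.push(Lowered::BranchIfLess { reg, bound: *size, target: label });
        }
    }
}

/// Step 2: one possible backend, printing each lowered instruction as C-like text.
fn print_c_like(insts: &[Lowered]) -> Vec<String> {
    insts
        .iter()
        .map(|inst| match inst {
            Lowered::InitIndex { reg } => format!("{} = 0;", reg),
            Lowered::Label(label) => format!("{}:", label),
            Lowered::Op(text) => format!("{};", text),
            Lowered::IncrIndex { reg } => format!("{} += 1;", reg),
            Lowered::BranchIfLess { reg, bound, target } => {
                format!("if ({} < {}) goto {};", reg, bound, target)
            }
        })
        .collect()
}
```

With this split, a second backend would only need its own version of the printing function; the lowering logic is shared.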

This patch is a first step in that direction. It introduces a (very) bare-bones version of the concept in the `codegen::llir` module (low-level IR, in contrast to the existing higher-level `ir` module), which is meant to hold a more traditional IR defined for fixed candidates. Currently, the `llir` module defines abstract types to represent registers (named variables which can be written to) and instruction operands; the registers and operands are passed as arguments to the `InstPrinter` methods instead of raw strings. Finally, it provides a `ScalarOrVector` type to represent vectors of either registers or operands.
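To make the shape of these types more concrete, here is a minimal sketch of what registers, operands and a scalar-or-vector wrapper could look like. The field layouts, the variant names and the `lanes` and `map` helpers are assumptions made for illustration; the real definitions live in `codegen::llir` and may differ.

```rust
/// A named variable that can be written to (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Register<'a> {
    name: &'a str,
}

/// An instruction operand: a register or a literal (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operand<'a> {
    Register(Register<'a>),
    IntLiteral(i64),
    FloatLiteral(f64),
}

/// Either a single value or a vector of values of the same kind.
#[derive(Debug, Clone, PartialEq)]
enum ScalarOrVector<T> {
    Scalar(T),
    Vector(Vec<T>),
}

impl<T> ScalarOrVector<T> {
    /// Number of lanes: 1 for a scalar, the vector length otherwise.
    fn lanes(&self) -> usize {
        match self {
            ScalarOrVector::Scalar(_) => 1,
            ScalarOrVector::Vector(items) => items.len(),
        }
    }

    /// Apply `f` to every lane, keeping the scalar/vector shape.
    fn map<U>(self, mut f: impl FnMut(T) -> U) -> ScalarOrVector<U> {
        match self {
            ScalarOrVector::Scalar(item) => ScalarOrVector::Scalar(f(item)),
            ScalarOrVector::Vector(items) => {
                ScalarOrVector::Vector(items.into_iter().map(f).collect())
            }
        }
    }
}
```

A printer method can then take, say, a `ScalarOrVector<Register>` destination and a `ScalarOrVector<Operand>` source, so the type checker rejects a literal used as a destination, which raw strings could not do.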

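The introduction of these types allows decoupling some parts of the code, notably:

* The `NameMap` is freed from the responsibility to know about the target's
  literal formatting to generate strings; instead, it can generate registers
  or operands as appropriate.  This makes it possible to completely remove the
  `get_const_float` and `get_const_int` methods from the `ValuePrinter` trait,
  instead passing on that responsibility to the `InstPrinter` when it sees the
  corresponding operands.

* In a similar vein, a trip through the `InstPrinter` is no longer needed for
  vectorization. Instead, the `NameMap` can directly return a vector of
  registers (or operands) which is then given to the `InstPrinter`, and can be
  printed through the regular printing path.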

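To ease the transition, this patch also makes some additional changes, notably:

* The `ValuePrinter` is renamed to `NameGenerator`, since its only
  responsibility is now to generate names for variables and parameters.  The
  `NameGenerator` currently still goes through a trait and is implemented
  separately by each backend; however, the restructuring should make it easy
  to use a unified `NameGenerator` type in a subsequent patch.  This is made
  possible by the indirection through the `llir::Register` type for printing,
  which allows backends that need it to add (type-based) prefixes or suffixes
  at printing time.

* The `NameMap` now uses an arena instead of raw strings.  This allows
  returning long-lived references to register names, which greatly simplifies
  usage of the `NameMap` by eliminating most of the mutable lifetime
  conflicts.

* The printing of registers and operands goes through the newly introduced
  `PTXDisplay` and `C99Display` traits, which are similar in spirit to the
  `fmt::Display` trait but format values according to PTX (resp. C99) syntax.
  They are used by the `InstPrinter` implementations.

* Finally, as a preparation for future patches, the helper methods in the
  `InstPrinter` trait have been extracted to helper structures instead; this
  is a first step towards re-unifying most of the actual code generation
  between the different backends, to end up with a single list of
  (yet-to-be-introduced) `llir::Instruction`s to be given to the backend instead.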
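One of these changes concerns name generation: once names are wrapped in a typed register, a single backend-independent generator can produce them, and each backend can decorate them when printing. The sketch below illustrates that idea only. The `Type` enum, the `v<N>` naming scheme and the `%f_` / `%r_` prefixes are invented for this example and are not taken from the Telamon sources.

```rust
/// Scalar types a register can hold (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    I32,
    F32,
}

/// A register with a backend-independent name and a type.
#[derive(Debug, Clone, PartialEq)]
struct Register {
    name: String,
    ty: Type,
}

/// A single generator shared by all backends: it only hands out fresh names.
struct NameGenerator {
    next_id: usize,
}

impl NameGenerator {
    fn new() -> Self {
        NameGenerator { next_id: 0 }
    }

    fn gen_register(&mut self, ty: Type) -> Register {
        let name = format!("v{}", self.next_id);
        self.next_id += 1;
        Register { name, ty }
    }
}

/// A C-like backend prints the name unchanged.
fn print_c_like(reg: &Register) -> String {
    reg.name.clone()
}

/// A PTX-like backend adds a type-based prefix at printing time.
fn print_ptx_like(reg: &Register) -> String {
    let prefix = match reg.ty {
        Type::I32 => "%r_",
        Type::F32 => "%f_",
    };
    format!("{}{}", prefix, reg.name)
}
```

The generator never needs to know which backend will consume its names, which is what would allow a unified type to replace the per-backend trait implementations.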

Conceptually, this patch doesn't do much except change various type representations. As such, the reader is encouraged to first look through the `codegen::llir` module to read about the new types being introduced. The changes to `codegen::name_map` and `codegen::printer` should then be fairly straightforward (but pay attention to `vector_operand` and `vector_inst`, which have been moved out of the `InstPrinter`). Finally, the changes to the backend printers should also be mostly straightforward, merely updating to the new API; the only real change is the introduction of the `C99Display` and `PTXDisplay` traits used to display registers and operands.
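For readers unfamiliar with the pattern, the following sketch shows how a pair of backend-specific display traits can format the same operand differently. The trait names match the ones in this patch, but the method names (`fmt_c99`, `fmt_ptx`), the `AsC99` / `AsPTX` adapters and the simplified `Operand` type are assumptions made for this example. The PTX float form uses the `0f` prefix followed by the 8 hexadecimal digits of the IEEE-754 single-precision bit pattern; non-finite floats are not handled.

```rust
use std::fmt;

/// Simplified operand type for this example (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operand<'a> {
    Register(&'a str),
    IntLiteral(i32),
    FloatLiteral(f32),
}

/// Format a value using C99 syntax.
trait C99Display {
    fn fmt_c99(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;
}

/// Format a value using PTX syntax.
trait PTXDisplay {
    fn fmt_ptx(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;
}

impl C99Display for Operand<'_> {
    fn fmt_c99(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Operand::Register(name) => write!(f, "{}", name),
            Operand::IntLiteral(value) => write!(f, "{}", value),
            // `{:?}` always keeps a decimal point or exponent, e.g. 2.0 -> "2.0".
            Operand::FloatLiteral(value) => write!(f, "{:?}f", value),
        }
    }
}

impl PTXDisplay for Operand<'_> {
    fn fmt_ptx(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Operand::Register(name) => write!(f, "%{}", name),
            Operand::IntLiteral(value) => write!(f, "{}", value),
            // Exact bit pattern of the single-precision float.
            Operand::FloatLiteral(value) => write!(f, "0f{:08X}", value.to_bits()),
        }
    }
}

/// Adapters so the traits can be used with `format!` and `to_string`.
struct AsC99<'a, T>(&'a T);
struct AsPTX<'a, T>(&'a T);

impl<T: C99Display> fmt::Display for AsC99<'_, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_c99(f)
    }
}

impl<T: PTXDisplay> fmt::Display for AsPTX<'_, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_ptx(f)
    }
}
```

For instance, `format!("{}", AsC99(&Operand::FloatLiteral(1.5)))` yields `1.5f`, while the `AsPTX` adapter yields `0f3FC00000` for the same operand. Keeping literal formatting in these traits is what lets the name map stay unaware of the target syntax.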

andidr commented 5 years ago

The code generation currently works in two steps: first, we generate a loop tree from the CSP solution; then, we walk over this loop tree, printing code on the fly. This has advantages in terms of conceptual complexity; however, it has some issues -- notably, it leads to a high level of code duplication between backends (because the printing structure is not always exactly identical); and it makes it painful to materialise additional code such as that required for convolutions or predicated loads.

Newline here

In order to solve those issues, we propose to introduce a low-level IR that sits on top of the backends' printed code; with the goal to add an extra step to the code generation process: the loop tree should first be converted to this new IR, which contains lowered instructions and implementation details (for instance, this IR should be explicit about loop indices and induction variables; in fact, the resulting code should be mostly independent from the high-level Telamon concepts); then, the IR can be converted to actual backend code.

Slightly excessive use of ; ;)

This patch is a first step towards that direction. It introduces a (very) bare-bones version of the concept in the codegen::llir module (low-level IR, in contrast to the higher level existing ir module), which is meant to define a more traditional IR defined for fixed candidates. Currently, the llir module defines abstract types to represent registers (named variables which can be written to) and instruction operands; the registers and operands are given as argument to the InstPrinter methods instead of raw strings. Finally, it provides a ScalarOrVector type to represent vectors of either registers or operands.

The introduction of these types allow to decouple some parts of the code,

allows

notably:

* The `NameMap` is freed from the responsibility to know about the target's
  literal formatting to generate strings; instead, it can generate registers
  or operands as appropriate.  This allows to completely remove the
  `get_const_float` and `get_const_int` methods from the `ValuePrinter` trait,
  instead passing on that responsibility to the `InstPrinter` when it sees the
  corresponding operands.

* In a similar vein, a trip through the `InstPrinter` is no longer needed for
  vectorization -- instead, the `NameMap` can directly return a vector of
  registers (or operands) which is then given to the `InstPrinter`, and can be
  printed through the regular printing path.

To ease the transition, this patch also makes some additional changes; notably:

* The `ValuePrinter` is renamed to `NameGenerator`, since its only
  responsibility is now to generate names for variables and parameters.  The
  `NameGenerator` currently still goes through a trait and is implemented
  separately by each backend; however, the restructuration should make it easy

restructuring

  to use an unified `NameGenerator` type in a subsequent patch.  This is made
  possible by the indirection through the `llir::Register` type for printing,
  which allows backends that need it to add (type-based) prefixes or suffixes
  at printing time.

* The `NameMap` now uses an arena instead of raw strings.  This allows
  returning long-lived references to register names; this greatly simplifies
  usage of the `NameMap` due to eliminating most of the mutable lifetime
  conflicts.

* The printing of registers and operands goes through the newly introduced
  `PTXDisplay` and `C99Display` traits, which are similar in spirit to the
  `fmt::Display` trait but formats values according to PTX (resp. C99) syntax.
  They are used by the `InstPrinter` implementations.

* Finally, as a preparation for future patches, the helper methods in the
  `InstPrinter` trait have been extracted to helper structures instead; this
  is in a first step towards re-unifying most of the actualy code generation

actual

  between the different backends to end up with a single list of a
  (yet-to-be-introduced) `llir::Instruction`s to be given to the backend instead.

Conceptually, this patch doesn't do much things expect change various type

many things / much, except

representations; as such, the reader is encoureged to first look through the

encouraged

codegen::llir module to read about the new types being introduced; the changes to codegen::name_map and codegen::printer should then be fairly straightforward (but pay attention to vector_operand and vector_inst which have been moved from the InstPrinter). Finally, the changes to the backend printers should also be mostly straightforward, merely updating to the new API; the only real changes being the introduction of the C99Display and PTXDisplay traits used to display registers and operands.