The codegen for TL2 was a bit difficult to reason about since the C++ code was embedded directly in a Python string. With this PR, I've added a simple Jinja2 template which contains the whole file, making it much easier to follow and to change.
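To illustrate the approach, here is a minimal sketch of rendering a C++ header from a Jinja2 template. It is not the template from this PR: `KERNEL_TEMPLATE`, `render_kernels`, the constant names, and the shape values are all hypothetical and chosen only for the example.

```python
from jinja2 import Environment

# Hypothetical miniature template. The real template in this PR contains the
# whole generated header; this one only emits a few constants per shape.
KERNEL_TEMPLATE = """\
#pragma once
{% for shape in shapes %}
constexpr auto BM{{ shape.m }}_{{ shape.k }} = {{ shape.bm }};
constexpr auto BK{{ shape.m }}_{{ shape.k }} = {{ shape.bk }};
{% endfor %}
"""


def render_kernels(shapes):
    """Render the header text for a list of shape dicts (illustrative only)."""
    # trim_blocks / lstrip_blocks keep the {% ... %} lines from leaving
    # blank lines in the generated C++.
    env = Environment(trim_blocks=True, lstrip_blocks=True)
    return env.from_string(KERNEL_TEMPLATE).render(shapes=shapes)


# Made-up shape and block sizes, not values from the actual kernels.
header = render_kernels([{"m": 256, "k": 128, "bm": 32, "bk": 64}])
print(header)
```

Because the C++ lives in one template rather than in concatenated Python strings, the structure of the generated file can be read top to bottom, and Python only supplies the values.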
Other notable changes:
- added `jinja2` to requirements
- use `constexpr auto` instead of `#define`s
- some of the `pragma unroll` loops have been made portable (notably to GCC) using the `UNROLL_LOOP` macro (see also #83)
The generated `bitnet-lut-kernels.h` is essentially identical to the previous output, apart from some slight whitespace differences.
Note that I didn't do the same for TL1 codegen since I'd first like to get feedback on whether this is a step in the right direction.