thesps / conifer

Fast inference of Boosted Decision Trees in FPGAs
Apache License 2.0

Forest Processing Unit #38

Closed thesps closed 1 year ago

thesps commented 1 year ago

This PR adds a new implementation for FPGA BDT inference: the conifer Forest Processing Unit (FPU).

Summary from the docs:

The conifer Forest Processing Unit (FPU) is a flexible, fast architecture for BDT inference on FPGAs. The key difference between the FPU and the other conifer backends, such as HLS and VHDL, is that a single FPU bitfile can perform inference for many BDTs and can be reconfigured at runtime.

An FPU backend is added to build and interact with FPUs (bitfiles are also made available on the conifer website). The FPU architecture is implemented with HLS and uses Xilinx pynq for the runtime. So far it has been built with version 2022.2 of the Xilinx tools (Vitis HLS, Vitis, Vivado). Execution has been tested on pynq-z2 and Alveo U200 (with XRT version 2.14.354, platform xilinx_u200_gen3x16_xdma_2_202110_1).

On Alveo U200, the measured inference time takes the form t = m * N + c for batch size N, where c is around 100 μs and m is around 1 μs per sample. This is corroborated by cosimulation, where the inference time in the FPU is around 1 μs; the rest comes from overheads and data movement.
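To make the batch-size dependence concrete, here is a minimal sketch of the timing model t = m * N + c. It is not part of conifer: the function names are hypothetical, and the constants are the approximate values quoted above rather than exact measurements.

```python
# Approximate values quoted above for the Alveo U200 (assumed, not exact).
M_US_PER_SAMPLE = 1.0   # m: incremental time per sample, in microseconds
C_US_OVERHEAD = 100.0   # c: fixed overhead per batch, in microseconds


def inference_time_us(batch_size, m=M_US_PER_SAMPLE, c=C_US_OVERHEAD):
    """Estimated time for one batch, t = m * N + c, in microseconds."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return m * batch_size + c


def throughput_per_second(batch_size, m=M_US_PER_SAMPLE, c=C_US_OVERHEAD):
    """Estimated number of samples processed per second at this batch size."""
    return batch_size / inference_time_us(batch_size, m, c) * 1e6


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        t = inference_time_us(n)
        rate = throughput_per_second(n)
        print(f"N={n:>6}  t={t:>8.0f} us  throughput={rate:,.0f} samples/s")
```

With these assumed values the fixed overhead dominates for batches smaller than c / m = 100 samples, and the throughput approaches 1 / m, about one million samples per second, as the batch size grows.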

This is the first PR that brings the core functionality, after which many developments are planned, including but not limited to: