thesps / conifer

Fast inference of Boosted Decision Trees in FPGAs
Apache License 2.0

VHDL Backend: Missing assignment in AddReduce.vhd? #54

Open SebastianHambura opened 1 year ago

SebastianHambura commented 1 year ago

Hi,

I've started working with this tool. I tried to convert a simple xgboost tree into VHDL. Conifer created the files without a problem, but when I tried to import them into Vivado it was a mess. After struggling a bit with the library names (it's my first time using VHDL; usually I work with Verilog), I decided to go another route and convert the VHDL code into a Verilog module using GHDL (and/or Yosys).
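For reference, the Python side of my conversion looks roughly like this (I'm sketching the conifer calls from memory, so treat the exact helper names as approximate):

```python
# Rough sketch of my conversion flow (conifer API names approximate, from memory)
import conifer
import xgboost as xgb
from sklearn.datasets import make_classification

# stand-in training data / model, just to make the sketch self-contained
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
bst = xgb.XGBClassifier(n_estimators=10, max_depth=7).fit(X, y)

cfg = conifer.backends.vhdl.auto_config()                  # assumed config helper
model = conifer.converters.convert_from_xgboost(bst, cfg)  # assumed converter name
model.compile()  # writes Constants.vhd, Types.vhd, AddReduce.vhd, ... to the project dir
```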

In case someone is curious, this is how I'm currently doing it:

```
ghdl -a Constants.vhd
ghdl -a Types.vhd
ghdl -a AddReduce.vhd
ghdl -a Arrays0.vhd
ghdl -a Tree.vhd
ghdl -a BDT.vhd
ghdl -a BDTTop.vhd
ghdl --synth --out=verilog bdttop > BDTTop.v
```

It seems to be working, and while my VHDL attempts crashed Vivado, the Verilog file plays very nicely.

However, I get the following warnings:

```
AddReduce.vhd:28:11:warning: declaration of "addreduce" hides entity "addreduce" [-Whide]
component AddReduce is
          ^
AddReduce.vhd:59:10:warning: no assignment for offsets 0:17 of signal "dint"
  signal dInt : tyArray(0 to intLen - 1) := (others => (others => '0'));
         ^
```

And indeed, when I try to run a simulation with Vivado, I find that the output signal from the tree is 'XXXXXXXX', and I have some internal 'Z's.

Is this an issue on your side, or did something go wrong during the translation process?

(I'm using Vivado 2020.1)

thesps commented 1 year ago

Hi, I have a few comments/queries, hopefully we can get this working.

Have you been able to run one of the examples without issue? e.g. `sklearn_to_hdl.py` would be a good start for the HDL backend.

As well as the VHDL files produced, there will be some scripts for simulation and synthesis. Those can be a good starting point to jump from the conifer Python flow to a Vivado project. There is support for GHDL simulation in conifer; you can enable it by setting `conifer.backends.vhdl.simulator = conifer.backends.vhdl.GHDL` before converting the model. When you write/compile/build the project after that, a script `ghdl_compile.sh` will be written out. It's not so different from the one you posted, but it has some extra flags for the libraries and the VHDL-2008 setting.
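In code, that looks something like this (a sketch: the simulator assignment is exactly as above; the surrounding calls follow the usual flow and are approximate):

```python
# Sketch: enabling GHDL simulation; the simulator assignment is the key line,
# the rest follows the usual conifer flow (names approximate)
import conifer

conifer.backends.vhdl.simulator = conifer.backends.vhdl.GHDL  # set before conversion

cfg = conifer.backends.vhdl.auto_config()
model = conifer.converters.convert_from_xgboost(bst, cfg)  # bst: your trained model
model.compile()                     # now also writes out ghdl_compile.sh
y_sim = model.decision_function(X)  # runs inputs X through the GHDL simulation
```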

I tried doing something like you described: adding `ghdl --synth --work=BDT --std=08 --out=verilog BDT.BDTTop > BDTTop.v` to the `ghdl_compile.sh` script, then fiddling with the VHDL testbench `BDTTestbench.vhd` to instantiate the Verilog module, and running the whole thing from Vivado 2020.1. I also see only Xs for the output. I'll keep digging into it, but the unmodified workflow does work for running synthesis and simulation.

I do also see the `AddReduce.vhd:28:11:warning: declaration of "addreduce" hides entity "addreduce" [-Whide]` warning from GHDL. I will look into whether that's the issue.

Is there any reason to choose one backend over another for your project? It might be worth trying the HLS backend and exporting the C-synthesized model as Verilog if that's how you'll integrate it later.
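On the conifer side that's mostly a one-line change (again a sketch; the backend module name here is my assumption):

```python
# Sketch: same flow as before, but targeting the HLS backend instead of VHDL
import conifer

cfg = conifer.backends.xilinxhls.auto_config()             # assumed backend module
model = conifer.converters.convert_from_xgboost(bst, cfg)  # bst: your trained model
model.compile()
model.build()  # runs C synthesis; the exported Verilog can then be integrated
```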

thesps commented 1 year ago

Also, the `AddReduce.vhd:28:11:warning: declaration of "addreduce" hides entity "addreduce" [-Whide]` warning occurs because of the recursive instantiation of the component.

SebastianHambura commented 1 year ago

`sklearn_to_hdl.py` is working without issue 👍 I can also import the VHDL files into Vivado and run the simulation (I just need to manually change the library of the files to BDT). I'm still a bit confused why it's both complaining about a missing file and running the simulation fine, but whatever.

[screenshot]

I haven't used HLS yet, and from your paper the RTL backend seemed to perform better, so I would have preferred to use the VHDL backend.

I started going this route because `model.build()` was taking very long and then failing, so I tried to manually import the VHDL files into Vivado, which also failed (probably because I'm not used to working with VHDL). I found a potential culprit though: for my xgboost model (10 trees, up to depth 7) the generated `Arrays0.vhd` is 756.6 kB, while the one generated from the example is only 13 kB. It could be that I'm hitting memory issues with Vivado when trying to synthesize this model. At least once, `model.decision_function` failed and I got this in `ghdl.log`:

```
/snap/ghdl/1/bin/ghdl:error: declaration of a too large object (359 > --max-stack-alloc=128 KB)
in process .testbench(rtl).input@simulationinput(rtl).P0
  from: bdt.types.to_tyarray2d
  from: bdt.arrays0.ELAB_SPEC
  from: bdt.arrays0.ELAB_BDY
  from: bdt.bdttop.PKG_ELAB
  from: xil_defaultlib.testbench(rtl).uut.CMP_ELAB
  from: xil_defaultlib.testbench(rtl).DECL_ELAB
  from: Elaboration of design
/snap/ghdl/1/bin/ghdl:error: error during elaboration
```

I've tried adding `--work=BDT --std=08` to my commands, but I still get these warnings:

```
AddReduce.vhd:59:10:warning: no assignment for offsets 0:17 of signal "dint"
  signal dInt : tyArray(0 to intLen - 1) := (others => (others => '0'));
```

Do you also have them, or is it an issue on my side?

SebastianHambura commented 1 year ago

From the example:

```
DEBUG:conifer.model:Converted BDT with parameters {"max_depth": 3, "n_trees": 20, "n_features": 10, "n_classes": 2}
```

From my code:

```
DEBUG:conifer.model:Converted BDT with parameters {"max_depth": 10, "n_trees": 10, "n_features": 30, "n_classes": 2}
```

Are you always producing fully-fledged trees? This is one of the trees from my xgboost model:

[screenshot]

If you're also generating the nodes which don't exist or aren't used, I can see how that could become a memory/performance issue further down the synthesis pipeline.

thesps commented 1 year ago

Regarding depth and sparsity: since v1.0 the HLS backend may perform better for sparse trees (this is also much more recent than the paper). Indeed, as you suggest, the VHDL backend currently pads sparse trees to make them full, which can be quite costly for synthesis time. It's on the to-do list to give the VHDL backend the same treatment so that it avoids padding the trees, but it's not yet implemented. The padded trees shouldn't contribute much extra resource usage, but they definitely hurt synthesis (and evidently simulation).
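To make the padding cost concrete (an illustration of the arithmetic only, not conifer's implementation):

```python
# A full binary tree of depth d has 2**(d+1) - 1 nodes, so the VHDL backend
# currently instantiates that many nodes per tree no matter how sparse the
# trained tree actually is. (Illustration only, not conifer code.)
def padded_nodes(max_depth: int) -> int:
    return 2 ** (max_depth + 1) - 1

for d in (7, 8, 9, 10):
    print(f"max_depth={d}: {padded_nodes(d)} nodes per padded tree")
# 7: 255, 8: 511, 9: 1023, 10: 2047 -- the per-tree cost doubles with each
# extra level of depth, which is what hurts synthesis and simulation time.
```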

I left some information and performance plots on PR #41, which introduced this HLS improvement for sparse trees. The change in resource usage and latency is small, but the improvement in synthesis time is significant.

I've been evaluating the HLS backend with recent Vitis HLS versions (2023.1), which also tend to synthesize faster than older versions of the software.

> I started going this route because model.build() was taking very long and then failing

What is the actual failure? Is there any information in the Vivado log file?

> Do you also have them, or is it an issue on my side?

I saw this warning when running the Verilog export/synthesis, but not for the out-of-the-box VHDL simulation.

SebastianHambura commented 1 year ago

I've tried varying the `max_depth` parameter in xgboost, and it looks to me like my computer doesn't have enough memory to build the model.

VHDL backend:

| Max depth | Vivado run time | Peak memory usage | Result |
| --- | --- | --- | --- |
| 7 | 1:16 | 2936 MB | |
| 8 | 4:48 | 3166 MB | |
| 9 | 14:05 | 5624 MB | |
| 10 | 27:43 | 14159 MB | |

And when it fails, there is no specific error message:

```
ERROR:conifer.backends.vhdl.writer:build failed, check build.log

/media/sebastian/Software/Xilinx/Vivado/2020.1/bin/loader: line 286: 25252 Killed                  "$RDI_PROG" "$@"
```

End of build.log file:

```
---------------------------------------------------------------------------------
Finished Synthesize : Time (s): cpu = 00:26:23 ; elapsed = 00:26:36 . Memory (MB): peak = 14159.375 ; gain = 12008.105 ; free physical = 1535 ; free virtual = 4312
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Finished Constraint Validation : Time (s): cpu = 00:26:56 ; elapsed = 00:27:12 . Memory (MB): peak = 14159.375 ; gain = 12008.105 ; free physical = 1958 ; free virtual = 4635
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Start Loading Part and Timing Information
---------------------------------------------------------------------------------
Loading part: xczu3eg-sfvc784-1-e
```

End of `ghdl.log` file:

```
/snap/ghdl/1/bin/ghdl:error: declaration of a too large object (359 > --max-stack-alloc=128 KB)
in process .testbench(rtl).input@simulationinput(rtl).P0
  from: bdt.types.to_tyarray2d
  from: bdt.arrays0.ELAB_SPEC
  from: bdt.arrays0.ELAB_BDY
  from: bdt.bdttop.PKG_ELAB
  from: xil_defaultlib.testbench(rtl).uut.CMP_ELAB
  from: xil_defaultlib.testbench(rtl).DECL_ELAB
  from: Elaboration of design
/snap/ghdl/1/bin/ghdl:error: error during elaboration
```

So I think Vivado crashes because it's out of memory to use.

HLS backend:

| Max depth | HLS run time | Peak memory usage | Result |
| --- | --- | --- | --- |
| 8 | 04:33 | 3249 MB | |
| 9 | 17:51 | 9809 MB | |
| 10 | 1:19:26 | 6393 MB | fail |

End of `build.log` file:

```
INFO: [HLS 200-489] Unrolling loop 'Activate' (firmware/BDT.h:65) in function 'BDT::Tree<0, 2047, 2038, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> [30], ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> >::decision_function' completely with a factor of 2047.
INFO: [HLS 200-489] Unrolling loop 'Loop-3' (firmware/BDT.h:87) in function 'BDT::Tree<0, 2047, 2038, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> [30], ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> >::decision_function' completely with a factor of 2038.
INFO: [XFORM 203-101] Partitioning array 'x.V' (firmware/my_prj.cpp:4) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'score.V' (firmware/my_prj.cpp:4) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'scores' in dimension 1 completely.
ERROR: [XFORM 203-103] Array 'tree_9_0.feature' : partitioned elements number (2047) has exeeded the threshold (1024), which may cause long run-time.
ERROR: [HLS 200-70] Pre-synthesis failed.
command 'ap_source' returned error code
    while executing
"source build_prj.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel \#0 [list source $arg] "
INFO: [Common 17-206] Exiting vivado_hls at Thu Aug 10 11:50:54 2023...
```

I think that for our purposes, a max depth of 8 should still work; I need to check with my team. And maybe I can just increase the memory of my machine to see if that allows building bigger trees.

Concerning the manual unrolling of trees, would it not be better to instead do recursive module instantiation? Then we could create sparse trees in HDL code.
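To sketch the idea in Python pseudocode (standing in for HDL generation; the dict layout is a hypothetical tree representation):

```python
# Conceptual sketch: recurse through the trained tree and emit only the nodes
# that exist, instead of padding to a full 2**(d+1) - 1 node tree.
def emit(node, depth=0):
    pad = "  " * depth
    if "leaf" in node:                    # leaf node: just a constant score
        print(f"{pad}leaf score={node['leaf']}")
        return
    print(f"{pad}node: feature[{node['feature']}] < {node['threshold']}")
    emit(node["left"], depth + 1)         # recurse only into children that exist
    emit(node["right"], depth + 1)

# a sparse depth-2 tree: 5 nodes instead of the 7 a padded full tree would need
emit({
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": -1.0},
    "right": {"feature": 2, "threshold": 1.5,
              "left": {"leaf": 0.3}, "right": {"leaf": 1.2}},
})
```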

thesps commented 1 year ago

Thanks for studying this in such detail! `Killed "$RDI_PROG" "$@"` is definitely the out-of-memory error.