Closed thesps closed 1 year ago
Is there a particular version of XGBoost/Scikit-Learn used here?
Trying out the example pruned_xgboost_to_hls.py:
With XGBoost 2.0.0 I get AttributeError: 'list' object has no attribute 'get'
With XGBoost 1.7.6 this changes to in addParentAndDepth parents[j] = i IndexError: list assignment index out of range
In either case I have not been able to get pruned trees to compile.
This is likely an issue with the XGBoost converter rather than the pruned model support. I can confirm though that this was tested with XGBoost 1.7.5. I just tried with 2.0.0 and I see the same issue as you with the XGBoost converter tests. I'm not sure about the issue you see with 1.7.6, we can look into that. It might be a corner case only in that model.
I'll open a new issue about updating the XGBoost converter again for 2.0.0
Thanks @thesps! In the mean time can you confirm that the XGBoost converter works for pruned trees with 1.7.5? I still get parents[j] = i IndexError: list assignment index out of range
with pruned_xgboost_to_hls.py
For completeness I use scikit-learn==1.3.0, conifer==1.2
Thanks for the info, I checked that example as well and I replicated your error. I haven't worked out the fix yet but I can give some insight into the issue: Since the error comes from the XilinxHLS backend, I converted the model with no configuration (Python backend) which is available in master branch but not yet in a release (will be in v1.3). Then I check if there are some trees with child indices out of range of the number of nodes:
model = conifer.converters.convert_from_xgboost(bst)
np.any([(np.any(np.array(tree.children_left) > tree.n_nodes()), np.any(np.array(tree.children_right) > tree.n_nodes())) for trees in model.trees for tree in trees])
True
So somehow the xgboost converter has read those out of range values from the model. This is where those child indices are set: https://github.com/thesps/conifer/blob/master/conifer/converters/xgboost.py#L51-L52
I will keep investigating!
This PR is inspired by #30.
In the main branch, the loop over ensemble trees in the HLS is like this (simplified):
Vivado/Vitis HLS seems to have a hard time unrolling this loop, for an unknown reason. In this PR we manually unroll this loop in the backend writer instead. So for a model with 5 trees this looks like (simplified):
Now we also need to create instances of each tree (
tree_0
,tree_1
etc) instead of an array of them inparameters.h
, and create one model specific function with the unrolled inference code.The impact is:
For large models (e.g. in the scan referenced below with 100 trees and depth of 8) master branch may not be able to synthesize the model while this PR can. The old implementation with a
for
loop in the C++ is still available, with a new configuration parameter'Unroll'
made available to select between the implementations (defaults toTrue
).For reference, here are some plots from a scan of randomized trees (blue markers are
master
branch, red markers are this PR):