thesps / conifer

Fast inference of Boosted Decision Trees in FPGAs
Apache License 2.0

Random Forest - Only one branch generated in Vivado HLS #71

Open RolandJohnson76 opened 3 months ago

RolandJohnson76 commented 3 months ago

Hello Community,

I've been experimenting with random forests in conifer. I can see the tree being created, and the tree details appear in parameters.h, but my decision_function.vhd only includes one branch of the tree.

I'm using a custom dataset that I extracted from a program running in an FPGA, with an RF of 1 tree with 3 layers. In reality I would like around 100 trees with a depth of around 5, but I'm using a single small tree here just to debug this issue.

Code is below:

Load dataset and assign names

col_names = ['Row 0', 'Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6', 'Row 7', 'Row 8', 'Row 9', 'Row_Delta', 'Row_Mean', 'Row_Min', 'Row_Max', 'Row_Width', 'Col 0', 'Col 1', 'Col 2', 'Col 3', 'Col 4', 'Col 5', 'Col 6', 'Col 7', 'Col 8', 'Col 9', 'Col_Delta', 'Col_Mean', 'Col_Min', 'Col_Max', 'Col_Width', 'Label']

rs_data = pd.read_csv("/path/to/dataset/ML_Dataset_1_256_29072024.csv", header=None, names=col_names)

Assign features and target

feature_cols = ['Row 0', 'Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6', 'Row 7', 'Row 8', 'Row 9', 'Row_Delta', 'Row_Mean', 'Row_Min', 'Row_Max', 'Row_Width', 'Col 0', 'Col 1', 'Col 2', 'Col 3', 'Col 4', 'Col 5', 'Col 6', 'Col 7', 'Col 8', 'Col 9', 'Col_Delta', 'Col_Mean', 'Col_Min', 'Col_Max', 'Col_Width']

X = rs_data[feature_cols]  # Features
y = rs_data.Label  # Target variable

Split data

X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test

Scale data

scaler = preprocessing.StandardScaler().fit(X_train_val)
X_train_val = scaler.transform(X_train_val)
X_test = scaler.transform(X_test)

Train with RandomForestClassifier

train = True
if train:
    clf = RandomForestClassifier(n_estimators=1, max_depth=3, random_state=0)
    clf.fit(X_train_val, y_train_val)
    if not os.path.exists('ram_sniffer_rf'):
        os.makedirs('ram_sniffer_rf')
    joblib.dump(clf, 'ram_sniffer_rf/bdt.joblib')
else:
    clf = joblib.load('ram_sniffer_rf/bdt.joblib')

Create and compile the model

cnf = conifer.converters.convert_from_sklearn(clf, cfg)
cnf.compile()

Run HLS C Simulation and get the output

y_hls = cnf.decision_function(X_test)
y_skl = clf.predict_proba(X_test)

Before running the final line, I go into "build_hls.tcl" and remove the references to "-flow_target", as reported in issue #67.

Synthesize the model

cnf.build(csim=False, cosim=False, export=True)

The build completes and prints "True".

In my parameters.h file I have 7 weights, which is what I would expect for one tree with 3 layers. However, in the decision_function.vhd file in my_prj/solution1/syn/vhdl I only have 3 weights, which makes it look as though only one branch is being implemented.
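For reference, the node arithmetic behind the "7 weights" expectation can be sketched as below. This is only a sanity check on counts for a complete binary tree; the helper name `complete_tree_counts` is hypothetical and is not part of conifer or scikit-learn, and the sketch says nothing about how conifer or Vivado HLS actually lay out values in parameters.h or decision_function.vhd.

```python
def complete_tree_counts(levels: int) -> dict:
    """Node counts for a complete binary tree with `levels` levels.

    Hypothetical helper for illustration only. The root counts as level 1,
    so a root with two children and four grandchildren has 3 levels.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    total = 2 ** levels - 1      # every node in the tree
    leaves = 2 ** (levels - 1)   # nodes on the last level
    internal = total - leaves    # nodes that hold a split
    return {"total": total, "internal": internal, "leaves": leaves}


# One tree with 3 levels, as described above
counts = complete_tree_counts(3)
print(counts)  # {'total': 7, 'internal': 3, 'leaves': 4}
```

A complete 3-level tree therefore has 7 nodes, of which 3 are split nodes and 4 are leaves. Note also that scikit-learn counts `max_depth` from the root at depth 0, so `max_depth=3` permits up to 4 levels, i.e. `complete_tree_counts(4)["total"]` = 15 nodes, when the data supports that many splits.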

Have you seen something like this before? What am I doing wrong?

Thanks in advance!

thundertwonk001 commented 3 months ago

Good morning, I found an alternative method for implementing forest algorithms in FPGAs, so please close this thread. I would be happy to share the method for those who are interested!