New style:
Thanks for this, these plots look like a very useful improvement. I haven't reviewed the code yet, but I'm noticing in the Gains plot that the Leaves stats are cropped?
Sorry, I was just not handling zero values. Now it is fixed.
Just a comment: right now I added a dictionary like `{feature_name: feature_id}` to the ensembleDict created by the xgboost converter. This should be added to all the converters.
I have not done it yet, but as a workaround, if this dictionary is not present in the ensembleDict, the plot just shows feat_0, feat_1, ... instead of the actual feature names.
I don't know if all the other libraries save the feature names like xgboost does. In that case, a solution could be to allow passing the feature names as an argument of the profile method.
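A minimal sketch of the fallback described above (the `feature_map` key, helper name, and `n_features` argument are illustrative, not the actual conifer internals):

```python
def get_feature_labels(ensemble_dict, n_features):
    """Return plot labels: real names if the converter stored a
    {feature_name: feature_id} map, otherwise generic feat_<i> labels."""
    feature_map = ensemble_dict.get('feature_map')
    if feature_map is None:
        # Converter did not store names -> fall back to feat_0, feat_1, ...
        return [f'feat_{i}' for i in range(n_features)]
    # Sort by feature id so the labels line up with the feature indices
    return [name for name, _ in sorted(feature_map.items(), key=lambda kv: kv[1])]
```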
Nice, the new plot looks great 👍
I think the addition of the `feature_map` is good. I would suggest integrating it a bit more deeply, by that I mean:

- add `feature_map` to the `ensemble_fields` of `ModelBase` so that it's always expected
- for converters other than `xgboost`, set it to a dictionary like `{str(i) : i for i in range(n_features)}`
Right now it's only used in the profiling obviously, but if it's available I could imagine using it in more places in the backends, so having it always set to something would potentially help avoid lots of `if model.feature_map is None` later.
Then we/I can set it correctly for the other converters that support it in follow-up PRs later.
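Roughly what that integration could look like, as a sketch only: the field list and constructor below are hypothetical, not the actual `ModelBase` implementation; only `ensemble_fields`, `feature_map`, `n_features`, and the default dictionary come from the discussion above.

```python
class ModelBase:
    # 'feature_map' listed alongside the other expected ensemble fields
    # (the rest of this list is illustrative)
    ensemble_fields = ['max_depth', 'n_trees', 'n_features', 'feature_map']

    def __init__(self, ensembleDict):
        # Copy each expected field from the converter's ensembleDict
        for field in self.ensemble_fields:
            setattr(self, field, ensembleDict.get(field))
        if self.feature_map is None:
            # Default for converters that don't provide feature names yet,
            # so backends never need `if model.feature_map is None` checks
            self.feature_map = {str(i): i for i in range(self.n_features)}

# e.g. ModelBase({'max_depth': 3, 'n_trees': 10, 'n_features': 4}).feature_map
# -> {'0': 0, '1': 1, '2': 2, '3': 3}
```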
I have applied all the suggestions.
This is what the "both" option looks like. Also, I made "both" the default argument.
Thanks, this looks good now, merging!
For each feature: