jjbrophy47 / tree_influence

Influence Estimation for Gradient-Boosted Decision Trees
Apache License 2.0

Error when fitting the estimator #6

Closed: aclarkse closed this issue 5 months ago

aclarkse commented 7 months ago

Hello,

I was using your implementation of BoostIn on my own data and ran into an error, which I first attributed to some inconsistency in my features. However, fitting it to the iris data from the sklearn package (as in the example document in your repository) produces the very same error:

    180 # compute leaf derivative w.r.t. each train example in leaf_docs
    181 numerator = g[leaf_docs, class_idx] + leaf_vals[leaf_idx] * h[leaf_docs, class_idx]  # (no. docs,)
--> 182 denominator = np.sum(h[leaf_docs, class_idx]) + l2_leaf_reg
    183 leaf_dvs[leaf_docs, boost_idx, class_idx] = numerator / denominator * lr  # (no. docs,)
    185 # update approximation

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

Could you please give me some guidance as to what might be going wrong? For context, I am using an XGBoost model here, and I have to pass scale_pos_weight=1 to avoid an assertion error. It would be nice if that requirement could be relaxed as well. Thank you!

jjbrophy47 commented 7 months ago

Hi @aclarkse! Can you please provide:

Thanks!

aclarkse commented 7 months ago

Hi there! Of course:

The error occurs when fitting the estimator. The model is defined as follows:

model = XGBClassifier(scale_pos_weight=1).fit(X_train, y_train)

And after running this line:

explainer = BoostIn().fit(model, X_train, y_train)

I get the following error message:


TypeError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 explainer = BoostIn().fit(model, X_train, y_train)

File c:\Users\andre\anaconda3\envs\tree_influence\lib\site-packages\tree_influence\explainers\boostin.py:53, in BoostIn.fit(self, model, X, y)
     50 self.ntrain = X.shape[0]
     51 self.lossfn = util.get_lossfn(self.model.objective, self.model_.nclass, self.model_.factor)
---> 53 self.train_leafdvs = self._compute_leaf_derivatives(X, y)  # (X.shape[0], n_boost, n_class)
     54 self.train_leafidxs = self.model_.apply(X)  # shape=(X.shape[0], no. boost, no. class)
     56 return self

File c:\Users\andre\anaconda3\envs\tree_influence\lib\site-packages\tree_influence\explainers\boostin.py:182, in BoostIn._compute_leaf_derivatives(self, X, y)
    180 # compute leaf derivative w.r.t. each train example in leaf_docs
    181 numerator = g[leaf_docs, class_idx] + leaf_vals[leaf_idx] * h[leaf_docs, class_idx]  # (no. docs,)
--> 182 denominator = np.sum(h[leaf_docs, class_idx]) + l2_leaf_reg
    183 leaf_dvs[leaf_docs, boost_idx, class_idx] = numerator / denominator * lr  # (no. docs,)
    185 # update approximation

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'
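
The traceback suggests that l2_leaf_reg is None when line 182 runs, so adding it to the float returned by np.sum fails. A minimal sketch of that failure and one possible guard follows. It is not the library's code and not necessarily how the issue was resolved: the helper resolve_l2, the function leaf_derivatives, and the toy arrays are hypothetical, and the idea that the regularization value was simply left unset on the model is an assumption. The fallback of 1.0 is XGBoost's documented default for its L2 term (lambda).

```python
import numpy as np

# Assumption: fall back to XGBoost's documented default for `lambda` (L2 term).
XGB_DEFAULT_L2 = 1.0


def resolve_l2(value, default=XGB_DEFAULT_L2):
    """Hypothetical helper: use `default` when the parameter was left unset (None)."""
    return default if value is None else float(value)


def leaf_derivatives(g, h, leaf_val, l2_leaf_reg, lr):
    """Toy version of the arithmetic shown in the traceback, for a single leaf."""
    numerator = g + leaf_val * h                        # (no. docs,)
    denominator = np.sum(h) + resolve_l2(l2_leaf_reg)   # scalar, never None
    return numerator / denominator * lr                 # (no. docs,)


# Toy gradients and hessians for three training examples in one leaf.
g = np.array([0.2, -0.1, 0.4])
h = np.array([0.25, 0.25, 0.25])

# Without a guard, adding None to the summed hessians raises TypeError.
try:
    np.sum(h) + None
    raised = False
except TypeError:
    raised = True

# With the guard, an unset regularization value falls back to the default.
dvs = leaf_derivatives(g, h, leaf_val=0.5, l2_leaf_reg=None, lr=0.3)
print(raised, dvs)
```

If the value is set explicitly on the model, the guard passes it through unchanged, so the sketch only changes behavior in the unset case.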

jjbrophy47 commented 5 months ago

I think this issue should be resolved in v0.1.7. Please give that a try, thank you!