Open LambertAn opened 6 years ago
Hello @LambertAn, thanks for your detailed report. Can you provide some data to reproduce the behaviour? And did you run the integrity check with integrity_score? What score did you get?
Thanks for getting back to me.
Below is code that builds a 3-class ExtraTreesClassifier on random data.
from sklearn_porter import Porter
from sklearn.ensemble import ExtraTreesClassifier
import numpy as np
# Build random dataset
prng = np.random.RandomState(123)
X = prng.rand(50, 10)
y = prng.randint(0, 3, 50)
# Fit model
model = ExtraTreesClassifier(n_estimators=3, max_depth=3, random_state=prng)
model.fit(X, y)
# export:
porter = Porter(model, language='c')
output = porter.export(embed_data=True)
with open('extratree_randomdataset_original.c', 'w') as f_out:
    f_out.write(output)
# accuracy:
integrity = porter.integrity_score(X)
print(integrity)
# Show details for one point
test_point = X[2:3]
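for i in range(len(model.estimators_)):
    # Per-tree probabilities and predicted class for the test point
    print("{}: {} -> {}".format(i, model.estimators_[i].predict_proba(test_point), model.estimators_[i].predict(test_point)))
# Ensemble probabilities and predicted class for the same point
print(model.predict_proba(test_point))
print(model.predict(test_point))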
print (test_point)
The integrity score on the training data is 0.86. Let's look at the result for one of the data points, where each estimator predicts a different class:
Estimator 0 predicts class 0 with probabilities [0.45 0.20 0.35]
Estimator 1 predicts class 2 with probabilities [0.17 0.08 0.75]
Estimator 2 predicts class 1 with probabilities [0.24 0.52 0.24]
The model predicts class 2 with probabilities [0.29 0.27 0.45].
I attached the above Python code and two C files: the original model as generated by sklearn-porter, and a modified version that calculates the probabilities for each estimator as well as their average for the model prediction.
For the above point, the original 'predict' method returns class 0, whereas the new 'predict_proba' method returns [0.29 0.27 0.45].
I hope it is enough to reproduce the problem.
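The numbers reported for this point are consistent with a difference between two voting schemes. The sketch below uses only the three per-tree probability vectors quoted above and compares averaging the probabilities (soft voting, which is what scikit-learn's forest ensembles do) with counting one vote per tree (hard voting). That the generated C code counts votes this way, and that a tie goes to the lowest class index, are assumptions made for illustration and have not been checked against the exported file.

```python
# Per-tree class probabilities reported above for test_point = X[2:3]
tree_probas = [
    [0.45, 0.20, 0.35],  # estimator 0 -> class 0
    [0.17, 0.08, 0.75],  # estimator 1 -> class 2
    [0.24, 0.52, 0.24],  # estimator 2 -> class 1
]
n_classes = 3


def argmax(values):
    """Return the index of the largest value; the lowest index wins ties."""
    return max(range(len(values)), key=lambda i: values[i])


# Soft voting: average the probabilities over the trees, then take the argmax
mean_proba = [
    sum(p[c] for p in tree_probas) / len(tree_probas) for c in range(n_classes)
]
soft_class = argmax(mean_proba)

# Hard voting (ASSUMED behaviour of the transpiled 'predict'):
# each tree casts one vote for its own most probable class
votes = [0] * n_classes
for p in tree_probas:
    votes[argmax(p)] += 1
hard_class = argmax(votes)

print([round(v, 2) for v in mean_proba], soft_class)  # [0.29, 0.27, 0.45] 2
print(votes, hard_class)                              # [1, 1, 1] 0
```

Under these assumptions the three trees tie with one vote each, so hard voting falls back to class 0, while the averaged probabilities clearly favour class 2.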
Hello @LambertAn, we found a small bug and fixed it (release/0.7.0: Merge branch 'master' into release/0.7.0). Can you please reinstall the package and test it again?
pip uninstall -y sklearn-porter
pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master
Hi, I finally had some time to test, but unfortunately the problem is not fixed. I ran the Python script above and got exactly the same results as before, with an integrity score of 0.86.
I believe this is the same issue as https://github.com/nok/sklearn-porter/issues/52
I was trying to implement the predict_proba function for an Extra Tree model when I realized that the result returned by the transpiled version of the model differed from the one returned by sklearn.
My model contains 30 trees and 3 classes. Below are the classes predicted by sklearn alongside the probabilities for each estimator:
17 estimators predict class 0 and 13 predict class 2, BUT the model predicts class 2 because it is the most probable class once the per-estimator probabilities are averaged.
Therefore it seems to me that the transpiled model should also base its decision on the averaged predicted probabilities rather than on a count of per-tree votes.
What do you think?
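One way to test this hypothesis on the Python side is to recompute both decision rules from the fitted trees and see how often they agree. The sketch below rebuilds the random data set and model from the script earlier in this thread; the hard-voting rule (one vote per tree, ties resolved to the lowest class index) is only an assumed stand-in for what the transpiled code does, not something taken from sklearn-porter itself.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Same random data set and model settings as in the script earlier in this thread
prng = np.random.RandomState(123)
X = prng.rand(50, 10)
y = prng.randint(0, 3, 50)
model = ExtraTreesClassifier(n_estimators=3, max_depth=3, random_state=prng)
model.fit(X, y)

# Per-tree probabilities, shape (n_trees, n_samples, n_classes)
tree_probas = np.stack([tree.predict_proba(X) for tree in model.estimators_])
n_classes = tree_probas.shape[2]

# Soft voting: average the probabilities over the trees, then take the argmax
soft = tree_probas.mean(axis=0).argmax(axis=1)

# Hard voting (ASSUMED behaviour of the transpiled code): one vote per tree,
# ties resolved in favour of the lowest class index
tree_votes = tree_probas.argmax(axis=2)  # shape (n_trees, n_samples)
hard = np.array(
    [np.bincount(v, minlength=n_classes).argmax() for v in tree_votes.T]
)

# Fraction of samples on which both rules pick the same class
agreement = float(np.mean(hard == soft))
print("hard vs. soft voting agreement:", agreement)
```

If the agreement printed here turns out to be close to the value returned by integrity_score, that would support the idea that the mismatch comes from the voting rule rather than from the exported tree structure.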