Closed by bsw4p.
I cannot make sense of it. Can you reproduce it with the latest version? I'm trying to reproduce it with IDA 7.1 right now (currently exporting a new, big database), but it looks weird to me. Can you please run the following small Python snippet and tell me whether it works?
import sklearn
import numpy as np
from sklearn import tree
from sklearn import ensemble
from sklearn import neighbors
from sklearn import naive_bayes
from sklearn import linear_model
from sklearn import neural_network
from sklearn.externals import joblib
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted
If any of these imports fails, you will need to install the missing dependency.
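The same check can be run in one pass, reporting every import that fails instead of stopping at the first one. This is a minimal sketch and not part of Pigaios; the helper name `find_missing` and the `MODULES` list are my own, derived from the snippet above. It sticks to syntax that works on both Python 2.7 and Python 3, so it should also run under the interpreter IDA embeds.

```python
import importlib

# Modules touched by the snippet above ("from sklearn import tree" -> "sklearn.tree").
MODULES = [
    "sklearn", "numpy", "sklearn.tree", "sklearn.ensemble", "sklearn.neighbors",
    "sklearn.naive_bayes", "sklearn.linear_model", "sklearn.neural_network",
    "sklearn.externals.joblib", "sklearn.model_selection", "sklearn.utils.validation",
]


def find_missing(module_names):
    """Return {module name: error message} for every module that fails to import."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # broad on purpose: a broken install can raise more than ImportError
            failures[name] = "{0}: {1}".format(type(exc).__name__, exc)
    return failures


if __name__ == "__main__":
    missing = find_missing(MODULES)
    if not missing:
        print("All imports OK")
    for name in sorted(missing):
        print("{0} -> {1}".format(name, missing[name]))
```

An empty result means every dependency imported cleanly in that interpreter; otherwise each failing module is printed with the exception type and message.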
UPDATE: Finished my testing with 7.1; I cannot reproduce the issue.
So I upgraded to IDA 7.2 in the meantime, but I still get the same problem:
Python>import sklearn
import numpy as np
from sklearn import tree
from sklearn import ensemble
from sklearn import neighbors
from sklearn import naive_bayes
from sklearn import linear_model
from sklearn import neural_network
from sklearn.externals import joblib
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted
Python>
So all the imports work fine.
Please show me the MD5 hash of $PIGAIOS_DIR/ml/clf.pkl.
PS: It should be b32647ff2333f99003865e87446df0a8.
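On Ubuntu, `md5sum $PIGAIOS_DIR/ml/clf.pkl` prints the hash directly. If you would rather compute it from Python (for example with the same interpreter IDA uses), here is a minimal sketch; the script name `check_clf_md5.py` and its command-line handling are hypothetical, and only the expected hash comes from this thread.

```python
import hashlib
import os
import sys

# Hash reported in this thread for the shipped ml/clf.pkl.
EXPECTED_MD5 = "b32647ff2333f99003865e87446df0a8"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_clf_md5.py $PIGAIOS_DIR/ml/clf.pkl")
    elif not os.path.isfile(sys.argv[1]):
        print("no such file: {0}".format(sys.argv[1]))
    else:
        actual = md5_of_file(sys.argv[1])
        status = "matches" if actual == EXPECTED_MD5 else "DIFFERS from"
        print("{0}  {1} ({2} the expected hash)".format(actual, sys.argv[1], status))
```

A mismatch only tells you the file differs from the shipped one; it does not say why.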
OK, I think I know where the problem comes from: https://stackoverflow.com/questions/48948209/keyerror-when-loading-pickled-scikit-learn-model-using-joblib
So, we're using different joblib versions. In my case, I'm using 0.9.4:
$ dpkg -l python-joblib
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============================================-============================-============================-=================================================================================================
ii python-joblib 0.9.4-1 all tools to provide lightweight pipelining in Python
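`dpkg` only reports the distribution package. The copy of joblib that actually gets imported may come from elsewhere, for example a pip install or the copy bundled inside scikit-learn. So it can help to ask the running interpreter directly. A minimal sketch, assuming each module exposes a `__version__` attribute; the helper name `module_version` and the list of modules are my own choice. `sklearn.externals.joblib` only exists in older scikit-learn releases that bundle joblib, so `None` is expected there on newer installs.

```python
import importlib
import sys


def module_version(name):
    """Return the module's __version__ string, or None if it cannot be imported or has none."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    # Report the interpreter first: IDA's embedded Python may differ from the system one.
    print("python: {0}".format(sys.version.split()[0]))
    for name in ("joblib", "sklearn.externals.joblib", "sklearn", "numpy"):
        print("{0}: {1}".format(name, module_version(name)))
```

Running this on both machines, from inside IDA if possible, shows whether the joblib that pickled `clf.pkl` matches the one trying to load it.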
Found the problem: it's the joblib version. You will have to build your own clf.pkl file. Open a terminal and do the following:
$ cd $PIGAIOS_DIR
$ cd ml
$ cp ../datasets/dataset.csv .
$ ./pigaios_ml.py -multi -t
[Tue Nov 27 12:00:04 2018] Using the Pigaios Multi Classifier
[Tue Nov 27 12:00:04 2018] Loading data...
[Tue Nov 27 12:00:05 2018] Fitting data with CPigaiosMultiClassifier(None)...
Fitting DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False, random_state=None,
splitter='best')
Fitting BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
Fitting GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=0.1, loss='deviance', max_depth=3,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
presort='auto', random_state=None, subsample=1.0, verbose=0,
warm_start=False)
Fitting RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
[Tue Nov 27 12:00:21 2018] Predicting...
[Tue Nov 27 12:01:29 2018] Correctly predicted 5441 out of 6989 (false negatives 1548 -> 22.149091%, false positives 215 -> 0.215000%)
[Tue Nov 27 12:01:29 2018] Total right matches 105226 -> 98.352167%
[Tue Nov 27 12:01:29 2018] Saving model...
This builds a model file that you can load on your system.
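The percentages in the training log can be cross-checked by hand. The tool does not print the number of negative samples. That number is my inference: "215 -> 0.215000%" implies 100000 negatives. With that assumption, every figure in the log reproduces exactly, as this sketch shows.

```python
# Figures printed by "./pigaios_ml.py -multi -t" in the run above.
positives = 6989        # "Correctly predicted 5441 out of 6989"
false_negatives = 1548
false_positives = 215
# ASSUMPTION (inferred, not printed by the tool): 215 -> 0.215% implies 100000 negatives.
negatives = 100000

true_positives = positives - false_negatives           # correctly predicted positives
fn_rate = 100.0 * false_negatives / positives          # share of positives that were missed
fp_rate = 100.0 * false_positives / negatives          # share of negatives wrongly flagged
total = positives + negatives
right = total - false_negatives - false_positives      # true positives + true negatives
accuracy = 100.0 * right / total

print("TP {0}, FN {1:.6f}%, FP {2:.6f}%".format(true_positives, fn_rate, fp_rate))
print("right {0} of {1} -> {2:.6f}%".format(right, total, accuracy))
```

Under this reading, "Total right matches" counts true positives plus true negatives, so the large negative set dominates the 98% overall figure. The 22% false-negative rate is the better indicator of missed matches.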
Yes, that did the trick. Thank you for helping out!
Hi,
I am running IDA 7.1 on Ubuntu 18.04.1 and get the following error from unpickle when loading a SQLite database through sourceimp_ida.py: