joxeankoret / pigaios

A tool for matching and diffing source code directly against binaries.
GNU General Public License v3.0

Pickle error on loading #19

Closed: bsw4p closed this issue 5 years ago

bsw4p commented 5 years ago

Hi,

I am running IDA 7.1 on Ubuntu 18.04.1 and get the following error from the unpickler when loading an SQLite database through sourceimp_ida.py:

 File "/home/bs/ida-7.1/python/ida_idaapi.py", line 566, in IDAPython_ExecScript
    execfile(script, g)
  File "/home/bs/pigaios/sourceimp_ida.py", line 567, in <module>
    main()
  File "/home/bs/pigaios/sourceimp_ida.py", line 553, in main
    importer.import_src(database)
  File "/home/bs/pigaios/sourceimp_ida.py", line 483, in import_src
    if self.find_initial_rows():
  File "/home/bs/pigaios/sourceimp_core.py", line 491, in find_initial_rows
    score, reasons, ml, qr = self.compare_functions(match_id, bin_id, SAME_RARE_CONSTANT)
  File "/home/bs/pigaios/sourceimp_core.py", line 226, in compare_functions
    self.ml_model = self.ml_classifier.load_model()
  File "/home/bs/pigaios/ml/pigaios_ml.py", line 208, in load_model
    return joblib.load(filename)
  File "/usr/lib/python2.7/dist-packages/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/lib/python2.7/dist-packages/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
KeyError: '\x02'
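
For reference, the traceback shows the KeyError is raised from pickle's opcode dispatch table, i.e. the unpickler hit a byte it does not recognize as an opcode. A minimal diagnostic sketch (assuming the model file lives at ml/clf.pkl) is to dump the file's first bytes and see what format it was written in:

# Dump the first bytes of the model file to see what format wrote it.
with open("ml/clf.pkl", "rb") as f:
    print(repr(f.read(16)))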
joxeankoret commented 5 years ago

I cannot make sense of it. Can you reproduce with the latest version? I'm trying to reproduce it with IDA 7.1 right now (currently exporting a new big database) but... it looks weird to me. Can you please run the following small Python snippet and tell me if it works?


import sklearn
import numpy as np
from sklearn import tree
from sklearn import ensemble
from sklearn import neighbors
from sklearn import naive_bayes
from sklearn import linear_model
from sklearn import neural_network
from sklearn.externals import joblib
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted

If any of these imports fail, you will need to install that dependency.
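
All of the modules above ship with scikit-learn and NumPy, so, assuming pip is available, a missing one can usually be pulled in with:

$ pip install numpy scikit-learn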

UPDATE: Finished my testing with 7.1; I cannot reproduce.

bsw4p commented 5 years ago

So I upgraded to IDA 7.2 in the meantime, but I still get the same problem:

Python>import sklearn
import numpy as np
from sklearn import tree
from sklearn import ensemble
from sklearn import neighbors
from sklearn import naive_bayes
from sklearn import linear_model
from sklearn import neural_network
from sklearn.externals import joblib
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted
Python>

So all the imports work fine.

joxeankoret commented 5 years ago

Please show me the MD5 hash of $PIGAIOS_DIR/ml/clf.pkl.

PS: It should be b32647ff2333f99003865e87446df0a8.
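
For example, on Linux (assuming coreutils' md5sum is available):

$ md5sum $PIGAIOS_DIR/ml/clf.pkl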

joxeankoret commented 5 years ago

OK, I think I know where the problem comes from: https://stackoverflow.com/questions/48948209/keyerror-when-loading-pickled-scikit-learn-model-using-joblib

So, we're using different joblib versions. In my case, I'm using 0.9.4:

$ dpkg -l python-joblib
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                           Version                      Architecture                 Description
+++-==============================================-============================-============================-=================================================================================================
ii  python-joblib                                  0.9.4-1                      all                          tools to provide lightweight pipelining in Python
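
The installed joblib version can also be checked from Python directly, since joblib exposes it as joblib.__version__:

$ python -c "import joblib; print(joblib.__version__)"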
joxeankoret commented 5 years ago

And I found the problem: it's the joblib version. So you will have to build your own clf.pkl file. Open a terminal and run the following:

$ cd $PIGAIOS_DIR
$ cd ml
$ cp ../datasets/dataset.csv .
$ ./pigaios_ml.py -multi -t
[Tue Nov 27 12:00:04 2018] Using the Pigaios Multi Classifier
[Tue Nov 27 12:00:04 2018] Loading data...
[Tue Nov 27 12:00:05 2018] Fitting data with CPigaiosMultiClassifier(None)...
Fitting DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
Fitting BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
Fitting GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=100,
              presort='auto', random_state=None, subsample=1.0, verbose=0,
              warm_start=False)
Fitting RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
[Tue Nov 27 12:00:21 2018] Predicting...
[Tue Nov 27 12:01:29 2018] Correctly predicted 5441 out of 6989 (false negatives 1548 -> 22.149091%, false positives 215 -> 0.215000%)
[Tue Nov 27 12:01:29 2018] Total right matches 105226 -> 98.352167%
[Tue Nov 27 12:01:29 2018] Saving model...

That will build a fresh model file that you can load on your system.
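
As a quick sanity check (a minimal sketch using the same joblib call that pigaios' load_model makes), the rebuilt model should now load without the KeyError:

# Run from $PIGAIOS_DIR/ml with the same Python that IDA uses.
from sklearn.externals import joblib

clf = joblib.load("clf.pkl")  # previously raised KeyError: '\x02'
print(clf)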

bsw4p commented 5 years ago

Yes, that did the trick. Thank you for helping out!