@sara-eb Hello,
Sorry for the delayed response. The library accepts any list of classifiers as the pool, so it does accept a combination of ensemble methods and single-classifier models. There are two ways of doing that:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification()
rf = RandomForestClassifier(n_estimators=10).fit(X, y)
adaboost = AdaBoostClassifier(n_estimators=10).fit(X, y)
svm = SVC().fit(X, y)
tree = DecisionTreeClassifier().fit(X, y)

# Option 1: each ensemble is a single member of the pool.
pool1 = [rf, adaboost, svm, tree]
# Option 2: the fitted base estimators of each ensemble are added individually.
pool2 = rf.estimators_ + adaboost.estimators_ + [svm, tree]
```
In this case, pool1 is a pool composed of 4 estimators (although random forest and AdaBoost are themselves composed of multiple base estimators, the DS method treats each of them as a single classifier). pool2 treats each member of the random forest/AdaBoost as a single, independent model instead of using their combination, so the DS model sees a pool composed of 22 models (10 coming from rf, 10 from adaboost, 1 SVM and 1 decision tree).
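To make the difference concrete, here is a small check you could run on the snippet above (a sketch; I'm assuming DESlib's fitted `n_classifiers_` attribute here, and in practice you would fit the DS method on a separate DSEL split rather than the training data):

```python
from deslib.des import KNORAE

print(len(pool1))  # 4: rf, adaboost, svm and tree each count as one classifier
print(len(pool2))  # ~22: 10 trees from rf + up to 10 from adaboost + svm + tree
                   # (AdaBoost may stop early, so check len(adaboost.estimators_))

# The DS method simply sees whatever list it receives as the pool.
knorae_small = KNORAE(pool_classifiers=pool1).fit(X, y)
knorae_large = KNORAE(pool_classifiers=pool2).fit(X, y)
print(knorae_small.n_classifiers_, knorae_large.n_classifiers_)  # 4 vs ~22
```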
You may want to check our heterogeneous example too in which we use classifiers of different types in the pool: https://deslib.readthedocs.io/en/latest/auto_examples/example_heterogeneous.html#sphx-glr-auto-examples-example-heterogeneous-py
@Menelau Thank you very much sir, your explanation is very clear.
Thanks again
@Menelau I created a pool of classifiers for my data, including a random forest with 200 estimators and an AdaBoost classifier with 600 decision trees, and I am using faiss as the knn_classifier (knn_type):
```python
from deslib.des import KNORAE
from joblib import dump

pool_classifiers = [model_ada, model_rf]
knorae = KNORAE(pool_classifiers=pool_classifiers,
                knn_classifier=knn_type)

print("Fitting KNORAE on X_DSEL dataset")
knorae.fit(X_DSEL, y_DSEL)

print("Saving the dynamic selection model in ", ds_model_outdir)
outfile = ds_model_outdir + 'KNORAE_rfE200_adaDT600.joblib'
print(outfile)
dump(knorae, outfile)
```
Since my validation (i.e., DSEL) dataset has quite a large number of samples, I was trying to fit the DS model on the validation data and save the model for later prediction on the test dataset. However, I am facing an issue when saving it:

`TypeError: can't pickle module objects`

What could be the reason?
@sara-eb Hello,
I have a feeling that it happens because of the information stored in the faiss knn but I'm not sure. I will investigate that and get back to you asap.
You can try using dill instead of pickle for saving the model: https://pypi.org/project/dill/ I believe that should work for you.
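For example, something along these lines (a sketch; `knorae` is your fitted model and the filename is just illustrative):

```python
import dill

# Sketch: dill exposes the same dump/load interface as the standard pickle module.
with open('KNORAE_rfE200_adaDT600.pkl', 'wb') as f:
    dill.dump(knorae, f)

# Loading it back later:
with open('KNORAE_rfE200_adaDT600.pkl', 'rb') as f:
    knorae_loaded = dill.load(f)
```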
@Menelau
Thanks for the recommendation. I installed dill and tried to save with it:

```python
import dill as pickle  # assumption: using dill through its pickle-compatible interface

pickle_filename = ds_model_outdir + 'KNORAE_rfE200_adaDT600.pkl'
pickle.dump(knorae, open(pickle_filename, 'wb'))
```
However, I am still getting an error:

`TypeError: can't pickle SwigPyObject objects`
I have trained the RandomForest classifier in parallel; could this be the reason?
@sara-eb,
A parallel random forest shouldn't be a problem at all. I dug deeper into this issue and found a problem with the serialization of the Faiss KNN. In this case, the index computed by the faiss knn needs to be converted to a string before it is written to a file (see https://github.com/facebookresearch/faiss/issues/914).
So I prepared a workaround with functions for saving and loading DS models that should solve this problem (save_ds, load_ds). They just check whether faiss is being used for the knn calculation in the DS model and, if so, do the conversions before saving/loading. I added the code in this gist: https://gist.github.com/Menelau/0cde51c3622be6313fd96b4dffb17996 Can you check if using this workaround solves your problem?
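For reference (not the exact gist code), a rough sketch of the idea behind save_ds/load_ds, assuming the DS model keeps its region-of-competence KNN in a `roc_algorithm_` attribute and the faiss wrapper stores its index in `index_` (both are assumptions about DESlib's internals here):

```python
import pickle
import numpy as np
import faiss

def save_ds(ds_model, filename):
    # Serialize the faiss index into a numpy array so the whole DS model
    # becomes picklable, then restore the live index afterwards.
    knn = getattr(ds_model, 'roc_algorithm_', None)  # assumed attribute name
    if knn is not None and hasattr(knn, 'index_') and not isinstance(knn.index_, np.ndarray):
        live_index = knn.index_
        knn.index_ = faiss.serialize_index(live_index)
        with open(filename, 'wb') as f:
            pickle.dump(ds_model, f)
        knn.index_ = live_index  # keep the in-memory model usable
    else:
        with open(filename, 'wb') as f:
            pickle.dump(ds_model, f)

def load_ds(filename):
    with open(filename, 'rb') as f:
        ds_model = pickle.load(f)
    knn = getattr(ds_model, 'roc_algorithm_', None)
    if knn is not None and isinstance(getattr(knn, 'index_', None), np.ndarray):
        # Convert the serialized array back into a usable faiss index.
        knn.index_ = faiss.deserialize_index(knn.index_)
    return ds_model
```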
Now I will look into adding saving/loading functionality for the DS methods to DESlib (one that can handle the Faiss knn automatically) as soon as possible.
@Menelau Thank you very much sir, it works perfectly. I appreciate it!
@Menelau I am now facing a new issue when scoring on the test set. What could be the reason?
```
score = knorae.score(X_test, y_test)
  File "/home/esara/deslib-env/lib/python3.6/site-packages/sklearn/base.py", line 357, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/home/esara/deslib-env/lib/python3.6/site-packages/deslib/base.py", line 440, in predict
    distances, neighbors = self._get_region_competence(X_DS)
  File "/home/esara/deslib-env/lib/python3.6/site-packages/deslib/base.py", line 381, in _get_region_competence
    return_distance=True)
  File "/home/esara/deslib-env/lib/python3.6/site-packages/deslib/util/faiss_knn_wrapper.py", line 112, in kneighbors
    dist, idx = self.index_.search(X, n_neighbors)
AttributeError: 'numpy.ndarray' object has no attribute 'search'
```
Hello,
How did you load the ds model? Did you use the load_ds function I provided in the gist: https://gist.github.com/Menelau/0cde51c3622be6313fd96b4dffb17996 ?
I believe the error is in the way you are loading the DS model. In order to save the Faiss model, its index is converted to a numpy array so that it can be pickled. In this case, the self.index_ variable is the one containing the index, so it is serialized in the save_ds function (by converting it to a numpy array). Then, in order to load the model back, the numpy array needs to be converted back into a Faiss index (which is what the load_ds function in the gist performs).
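In other words, something like this before scoring (a sketch, assuming load_ds from the gist takes the file path, and using the paths from your earlier snippet):

```python
# Reload with load_ds so the stored numpy array is converted back into a faiss index
# before the DS model is used for prediction/scoring.
knorae = load_ds(ds_model_outdir + 'KNORAE_rfE200_adaDT600.joblib')
score = knorae.score(X_test, y_test)
print(score)
```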
@Menelau Thanks a lot sir, sorry, I did not realize that I needed to reload the model; I was still using the model that was already in memory. Thank you very much for pointing that out.
As you have mentioned in your examples, a BaggingClassifier or RandomForestClassifier is considered a pool of classifiers by itself.
I am wondering whether it is possible to create a pool of classifiers that includes traditional ensemble methods like RF and AdaBoost in combination with single classifiers like SVM and kNN?
Thanks