Closed zoj613 closed 4 years ago
@zoj613 Hello,
Yes, the current version expects the pool to be prefitted, and that point is not very clear in the documentation. In fact, this is something I want to change in upcoming versions, since we have already implemented routines that fit the pool of classifiers internally when the pool is None. So there is no reason not to also accept an unfitted pool and do everything inside the fit of a DS method.
As for requiring a different dataset for fitting the DS method, it is a practice used by many works in the dynamic selection literature, especially when the pool contains strong classifiers (e.g., SVMs), which could overfit certain regions of the feature space.
However, a completely separate partition is not always required when the pool is composed of weak classifiers or when we are dealing with very small datasets. From my experience, in these cases using the same data (or data that partially overlaps with the training data) helps improve results. This point is discussed in the library tutorial: https://deslib.readthedocs.io/en/latest/user_guide/tutorial.html
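To make the three options concrete (fully separate partition, partial overlap, full reuse of the training data), here is a minimal sketch of an index splitter. It uses only the Python standard library and is not part of DESlib; the function name `split_train_dsel` and its parameters `dsel_fraction` and `overlap_fraction` are hypothetical names chosen for illustration.

```python
import random


def split_train_dsel(n_samples, dsel_fraction=0.5, overlap_fraction=0.0, seed=0):
    """Return (train_idx, dsel_idx) as sorted lists of sample indices.

    Hypothetical helper, not a DESlib function.
    overlap_fraction=0.0 -> the two partitions are disjoint.
    overlap_fraction=1.0 -> every training sample is also used for the DS fit.
    """
    if not 0.0 < dsel_fraction < 1.0:
        raise ValueError("dsel_fraction must be strictly between 0 and 1")
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")

    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Base split: the first block goes to the DS method, the rest trains the pool.
    n_dsel = int(round(n_samples * dsel_fraction))
    dsel_idx = indices[:n_dsel]
    train_idx = indices[n_dsel:]

    # Optionally share part of the training block with the DS partition.
    n_shared = int(round(len(train_idx) * overlap_fraction))
    dsel_idx = dsel_idx + train_idx[:n_shared]

    return sorted(train_idx), sorted(dsel_idx)


# Strong classifiers in the pool: keep the partitions disjoint.
train_idx, dsel_idx = split_train_dsel(100, dsel_fraction=0.5, overlap_fraction=0.0)

# Weak classifiers or a very small dataset: let the DS data overlap the training data.
train_small, dsel_small = split_train_dsel(100, dsel_fraction=0.5, overlap_fraction=0.2)
```

In practice, the rows selected by `train_idx` would be used to fit the pool, and the rows selected by `dsel_idx` would be passed to the `fit` of the DS method together with the prefitted pool.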
Looking at the repo, it seems the checks expect the pool to be prefitted if `pool_classifiers` is not None. Not only that, but it also requires that the data passed to `fit` be data not used in training the prefitted pool of heterogeneous classifiers? This doesn't seem to be emphasized in the documentation. Am I missing something? https://github.com/scikit-learn-contrib/DESlib/blob/a22defa871144b4e451364e0c2ba23db359d77f0/deslib/base.py#L207-L228