Background

Previously (versions < 0.1.5), we only had one Feature Selection algorithm, developed via XGBoost and only for Classification. When you zoom out, you can see that the base algorithm can be applied using GLMNet as well. Additionally, the difference between classification and regression is just the metrics, and a lot of the code can be shared. This made me think about how we can pull off a better abstraction level, so that we can quickly add more functionality down the road if we want, and stop writing the same code twice. It would also save a lot of time spent copy/pasting the same docstrings over and over.
Proposal
Let's define the abstract class as follows:
from abc import ABC, abstractmethod

from sklearn.base import BaseEstimator


class FeatureSelector(ABC, BaseEstimator):
    def fit(self) -> None:
        """This is the main docstring for fit."""
        self._fit()
        return None

    def get_cv_results(self) -> float:
        # cv_results_ is populated by the subclass's _fit() implementation
        return self.cv_results_

    @abstractmethod
    def _fit(self) -> None:
        ...
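A nice property of this design is that the abstract base class enforces the contract: a subclass that forgets to implement _fit() cannot even be instantiated. Here is a minimal sanity check (the IncompleteSelector name is just for illustration):

class IncompleteSelector(FeatureSelector):
    # hypothetical subclass that does not implement the abstract _fit()
    pass

try:
    IncompleteSelector()
except TypeError as e:
    print(e)  # instantiation fails because _fit is still abstract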
Now, we can write a class for XGBoostFeatureSelector with its specific input parameters and validation layer:
from dataclasses import dataclass
from typing import Optional


@dataclass
class XGBoostFeatureSelector(FeatureSelector):
    num_boost_rounds: Optional[int] = 100

    def __post_init__(self) -> None:
        # validation layer for the model-specific input parameters
        if not isinstance(self.num_boost_rounds, int):
            raise TypeError("num_boost_rounds must be an int.")

    def _fit(self) -> None:
        self._cv()
        return None

    def _cv(self) -> None:
        # placeholder for the real cross-validation logic
        self.cv_results_ = self.num_boost_rounds + 42
        return None
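The validation layer kicks in at construction time; for example (assuming the error message used above):

>>> XGBoostFeatureSelector(num_boost_rounds="100")
Traceback (most recent call last):
    ...
TypeError: num_boost_rounds must be an int.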
We can also write a class for GLMNetFeatureSelector with its specific input parameters and validation layer:
from dataclasses import dataclass
from typing import Optional


@dataclass
class GLMNetFeatureSelector(FeatureSelector):
    alpha: Optional[float] = 0.1

    def __post_init__(self) -> None:
        # validation layer for the model-specific input parameters
        if not isinstance(self.alpha, float):
            raise TypeError("alpha must be a float.")

    def _fit(self) -> None:
        # placeholder for the real cross-validation logic
        self.cv_results_ = self.alpha + 42
        return None
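One subtlety worth noting: since the check uses isinstance(self.alpha, float), passing a Python int such as alpha=1 would also raise, because int is not a subclass of float:

>>> GLMNetFeatureSelector(alpha=1)
Traceback (most recent call last):
    ...
TypeError: alpha must be a float.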
Now, let's try them:
>>> c = XGBoostFeatureSelector()
>>> c.fit()
>>> c.get_cv_results()
142
>>> c = GLMNetFeatureSelector()
>>> c.fit()
>>> c.get_cv_results()
42.1
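And since fit() lives only on the base class, its docstring is written once and shared by every selector, which is exactly the copy/pasted-docstrings problem mentioned above:

>>> XGBoostFeatureSelector.fit.__doc__
'This is the main docstring for fit.'
>>> GLMNetFeatureSelector.fit.__doc__
'This is the main docstring for fit.'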
This pattern can be used for refactoring our HyperParameter-Tuning models as well.
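As a rough sketch of how the same recipe could look there (all names below, such as HyperParamTuner and n_trials, are hypothetical and not part of the current codebase):

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

from sklearn.base import BaseEstimator


class HyperParamTuner(ABC, BaseEstimator):
    def tune(self) -> None:
        """This is the main docstring for tune."""
        self._tune()
        return None

    @abstractmethod
    def _tune(self) -> None:
        ...


@dataclass
class XGBoostHyperParamTuner(HyperParamTuner):
    n_trials: Optional[int] = 50

    def __post_init__(self) -> None:
        # validation layer, following the same pattern as the selectors
        if not isinstance(self.n_trials, int):
            raise TypeError("n_trials must be an int.")

    def _tune(self) -> None:
        # model-specific hyperparameter tuning logic would go here
        ...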
Current example of FeatureSelector that can be refactored based on the above recipe: https://github.com/slickml/slick-ml/blob/master/src/slickml/selection/_xgboost.py