
[FEATURE]: Improve Abstraction around Selection and Optimization #149

Closed: amirhessam88 closed this issue 1 year ago

amirhessam88 commented 1 year ago

Contact Details [Optional]

No response

Describe the feature you are interested in ...

Background

Previously (versions < 0.1.5), we had only one feature-selection algorithm, implemented via XGBoost and only for classification. When you zoom out, you can see that the same base algorithm can be applied using GLMNet as well. Additionally, the difference between classification and regression comes down to the metrics, so a lot of the code can be shared. This makes me think about how we can pull off a better level of abstraction, so that we can quickly add more functionality down the road and stop writing the same code over and over. It would also save a lot of the time currently spent copy/pasting the same docstrings.
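To make the shared-code point concrete, here is a minimal sketch (the helper _run_cv is hypothetical and not part of the codebase): the cross-validation plumbing stays identical across tasks, and only the metric callable swaps between classification and regression.

from typing import Callable

from sklearn.metrics import mean_squared_error, roc_auc_score


def _run_cv(y_true, y_pred, metric: Callable) -> float:
    # The surrounding CV loop is identical for both tasks; only the
    # metric callable changes between classification and regression.
    return float(metric(y_true, y_pred))


# Classification flavor: e.g., AUC
auc = _run_cv([0, 1, 1, 0], [0.1, 0.8, 0.7, 0.3], metric=roc_auc_score)
# Regression flavor: same code path, different metric (e.g., MSE)
mse = _run_cv([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], metric=mean_squared_error)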

Proposal

Let's define the abstract class as follows:

from abc import ABC, abstractmethod
from typing import Union

from sklearn.base import BaseEstimator


class FeatureSelector(ABC, BaseEstimator):
    """Base class that owns the shared logic and docstrings for all selectors."""

    def fit(self) -> None:
        """This is the main docstring for fit, written once for all subclasses."""
        self._fit()
        return None

    def get_cv_results(self) -> Union[int, float]:
        """Returns the cross-validation results populated by _fit()."""
        return self.cv_results_

    @abstractmethod
    def _fit(self) -> None:
        """Estimator-specific fitting logic, implemented by each subclass."""
        ...
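
As a quick sanity check (assuming the class above), the ABC machinery refuses to instantiate the base class directly, so every concrete selector is forced to implement _fit; the exact error message varies by Python version:

>>> FeatureSelector()
Traceback (most recent call last):
    ...
TypeError: Can't instantiate abstract class FeatureSelector with abstract method _fit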

Now, we can write a class for XGBoostFeatureSelector with its specific input parameters and validation layer:

from dataclasses import dataclass
from typing import Optional

@dataclass
class XGBoostFeatureSelector(FeatureSelector):
    num_boost_rounds: Optional[int] = 100

    def __post_init__(self) -> None:
        # Validation layer specific to this estimator's parameters
        if not isinstance(self.num_boost_rounds, int):
            raise TypeError("num_boost_rounds must be an integer!")

    def _fit(self) -> None:
        self._cv()
        return None

    def _cv(self) -> None:
        # Dummy cross-validation standing in for the real XGBoost CV
        self.cv_results_ = self.num_boost_rounds + 42
        return None

We can also write a class for GLMNetFeatureSelector with its specific input parameters and validation layer:

from dataclasses import dataclass
from typing import Optional


@dataclass
class GLMNetFeatureSelector(FeatureSelector):
    alpha: Optional[float] = 0.1

    def __post_init__(self) -> None:
        # Validation layer specific to this estimator's parameters
        if not isinstance(self.alpha, float):
            raise TypeError("alpha must be a float!")

    def _fit(self) -> None:
        # Dummy cross-validation standing in for the real GLMNet CV
        self.cv_results_ = self.alpha + 42
        return None

Now, let's try them:

>>> c = XGBoostFeatureSelector()
>>> c.fit()
>>> c.get_cv_results()
142
>>> c = GLMNetFeatureSelector()
>>> c.fit()
>>> c.get_cv_results()
42.1
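
A nice side effect of mixing in BaseEstimator on top of the dataclass-generated __init__ is that scikit-learn's get_params/set_params should work out of the box, since get_params introspects the __init__ signature:

>>> c = XGBoostFeatureSelector(num_boost_rounds=200)
>>> c.get_params()
{'num_boost_rounds': 200}
>>> c.set_params(num_boost_rounds=300).num_boost_rounds
300

One caveat: set_params bypasses __post_init__, so the validation layer only guards construction.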

This pattern can be used to refactor our hyperparameter-tuning models as well; a rough sketch follows below.
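
For instance (all names here, such as BaseTuner and XGBoostTuner, are hypothetical and only illustrate how the recipe could carry over):

from abc import ABC, abstractmethod
from dataclasses import dataclass

from sklearn.base import BaseEstimator


class BaseTuner(ABC, BaseEstimator):

    def fit(self) -> None:
        """This is the main docstring for fit, written once for all tuners."""
        self._fit()
        return None

    @abstractmethod
    def _fit(self) -> None:
        ...


@dataclass
class XGBoostTuner(BaseTuner):
    n_trials: int = 20

    def __post_init__(self) -> None:
        # Validation layer specific to this tuner's parameters
        if not isinstance(self.n_trials, int):
            raise TypeError("n_trials must be an integer!")

    def _fit(self) -> None:
        # The tuner-specific optimization loop would go here
        self.best_params_ = {"n_trials": self.n_trials}
        return None

Trying it out:

>>> t = XGBoostTuner(n_trials=50)
>>> t.fit()
>>> t.best_params_
{'n_trials': 50}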

A current example of a FeatureSelector that could be refactored following the above recipe: https://github.com/slickml/slick-ml/blob/master/src/slickml/selection/_xgboost.py

Is your feature request related to a problem?

No response

Any other comments?

No response

amirhessam88 commented 1 year ago

The last change was applied in https://github.com/slickml/slick-ml/pull/169