lemma-osu / sknnr

scikit-learn compatible estimators for various kNN imputation methods
https://sknnr.readthedocs.io
0 stars 1 forks source link

Simplify fit parameters for `CCATransformer` and `CCorATransformer`? #30

Closed aazuspan closed 1 year ago

aazuspan commented 1 year ago

While working on #20 I noticed that CCATransformer and CCorATransformer both specify y as an optional parameter but won't actually work if it is None (unless the optional spp is provided instead). To reproduce:

from sklearn.datasets import load_linnerud
from sknnr.transformers import CCATransformer, CCorATransformer

X, y = load_linnerud(return_X_y=True)
# AxisError: axis 1 is out of bounds for array of dimension 0
CCATransformer().fit(X)

My first thought was that we should throw a ValueError if both y and spp are None, but now that I think about it I'm wondering if those transformers should instead just take a required y parameter and let the estimator handle the selection of fitting data. So for example, GNNRegressor.fit would pass either y or spp, but not both, to CCATransformer.fit.

Currently, CCATransformer really only takes an spp parameter because the estimator that uses it might pass that, which is probably a bit backwards and could lead to some confusion, as well as some unnecessary complexity. Making the estimators responsible for choosing the right fitting data would remove a couple parameters and the need for special error handling since both estimators already require y parameters.

Let me know what you think, @grovduck.

grovduck commented 1 year ago

@aazuspan, great catch! Your proposed solution sounds perfect. I assume that the estimator (e.g. GNNRegressor) would still throw the ValueError if y and spp are both None?

aazuspan commented 1 year ago

Actually, no need to manually throw an error because y is already required when fitting the estimators.

Here's the implementation I'm thinking of for GNNRegressor, but I'm just using y_fit as a placeholder name for the y we use to build the CCA space. Any ideas for a better alternative? Maybe it should just be called spp, or the spp parameter should be called y_fit or something similar? We had talked about renaming that at one point, and maybe now is a good time?

class GNNRegressor(IDNeighborsRegressor, TransformedKNeighborsMixin):
    def fit(self, X, y, spp=None):
        y_fit = spp if spp is not None else y
        self.transform_ = CCATransformer().fit(X, y=y_fit)
        return super().fit(X, y)
grovduck commented 1 year ago

Actually, no need to manually throw an error because y is already required when fitting the estimators.

Yes, of course. Sorry for being dense. I suppose a user could call like:

GNNRegressor().fit(X, None)

but that is true of any estimator and we can't guard against that.

Maybe it should just be called spp, or the spp parameter should be called y_fit or something similar? We had talked about renaming that at one point, and maybe now is a good time?

I think this is the perfect time to change spp in GNNRegressor.fit and MSNRegressor.fit to y_fit. The implementation looks great to me.

aazuspan commented 1 year ago

Resolved by #31