scikit-learn / enhancement_proposals

Enhancement proposals for scikit-learn: structured discussions and rationale for large additions and modifications
https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

SLEP 001: why do we need trans_modify? #33

Open amueller opened 4 years ago

amueller commented 4 years ago

cc @GaelVaroquaux

Coming back to SLEP 1, I don't see (or remember) the need for trans_modify, and I'm no longer sure why we need it. The motivation the SLEP gives is:

> - Creating y in a pipeline makes error measurement harder
> - For some use cases, test time needs to modify the number of samples (for instance, data loading from a file)

I think modifying the number of samples at test time makes error measurement much harder, and I don't think it's as necessary as the training-time version.

Similarly I'm not sure I understand the motivation for partial_fit_modify.

My main motivation in this would be to distinguish training time and test time, and that only requires a new method that basically replaces fit_transform within a pipeline or other meta-estimator.

Not sure I like fit_modify for that. My thoughts right now would be forward, or maybe fit_forward (though that sounds too much like feed-forward; how about feed, lol). modify sounds like an in-place operation to me. D3M uses produce, which is quite generic but might work (probably fit_produce; produce is their version of both predict and transform).
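
To make the training-time / test-time distinction concrete, here is a minimal sketch of what such a method could look like. The names DropOutliers and fit_forward, and the manual pipeline-style calls, are hypothetical illustrations of the idea discussed above, not anything that exists in scikit-learn today.

```python
import numpy as np
from sklearn.base import BaseEstimator


class DropOutliers(BaseEstimator):
    """Toy resampler with a hypothetical training-time-only method."""

    def fit(self, X, y=None):
        # Learn a cutoff on the per-sample feature norm.
        self.threshold_ = np.percentile(np.linalg.norm(X, axis=1), 95)
        return self

    def fit_forward(self, X, y):
        # Training time: may drop rows, so it returns both X and y.
        self.fit(X, y)
        mask = np.linalg.norm(X, axis=1) <= self.threshold_
        return X[mask], y[mask]

    def transform(self, X):
        # Prediction time: the number of samples must stay unchanged.
        return X


# Manual sketch of how a pipeline / meta-estimator could route the calls
# (sklearn.pipeline.Pipeline does not support sample-modifying steps).
rng = np.random.RandomState(0)
X, y = rng.randn(100, 3), rng.randint(0, 2, size=100)

step = DropOutliers()
X_train, y_train = step.fit_forward(X, y)  # fit path: n_samples may shrink
X_test = step.transform(X)                 # predict path: n_samples preserved
print(X_train.shape, X_test.shape)         # e.g. (95, 3) (100, 3)
```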

lorentzenchr commented 2 years ago

From the point of view of putting a model pipeline into production, only pipe.predict(X) or pipe.predict_proba(X) is essential. Put differently, at prediction time you don't have y. In this regard, like @amueller, I don't see the motivation for trans_modify.

adrinjalali commented 2 years ago

As for transforming y, some fairness-related methods that augment the data do that.
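
A minimal sketch of that kind of fit-time augmentation (the function and the group labels here are made up for illustration): duplicating samples of the under-represented group changes both X and y, and the number of samples, which the current transform contract cannot express.

```python
import numpy as np


def augment_minority_group(X, y, group, random_state=0):
    """Oversample the smaller group until both groups have equal counts.

    Two groups are assumed for brevity; this is only an illustration.
    """
    groups, counts = np.unique(group, return_counts=True)
    minority = groups[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    rng = np.random.RandomState(random_state)
    idx = rng.choice(np.where(group == minority)[0], size=n_extra, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])


X = np.arange(10, dtype=float).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])
group = np.array([0, 0, 0, 0, 1])            # group 1 is under-represented
X_aug, y_aug = augment_minority_group(X, y, group)
print(X.shape, X_aug.shape)                  # (5, 2) -> (8, 2)
```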