Selector to avoid filtering out masked samples when fitting transformer

scikit-adaptation / skada

Domain adaptation toolbox compatible with scikit-learn and pytorch

BSD 3-Clause "New" or "Revised" License


This is an addition to the functionality implemented in #123.

The question here is the following:

pipe = make_da_pipeline(StandardScaler(), SubspaceAlignmentAdapter(), LogisticRegression())
pipe.fit(X=X_train, y=y_train, sample_domain=sample_domain)

Assume y_train is properly masked. Then:

SubspaceAlignmentAdapter gets everything, because it declares sample_domain in the routing.
LogisticRegression gets only sources (it can't work with sample_domain).
StandardScaler: ???

This PR makes StandardScaler get both sources and targets, since its fit does not require labels. It previously worked this way, and this seems like a much stronger default. For non-default behavior, we can still wrap the transformer in a proper selector once those are ready (see #116).
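The default described above can be sketched with plain NumPy and scikit-learn. This is only an illustration of the idea, not skada's selector implementation: the `-1` mask value, the `select_labeled` helper, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption for this sketch: masked (target) labels are marked with -1.
MASK_VALUE = -1

rng = np.random.default_rng(0)
X_source = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X_target = rng.normal(loc=3.0, scale=1.0, size=(50, 2))
X_train = np.vstack([X_source, X_target])

# Source samples carry real labels, target samples carry the mask value.
y_source = (X_source[:, 0] > 0).astype(int)
y_train = np.concatenate([y_source, np.full(50, MASK_VALUE)])


def select_labeled(X, y):
    """Hypothetical helper: keep only the samples whose label is not masked."""
    keep = y != MASK_VALUE
    return X[keep], y[keep]


# Transformer: fit does not need labels, so it sees sources and targets.
scaler = StandardScaler().fit(X_train)

# Estimator: fit needs labels, so masked samples are filtered out first.
X_labeled, y_labeled = select_labeled(scaler.transform(X_train), y_train)
clf = LogisticRegression().fit(X_labeled, y_labeled)

n_seen_by_scaler = int(scaler.n_samples_seen_)
n_seen_by_clf = X_labeled.shape[0]
print(n_seen_by_scaler, n_seen_by_clf)
```

Fitting the scaler on all samples means its statistics reflect both domains. If it were fit on sources only, the target samples would be scaled with statistics that ignore the shift between domains.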

Let me know WDYT.

Codecov Report

Attention: Patch coverage is 95.83%, with 1 line in your changes missing coverage. Please review.

Project coverage is 92.02%. Comparing base (ba065a8) to head (2a9774f).

| Files | Patch % | Lines |
|---|---|---|
| skada/tests/test_selector.py | 94.44% | 1 missing |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #129      +/-   ##
==========================================
+ Coverage   83.07%   92.02%   +8.95%
==========================================
  Files          43       43
  Lines        3485     3500      +15
==========================================
+ Hits         2895     3221     +326
+ Misses        590      279     -311
```
