koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.28k stars 117 forks source link

Relax `check_X_y` for `DecayEstimator` #580

Closed FBruzzesi closed 1 year ago

FBruzzesi commented 1 year ago

Description

Relaxes the check for X in DecayEstimator by delegating the actual checks to the given estimator. This fixes #573 and allows estimators that do not need X features. Remark, as discussed in the issue, it may be worth it to completely remove the check.

Type of change

Checklist:

koaning commented 1 year ago

I'm also OK with removing the entire check. Mainly because the internal method should already be doing that and this way we'll need to make fewer assumptions.

FBruzzesi commented 1 year ago

I'm also OK with removing the entire check. Mainly because the internal method should already be doing that and this way we'll need to make fewer assumptions.

To be completely pedantic, the check_X_y function call checks and converts X and y to arrays. This allows to then access .shape attribute to know how many samples are there and compute the decay.

Now both approaches feel wrong and with shortcomings:

Opinions?

koaning commented 1 year ago

Not having a check doesn't allow to access .shape or len in the general case (I noticed this by running unit tests) and I would be unsure how to proceed.

Ah, I forgot about that. That's a fair point. I guess the simplest solution is to offer a flag to the end-user and to defer the responsibility of applying the check to them? That said, I think most of the time X will be .shape-able or len()-able so the default could be True. Or am I missing something?

FBruzzesi commented 1 year ago

I guess the simplest solution is to offer a flag to the end-user and to defer the responsibility of applying the check to them?

That's reasonable!

That said, I think most of the time X will be .shape-able or len()-able so the default could be True

With such default the scikit-learn tests would fail. I will add a big warning in the docstring 😁

koaning commented 1 year ago

With such default the scikit-learn tests would fail. I will add a big warning in the docstring 😁

There are a lot of components in this library that fail the standard checks. That's ok. It's not ideal, but given how "non-standard" some of these ideas are ... it's fine to turn a test off sometimes. Adding a warning to the docstring is fair though!