Closed: YanisLalou closed this issue 5 months ago
I think the pipeline treats the last step differently from the steps before it; that's why adding LogisticRegression fixes the problem. When the adapter is the last step, the pipeline seems to remove the negative values (i.e., the target samples) in sample_domain, but I don't know why.
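(For context, the "negative values" refer to skada's sample_domain convention: source samples carry positive domain labels, target samples negative ones, and target labels are masked. The sketch below, with made-up data, shows what removing them amounts to.)

```python
import numpy as np

# skada convention: positive sample_domain = source, negative = target.
sample_domain = np.array([1, 1, 1, -2, -2])
X = np.random.randn(5, 2)
y = np.array([0, 1, 0, -1, -1])  # target labels are masked with -1

# "Removing the negative values" keeps only the source samples:
source_mask = sample_domain >= 0
X_source, y_source = X[source_mask], y[source_mask]
print(X_source.shape)  # (3, 2) -- the two target samples are gone
```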
The transform removes the target values because it 'prepares' the output for the next step, which is expected to be an estimator that doesn't have access to the targets (since we can't fit on masked labels).
The selector, not the transformer; sorry for the confusion. We could prevent this from happening by trying to guess what type of estimator is used, but I'm not sure how deep we want to go with that.
We could just check the estimator's methods: if it has a 'transform' method --> we don't remove the negative values; if it has a 'predict' method --> we remove them.
Yes, this is how the default sklearn Pipeline makes the distinction.
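A sketch of that duck-typing check (mirroring how sklearn's Pipeline treats all-but-last steps as transformers; the function name here is hypothetical, not an existing skada helper):

```python
def should_remove_masked_targets(estimator) -> bool:
    # Transformers pass data through to the next step, so the masked
    # target samples must be kept; only a final predictor needs them
    # removed, since it cannot fit on masked labels.
    if hasattr(estimator, "transform"):
        return False
    return hasattr(estimator, "predict")
```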
Using a code snippet along these lines (a minimal sketch assuming skada's make_da_pipeline with an adapter such as CORALAdapter as the final step; names and data are illustrative):
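```python
from sklearn.preprocessing import StandardScaler
from skada import CORALAdapter, make_da_pipeline
from skada.datasets import make_shifted_datasets

# Toy source/target data; target samples get negative sample_domain values.
X, y, sample_domain = make_shifted_datasets(noise=0.3)

# The adapter is the LAST step of the pipeline: no estimator follows it.
pipe = make_da_pipeline(StandardScaler(), CORALAdapter())
pipe.fit(X, y, sample_domain=sample_domain)
```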
You'll get this kind of error (the exact shape depends on the data used):
ValueError: Found array with 0 sample(s) (shape=(0, 800)) while a minimum of 1 is required by StandardScaler.
Also, it's worth noting that by adding a LogisticRegression at the end of the pipeline, the error magically disappears.
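For reference, the workaround looks like this (continuing the hypothetical sketch above):

```python
from sklearn.linear_model import LogisticRegression

# Same pipeline, but with a predictor as the final step: the selector's
# removal of the masked target samples now matches what the final
# estimator expects, and the error goes away.
pipe = make_da_pipeline(StandardScaler(), CORALAdapter(), LogisticRegression())
pipe.fit(X, y, sample_domain=sample_domain)
```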