Currently we reference pipeline steps by index. We need to change this because feature selection is only sometimes used, which changes the length of the pipeline. We can either switch to an approach where the names of the preprocessing steps are enforced, for example by using a ColumnTransformer, or create a workaround with if statements that detect whether a feature selection method is in use.
An example of code that references the pipeline by index:
if self.imbalance_sampler:
    params_no_sampler = {
        key: value
        for key, value in params_no_estimator.items()
        if not key.startswith("Resampler__")
    }
    self.estimator[:-2].set_params(**params_no_sampler).fit(
        X, y
    )
    X_valid_selected = self.estimator[:-2].transform(X_valid)
else:
    self.estimator[:-1].set_params(**params_no_estimator).fit(
        X, y
    )
    X_valid_selected = self.estimator[:-1].transform(X_valid)
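This creates a problem: the hard-coded slices ([:-2] and [:-1]) assume a fixed pipeline length, so when a feature selection step is added or removed they may no longer cover the intended steps, and RFE may even be run out of turn.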
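One possible direction for the name-based option, as a minimal sketch rather than a finished fix: select the preprocessing steps by name instead of by position, so the result does not depend on whether a feature selection step is present. The helpers `preprocessing_steps` and `params_without` are hypothetical, as are the step names "Preprocessor", "RFE" and "Model"; only "Resampler" is taken from the parameter prefix in the snippet above. The sketch assumes the final estimator is always the last step and works on a plain list of (name, step) pairs, which is the shape of a scikit-learn Pipeline's `steps` attribute.

```python
def preprocessing_steps(steps, exclude=("Resampler",)):
    """Return the (name, step) pairs before the final estimator,
    skipping any step whose name is listed in `exclude`."""
    # Assumption: the final estimator is always the last step.
    *head, _final = steps
    return [(name, step) for name, step in head if name not in exclude]


def params_without(params, names):
    """Drop parameters that belong to the named steps (keys like 'Name__param')."""
    prefixes = tuple(f"{name}__" for name in names)
    return {key: value for key, value in params.items() if not key.startswith(prefixes)}


# Placeholder strings stand in for real transformers; all names except
# "Resampler" are hypothetical.
steps_with_rfe = [
    ("Preprocessor", "ct"),
    ("RFE", "rfe"),
    ("Resampler", "smote"),
    ("Model", "clf"),
]
steps_without_rfe = [
    ("Preprocessor", "ct"),
    ("Resampler", "smote"),
    ("Model", "clf"),
]

kept_with_rfe = preprocessing_steps(steps_with_rfe)
kept_without_rfe = preprocessing_steps(steps_without_rfe)
filtered_params = params_without({"Resampler__k": 5, "RFE__n": 3}, ["Resampler"])
```

In the real code, the list returned by `preprocessing_steps(self.estimator.steps)` could presumably be wrapped in a new Pipeline and fitted in place of `self.estimator[:-2]` or `self.estimator[:-1]`, which would remove the need for the if/else branch on `self.imbalance_sampler`.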