To use feature transformations that operate across multiple rows, such as PCA, we run into data leakage (similar to this discussion) unless the transformation is fit only after the data has been split into folds.
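A minimal sketch of the problem, assuming scikit-learn and the Iris dataset purely for illustration. Fitting PCA on the full dataset before cross-validation lets every validation fold influence the components. The fold-safe version refits PCA on each fold's training rows only. The component count and classifier are arbitrary choices, not part of the proposal.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: PCA is fit on every row, including rows that later land in a validation fold.
X_leaky = PCA(n_components=2).fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=kf)

# Fold-safe: PCA is fit on the training rows of each fold only,
# then applied (transform only) to that fold's validation rows.
scores = []
for train_idx, val_idx in kf.split(X):
    pca = PCA(n_components=2).fit(X[train_idx])
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X[train_idx]), y[train_idx])
    scores.append(clf.score(pca.transform(X[val_idx]), y[val_idx]))

print("leaky:", leaky_scores.mean())
print("fold-safe:", np.mean(scores))
```

On a small, clean dataset like Iris the two scores may be close; the point is that only the second estimate is free of information from the validation rows.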
The proposed solution is to wrap each scikit-learn model in a Pipeline and expose the Pipeline's list of "steps" to the user.
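A rough sketch of what the wrapping could look like. `wrap_model` is a hypothetical helper (not an existing API) that prepends user-supplied `(name, transformer)` steps to the estimator. Because `cross_val_score` clones and refits the whole `Pipeline` on each fold, the scaler and PCA are fit on training rows only, which avoids the leakage described above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def wrap_model(estimator, steps=None):
    """Hypothetical helper: build a Pipeline of user steps followed by the model.

    `steps` is a list of (name, transformer) tuples; the final estimator is
    always added under the name "model", so user step names must differ.
    """
    steps = list(steps or [])
    return Pipeline(steps + [("model", estimator)])


X, y = load_iris(return_X_y=True)
pipe = wrap_model(
    LogisticRegression(max_iter=1000),
    steps=[("scale", StandardScaler()), ("pca", PCA(n_components=2))],
)

# Each fold gets a fresh clone of the pipeline, fit only on that fold's training rows.
scores = cross_val_score(pipe, X, y, cv=5)

print([name for name, _ in pipe.steps])  # ['scale', 'pca', 'model']
print(scores.mean())
```

Exposing `steps` this way also lets users tune transformer parameters through the standard `set_params` naming scheme, e.g. `pipe.set_params(pca__n_components=3)`.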