knightlab-analyses / regression-benchmarking

BSD 3-Clause "New" or "Revised" License

Introduce feature scaling/transformations before every CV fold #13

Closed: patrickimran closed this issue 4 years ago

patrickimran commented 4 years ago

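Feature transformations that are fit on multiple rows, such as PCA, cause data leakage (similar to this discussion) when they are applied to the full dataset before it is split into CV folds, because the transformation then learns from rows that later end up in the test fold. To use them safely, the transformation has to be fit after splitting, on the training portion of each fold only.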

The proposed solution is to wrap each scikit-learn model in a Pipeline and expose the list of "steps" to the user, so that every transformation is re-fit inside each fold together with the model.
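
A minimal sketch of what this could look like, not the implementation used in this repository. The helper name `make_pipeline_from_steps`, the choice of `Ridge`, and the synthetic data are assumptions made for illustration; only `Pipeline`, `PCA`, `StandardScaler`, `KFold`, and `cross_val_score` are standard scikit-learn APIs. Because `cross_val_score` clones and fits the whole pipeline on each training fold, the scaler and PCA never see the held-out rows.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_pipeline_from_steps(model, steps=None):
    """Hypothetical helper: wrap `model` in a Pipeline.

    `steps` is the user-facing list of (name, transformer) tuples that
    run before the model, in order.
    """
    steps = list(steps or [])
    return Pipeline(steps + [("model", model)])


# Synthetic regression data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Steps a user might supply; both are fit on training rows only.
user_steps = [("scale", StandardScaler()), ("pca", PCA(n_components=5))]
pipe = make_pipeline_from_steps(Ridge(alpha=1.0), user_steps)

# The pipeline is cloned and re-fit from scratch inside every fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
print(scores)
```

Passing an empty or missing `steps` list leaves the model wrapped in a single-step pipeline, so models with and without transformations can be handled through the same code path.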