Is your feature request related to a problem? Please describe.
I would like to be able to use cuML for time series problems, especially ones that require time-series-aware train/test splits. To do this today I need sklearn's TimeSeriesSplit object.
Describe the solution you'd like
I would like a cuML equivalent of sklearn's TimeSeriesSplit class that can be used directly as part of a cuML Pipeline object and with the cross_val_score method. A rough sketch of the usage I have in mind follows below.
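To be clear, this is only a hypothetical sketch: cuML does not currently ship a TimeSeriesSplit, so the commented-out import and constructor below are assumptions and are written as if they mirrored sklearn's parameters (n_splits, max_train_size, test_size, gap). The data here is placeholder random data standing in for my real feature matrix.

```python
import cupy as cp
from cuml.svm import SVR
from sklearn.model_selection import cross_val_score

# Hypothetical -- this class does not exist in cuML today:
# from cuml.model_selection import TimeSeriesSplit

X = cp.random.rand(100_000, 8)   # placeholder for my real ~1M-row feature matrix
y = cp.random.rand(100_000)

# Desired usage, assuming the same parameters as sklearn's TimeSeriesSplit:
# tscv = TimeSeriesSplit(n_splits=1000, test_size=500)
# scores = cross_val_score(SVR(), X, y, cv=tscv)
```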
Describe alternatives you've considered
I could reimplement this myself from scratch (a rough manual version is sketched below), but this would be error-prone and generally risky, since poor time series splitting behavior is a common source of data leakage in ML problems.
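As a minimal sketch of what the manual alternative would look like (with placeholder data): sklearn's TimeSeriesSplit only generates index arrays, so those indices can be produced on the CPU and used to slice GPU-resident arrays, with cuML's SVR trained on each split by hand. This works, but it loses the Pipeline and cross_val_score integration I'm asking for.

```python
import cupy as cp
import numpy as np
from cuml.svm import SVR
from sklearn.model_selection import TimeSeriesSplit

X = cp.random.rand(10_000, 8)   # placeholder for the real ~1M-row dataset
y = cp.random.rand(10_000)

# sklearn's TimeSeriesSplit only yields index arrays, so the indices can be
# generated on the CPU and then used to index GPU-resident CuPy arrays.
tscv = TimeSeriesSplit(n_splits=5)
mse_per_split = []
for train_idx, test_idx in tscv.split(np.arange(len(X))):
    train_idx, test_idx = cp.asarray(train_idx), cp.asarray(test_idx)
    model = SVR().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    mse_per_split.append(float(cp.mean((pred - y[test_idx]) ** 2)))  # MSE per split
```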
Additional context
I have an SVR model that takes ~8 minutes to train per split. My dataset has ~1 million observations and I need to train across thousands of splits, so my runtime is 8 min * N where N is large. On a system with a Ryzen 9 7950X3D (32 processes) and 32 GB of 6000 MT/s RAM, cross-validation using TimeSeriesSplit ran for more than 5 days. Running the SVR model on the same system with cuML and an RTX 3090 (via WSL2) reduced per-split training time to under 30 seconds. However, I cannot fully migrate to cuML without a TimeSeriesSplit implementation.