kurucan closed this 2 years ago
Yes, you can set split_type="time".
https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#data-split-method
Thanks, but sklearn's time series split has additional parameters such as "gap", and I do not see those parameters in the corresponding FLAML code.
sklearn time series split signature: class sklearn.model_selection.TimeSeriesSplit(n_splits=5, *, max_train_size=None, test_size=None, gap=0)
https://github.com/microsoft/FLAML/blob/main/flaml/automl.py

    elif self._split_type == "time":
        if self._state.task == TS_FORECAST:
            period = self._state.fit_kwargs["period"]
            if period * (n_splits + 1) > y_train_all.size:
                n_splits = int(y_train_all.size / period - 1)
                assert n_splits >= 2, (
                    f"cross validation for forecasting period={period}"
                    f" requires input data with at least {3 * period} examples."
                )
                logger.info(f"Using nsplits={n_splits} due to data size limit.")
            self._state.kf = TimeSeriesSplit(n_splits=n_splits, test_size=period)
        else:
            self._state.kf = TimeSeriesSplit(n_splits=n_splits)  # <-- only n_splits is passed here
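For readers unfamiliar with the parameter in question, here is a minimal sketch of what "gap" changes in sklearn's TimeSeriesSplit. The 12-row toy array and the helper name `describe` are made up for illustration; only `TimeSeriesSplit` and its `n_splits`, `test_size`, and `gap` arguments come from sklearn.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 12 time-ordered samples (illustrative only).
X = np.arange(12).reshape(-1, 1)


def describe(splitter):
    """Return (last train index, first test index) for each fold."""
    return [(int(train[-1]), int(test[0])) for train, test in splitter.split(X)]


# Without a gap, each training set ends right before its test set.
no_gap = describe(TimeSeriesSplit(n_splits=3, test_size=2))

# With gap=2, the last two samples before each test set are dropped from training.
with_gap = describe(TimeSeriesSplit(n_splits=3, test_size=2, gap=2))

print(no_gap)    # [(5, 6), (7, 8), (9, 10)]
print(with_gap)  # [(3, 6), (5, 8), (7, 10)]
```

The test folds are identical in both cases; only the end of each training window moves, which is the behavior that split_type="time" alone does not expose.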
@kurucan You can use a custom splitter with your desired "gap" etc.; see https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#data-split-method. You are welcome to add a test case like the one in https://github.com/microsoft/FLAML/blob/2f5d6169d3b5cc025eb2516cbd003fced924a88e/test/automl/test_split.py#L158
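The custom-splitter suggestion might look roughly like the sketch below. It assumes, per the linked data-split documentation, that `AutoML.fit` accepts a splitter object for `split_type` together with `eval_method="cv"`. The function name `fit_with_gap_splitter`, the regression task, the time budget, and the fold sizes are illustrative choices, not anything from the thread, and the rows are assumed to be sorted by time already.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Splitter carrying the extra sklearn options (values are illustrative).
gap_splitter = TimeSeriesSplit(n_splits=3, test_size=20, gap=5)

# Quick look at the folds on 100 dummy rows: (train size, first test index).
folds = [(len(tr), int(te[0])) for tr, te in gap_splitter.split(np.zeros((100, 1)))]


def fit_with_gap_splitter(X_train, y_train, time_budget=10):
    """Sketch: pass the splitter object to FLAML instead of split_type="time"."""
    # Imported lazily so the splitter above can be inspected without FLAML installed.
    from flaml import AutoML

    automl = AutoML()
    automl.fit(
        X_train=X_train,
        y_train=y_train,
        task="regression",        # assumption: a plain regression task
        eval_method="cv",         # cross validation driven by the splitter
        split_type=gap_splitter,  # custom splitter object with gap=5
        time_budget=time_budget,  # seconds
    )
    return automl
```

Here `folds` gives training sizes of 35, 55, and 75 rows with test folds starting at rows 40, 60, and 80, so five rows are skipped before each test fold.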
@slhuang Many thanks! It works.
Is there support for sklearn.model_selection.TimeSeriesSplit?
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html