providing CustomIterator to cv_iter in tabular_automl.fit_predict fails

mjvakili commented 2 years ago

I would like to pass a custom cross-validation iterator to cv_iter in tabular_automl.fit_predict or automl.fit_predict, but I run into this error: AssertionError: Pipeline finished with 0 models for some reason.

To give you some context: I have my own custom cv class with a split method (just like any sklearn cv) that yields train_indices, valid_indices for each split, and I pass it to tabular_automl.fit_predict() in the following way:

from lightautoml.validation.base import CustomIterator
# cv is the custom splitter described above; split() yields (train_indices, valid_indices)
cv_splitter = cv.split(tr_data)
custom_cv_iterator = CustomIterator(tr_data, cv_splitter)
tabular_automl.fit_predict(tr_data, roles={'target': TARGET_NAME}, cv_iter=custom_cv_iterator)

I run into the same error if I use:

tabular_automl.fit_predict(tr_data, roles={'target' : TARGET_NAME}, cv_iter = cv_splitter)

Do you know what may have caused this error and how to fix it? Thanks!
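For readers following along, here is a minimal sketch of the kind of sklearn-style splitter described above. The class name ContiguousKFold and the toy data are hypothetical and are not part of LightAutoML or of the original report. The sketch also shows one general Python property worth keeping in mind: split() returns a generator, which can be consumed only once. Whether that is related to the error above is not established in this thread.

```python
from typing import Iterator, List, Sequence, Tuple


class ContiguousKFold:
    """Hypothetical sklearn-style splitter: k contiguous validation blocks."""

    def __init__(self, n_splits: int = 3) -> None:
        if n_splits < 2:
            raise ValueError("n_splits must be at least 2")
        self.n_splits = n_splits

    def split(self, data: Sequence) -> Iterator[Tuple[List[int], List[int]]]:
        n = len(data)
        # Block i covers rows [bounds[i], bounds[i + 1]).
        bounds = [i * n // self.n_splits for i in range(self.n_splits + 1)]
        for start, stop in zip(bounds[:-1], bounds[1:]):
            valid_idx = list(range(start, stop))
            train_idx = list(range(0, start)) + list(range(stop, n))
            yield train_idx, valid_idx


rows = list(range(10))        # stand-in for tr_data
cv = ContiguousKFold(n_splits=3)

splits = cv.split(rows)       # a generator: it can be consumed only once
folds = list(splits)          # materialize so the folds can be reused
leftover = list(splits)       # the generator is exhausted at this point

print(len(folds), len(leftover))
```

Materializing the folds into a list is a cheap way to inspect them (sizes, overlap, coverage) before handing anything to a library.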

alexmryzhkov commented 2 years ago

Hi @mjvakili,

thanks for your feedback.

The iterators we use for cross-validation differ from the sklearn variants. To build your own custom one, please take a look at FoldsIterator and TimeSeriesIterator in lightautoml.validation.np_iterators.

Alex

mjvakili commented 2 years ago

Hi @alexmryzhkov, thanks. I had a look at TimeSeriesIterator in lightautoml.validation.np_iterators and it seems to be the cv iterator I was looking for. However, when I use TimeSeriesIterator and look at the OOF predictions, the last entries seem to be NaN. I would have expected the first entries (the first fold) to be NaN instead.

alexmryzhkov commented 2 years ago

Hi @mjvakili,

You are right: the first entries should be NaN. But predictions can also be NaN if the timeout is too short for the predictions to finish. Could you please share the training log for your case?

Alex
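To illustrate why the first entries are the ones expected to be NaN, here is a small sketch of a generic expanding-window time-series split. It is not LightAutoML's TimeSeriesIterator implementation; the function expanding_window_folds and the stand-in "model" are hypothetical. The earliest block is only ever used for training, so its rows never receive an out-of-fold prediction.

```python
import math
from typing import List, Tuple


def expanding_window_folds(n_rows: int, n_blocks: int) -> List[Tuple[List[int], List[int]]]:
    """Train on blocks 0..k-1, validate on block k, for k = 1..n_blocks-1."""
    bounds = [i * n_rows // n_blocks for i in range(n_blocks + 1)]
    folds = []
    for k in range(1, n_blocks):
        train_idx = list(range(0, bounds[k]))
        valid_idx = list(range(bounds[k], bounds[k + 1]))
        folds.append((train_idx, valid_idx))
    return folds


n_rows = 8
oof = [math.nan] * n_rows  # NaN means "no out-of-fold prediction"

for train_idx, valid_idx in expanding_window_folds(n_rows, n_blocks=4):
    # Stand-in "model": predict the fraction of rows seen during training.
    prediction = len(train_idx) / n_rows
    for i in valid_idx:
        oof[i] = prediction

nan_rows = [i for i, value in enumerate(oof) if math.isnan(value)]
print(nan_rows)  # [0, 1]: only the earliest block has no OOF prediction
```

Under a scheme like this, NaN values at the end of the OOF array would point to something else, such as a fold that did not finish.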

sberbank-ai-lab / LightAutoML

providing CustomIterator to cv_iter in tabular_automl.fit_predict fails #104