microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License
3.85k stars, 506 forks

official example: Time Series Forecast ValueError: list.remove(x): x not in list #390

Closed. SharkFin-top closed this issue 2 years ago.

SharkFin-top commented 2 years ago

Hello!

Python 3.9.5

https://microsoft.github.io/FLAML/docs/Examples/AutoML-Time%20series%20forecast

When I run the Univariate time series and Multivariate time series examples, both fail with ValueError: list.remove(x): x not in list

Univariate time series

import numpy as np
from flaml import AutoML

X_train = np.arange('2014-01', '2021-01', dtype='datetime64[M]')
y_train = np.random.random(size=72)
automl = AutoML()
automl.fit(X_train=X_train[:72],  # a single column of timestamp
           y_train=y_train,  # value for each timestamp
           period=12,  # time horizon to forecast, e.g., 12 months
           task='ts_forecast', time_budget=15,  # time budget in seconds
           log_file_name="ts_forecast.log",
          )
print(automl.predict(X_train[72:]))

output

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5700/1886056256.py in <module>
      5 y_train = np.random.random(size=72)
      6 automl = AutoML()
----> 7 automl.fit(X_train=X_train[:72],  # a single column of timestamp
      8            y_train=y_train,  # value for each timestamp
      9            period=12,  # time horizon to forecast, e.g., 12 months

D:\Miniconda3\lib\site-packages\flaml\automl.py in fit(self, X_train, y_train, dataframe, label, metric, task, n_jobs, gpu_per_trial, log_file_name, estimator_list, time_budget, max_iter, sample, ensemble, eval_method, log_type, model_history, split_ratio, n_splits, log_training_metric, mem_thres, pred_time_limit, train_time_limit, X_val, y_val, sample_weight_val, groups_val, groups, verbose, retrain_full, split_type, learner_selector, hpo_method, starting_points, seed, n_concurrent_trials, keep_search_state, early_stop, append_log, auto_augment, min_sample_size, use_ray, **fit_kwargs)
   2147                 if TS_FORECAST == self._state.task:
   2148                     # catboost is removed because it has a `name` parameter, making it incompatible with hcrystalball
-> 2149                     estimator_list.remove("catboost")
   2150                     try:
   2151                         import prophet

ValueError: list.remove(x): x not in list

Multivariate time series

import statsmodels.api as sm

data = sm.datasets.co2.load_pandas().data
# data is given in weeks, but the task is to predict monthly, so use monthly averages instead
data = data['co2'].resample('MS').mean()
data = data.fillna(data.bfill())  # makes sure there are no missing values
data = data.to_frame().reset_index()
num_samples = data.shape[0]
time_horizon = 12
split_idx = num_samples - time_horizon
train_df = data[:split_idx]  # train_df is a dataframe with two columns: timestamp and label
X_test = data[split_idx:]['index'].to_frame()  # X_test is a dataframe with dates for prediction
y_test = data[split_idx:]['co2']  # y_test is a series of the values corresponding to the dates for prediction

from flaml import AutoML

automl = AutoML()
settings = {
    "time_budget": 10,  # total running time in seconds
    "metric": 'mape',  # primary metric for validation: 'mape' is generally used for forecast tasks
    "task": 'ts_forecast',  # task type
    "log_file_name": 'CO2_forecast.log',  # flaml log file
    "eval_method": "holdout",  # validation method can be chosen from ['auto', 'holdout', 'cv']
    "seed": 7654321,  # random seed
}

automl.fit(dataframe=train_df,  # training data
           label='co2',  # label column
           period=time_horizon,  # keyword argument 'period' must be included for forecast task
           **settings)

output

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5700/3846975636.py in <module>
     25 }
     26 
---> 27 automl.fit(dataframe=train_df,  # training data
     28            label='co2',  # label column
     29            period=time_horizon,  # key word argument 'period' must be included for forecast task)

D:\Miniconda3\lib\site-packages\flaml\automl.py in fit(self, X_train, y_train, dataframe, label, metric, task, n_jobs, gpu_per_trial, log_file_name, estimator_list, time_budget, max_iter, sample, ensemble, eval_method, log_type, model_history, split_ratio, n_splits, log_training_metric, mem_thres, pred_time_limit, train_time_limit, X_val, y_val, sample_weight_val, groups_val, groups, verbose, retrain_full, split_type, learner_selector, hpo_method, starting_points, seed, n_concurrent_trials, keep_search_state, early_stop, append_log, auto_augment, min_sample_size, use_ray, **fit_kwargs)
   2147                 if TS_FORECAST == self._state.task:
   2148                     # catboost is removed because it has a `name` parameter, making it incompatible with hcrystalball
-> 2149                     estimator_list.remove("catboost")
   2150                     try:
   2151                         import prophet

ValueError: list.remove(x): x not in list

I really don't know why. o(╥﹏╥)o Thank you, guys!

sonichi commented 2 years ago

That's a bug introduced by a recent PR: #362. It should be an easy fix: check whether "catboost" is in the estimator_list before removing it. Feel free to create a PR if you would like to.

@int-chaos
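The guard suggested above can be sketched in plain Python. The helper name remove_if_present and the example estimator list are hypothetical, not FLAML's actual code; the sketch assumes "catboost" is missing from the default list (for example, because catboost is not installed). It only shows that list.remove raises ValueError for an absent item, so the call needs a membership check first.

```python
def remove_if_present(estimator_list: list, name: str) -> list:
    """Remove `name` from `estimator_list` in place, only if it is there.

    Hypothetical helper illustrating the guard suggested in this issue;
    it is not FLAML's actual implementation.
    """
    if name in estimator_list:
        estimator_list.remove(name)
    return estimator_list


# Hypothetical default list that does not contain "catboost".
defaults = ["lgbm", "rf", "xgboost", "extra_tree", "xgb_limitdepth"]

# Unguarded removal reproduces the error from the traceback.
try:
    list(defaults).remove("catboost")
except ValueError as err:
    print(f"unguarded removal failed: {err}")

# Guarded removal is a no-op when "catboost" is absent...
print(remove_if_present(list(defaults), "catboost"))
# ...and still removes it when it is present.
print(remove_if_present(defaults + ["catboost"], "catboost"))
```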

int-chaos commented 2 years ago

Will do.

sonichi commented 2 years ago

@int-chaos, could you also tell @SharkFin-top an easy workaround to use until the PR is merged, such as passing an estimator_list that contains "catboost"? BTW, bugs like this are detectable by running a notebook example.
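One way to try a workaround before the fix lands is to pass an explicit estimator_list instead of leaving the default. This is a sketch under assumptions: that the failing remove("catboost") call only runs when estimator_list is left at "auto", and that "arima", "sarimax" and "prophet" are valid ts_forecast estimator names in the installed FLAML version. Unlike the suggestion above, it leaves "catboost" out, because the code comment in the traceback says catboost is incompatible with hcrystalball for forecasting. The helper names ts_forecast_settings and run_univariate_example are hypothetical.

```python
import importlib.util


def ts_forecast_settings(period: int, time_budget: int = 15) -> dict:
    """Build AutoML.fit keyword arguments with an explicit estimator_list.

    Assumption: "arima", "sarimax" and "prophet" are valid ts_forecast
    estimator names in the installed FLAML version.
    """
    estimator_list = ["arima", "sarimax"]
    # Only add prophet if the package can be imported.
    if importlib.util.find_spec("prophet") is not None:
        estimator_list.append("prophet")
    return {
        "task": "ts_forecast",
        "period": period,  # forecast horizon, required for ts_forecast
        "time_budget": time_budget,  # seconds
        "estimator_list": estimator_list,  # explicit list instead of "auto"
        "log_file_name": "ts_forecast.log",
    }


def run_univariate_example():
    """Re-run the univariate example from this issue with the workaround."""
    # Imported here so ts_forecast_settings works without numpy or FLAML.
    import numpy as np
    from flaml import AutoML

    X_train = np.arange("2014-01", "2021-01", dtype="datetime64[M]")
    y_train = np.random.random(size=72)
    automl = AutoML()
    automl.fit(X_train=X_train[:72], y_train=y_train,
               **ts_forecast_settings(period=12))
    return automl.predict(X_train[72:])
```

Calling run_univariate_example() requires FLAML to be installed. For the multivariate example, the same "estimator_list" entry could be added to its settings dict.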