lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0
109 stars 44 forks

[Revise] [forecasting] use MAPE as the default evaluation metric and prevent overfitting #813

Closed by lisphilar 3 years ago

lisphilar commented 3 years ago

Summary

The decision tree regressor used for forecasting shows overfitting: train score (0.857) >> test score (-0.670) in the following example. (Italy, today=06Jun2021)

Code

Run example/scenario_analysis.py, or:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
# Scenario analysis
snl = cs.Scenario(country="Italy")
snl.register(jhu_data)
snl.trend()
snl.estimate(cs.SIRF)
fit_dict = snl.fit()
del fit_dict["dataset"], fit_dict["intercept"], fit_dict["coef"]
print(fit_dict)

Outputs

{'best': 'Indicators -> Parameters with Decision Tree Regressor', 'pca': <class 'sklearn.decomposition._pca.PCA'>, 'regressor': <class 'sklearn.tree._classes.DecisionTreeRegressor'>, 'pca_n_components': 1, 'score_metric': 'R2', 'score_train': 0.8566862200815691, 'score_test': -0.6701488680815267, 'delay': 10}


lisphilar commented 3 years ago

I tried "regressor__max_depth": [1, 2, 3, 4, 5, 6, 7, 8, 9] as the hyperparameter grid for DecisionTreeRegressor, but the R2 score decreased. (previous grid: [3, 5, 7, 9])

{'best': 'Indicators -> Parameters with Decision Tree Regressor', 'pca': <class 'sklearn.decomposition._pca.PCA'>, 'regressor': <class 'sklearn.tree._classes.DecisionTreeRegressor'>, 'pca_n_components': 1, 'score_metric': 'R2', 'score_train': 0.8691580023243415, 'score_test': -1.449940666224947, 'delay': 16}
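The grid-search step described above can be sketched with a scikit-learn pipeline. This is a hypothetical, self-contained example on synthetic data (not CovsirPhy's internal code); the pipeline step names "pca" and "regressor" are assumptions chosen to match the "regressor__max_depth" key convention.

```python
# Hypothetical sketch: grid-searching "regressor__max_depth" of a
# PCA + decision tree pipeline, as in the hyperparameter grid above.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                         # stand-in indicator matrix
y = X[:, 0] * 0.5 + rng.normal(scale=0.1, size=200)   # stand-in ODE parameter

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([("pca", PCA()),
                 ("regressor", DecisionTreeRegressor(random_state=0))])
# The "regressor__" prefix routes the grid entry to the pipeline step
search = GridSearchCV(pipe, {"regressor__max_depth": list(range(1, 10))}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_train, y_train), search.score(X_test, y_test))  # R2 by default
```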

lisphilar commented 3 years ago

At version 2.20.3-alpha with Italy data and today="06Jun2021",

{'best': 'Indicators -> Parameters with Decision Tree Regressor', 'pca': <class 'sklearn.decomposition._pca.PCA'>, 'regressor': <class 'sklearn.tree._classes.DecisionTreeRegressor'>, 'pca_n_components': 1, 'score_metric': 'R2', 'score_train': 0.09405397649269268, 'score_test': -3.233491027704678, 'delay': [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}

Using "regressor__max_depth": list(range(1, 10)) as the grid for the max depth of the decision tree regressor,

{'best': 'Indicators -> Parameters with Decision Tree Regressor', 'pca': <class 'sklearn.decomposition._pca.PCA'>, 'regressor': <class 'sklearn.tree._classes.DecisionTreeRegressor'>, 'pca_n_components': 1, 'score_metric': 'R2', 'score_train': -3.084686247190538, 'score_test': -2.8895351215217864, 'delay': [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}

With #818, using "regressor__max_depth": list(range(1, 10)) as the grid for the max depth of the decision tree regressor, plus a min-max scaler applied just before PCA and the decision tree regression,

{'best': 'Indicators -> Parameters with Decision Tree Regressor', 'scaler': <class 'sklearn.preprocessing._data.MinMaxScaler'>, 'pca': <class 'sklearn.decomposition._pca.PCA'>, 'regressor': <class 'sklearn.tree._classes.DecisionTreeRegressor'>, 'pca_n_components': 8, 'score_metric': 'R2', 'score_train': 0.918263219376156, 'score_test': -1.2331265904202067, 'delay': [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}
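The min-max scaling step can likewise be sketched as an extra pipeline stage ahead of PCA. Again a hypothetical example with synthetic data, not CovsirPhy's actual code; the fraction-of-variance setting n_components=0.9 mirrors the 'pca__n_components': 0.9 value reported later in this thread.

```python
# Hypothetical sketch: MinMaxScaler before PCA, so components are not
# dominated by indicators with large numeric scales.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * [1, 10, 100, 1000]  # mixed-scale indicators
y = X[:, 0] + rng.normal(scale=0.1, size=100)

pipe = Pipeline([
    ("scaler", MinMaxScaler()),      # rescale each column to [0, 1]
    ("pca", PCA(n_components=0.9)),  # keep 90% of the (scaled) variance
    ("regressor", DecisionTreeRegressor(max_depth=6, random_state=0)),
])
pipe.fit(X, y)
print(pipe.named_steps["pca"].n_components_)  # components actually kept
```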

lisphilar commented 3 years ago

Current status for Italy/06Jun2021: (figure: ita_07_fit_plot)

lisphilar commented 3 years ago

With #826, a light gradient boosting machine regressor was added for forecasting, using the lightgbm package. https://github.com/microsoft/LightGBM/tree/master/python-package

lisphilar commented 3 years ago

In RegressorBase._split(), backward filling (pandas.DataFrame.bfill()) was applied to the delayed indicator values to keep the number of train/test records.

X_delayed = X.copy()
for delay in delay_values:
    X_delayed = X_delayed.join(X.shift(delay, freq="D"), how="outer", rsuffix=f"_{delay}")
X_delayed = X_delayed.ffill().bfill()

However, this led to overfitting because combinations of parameter values and indicators that never actually occurred were created. It should be as follows. (pull request #827)

X_delayed = X.copy()
for delay in delay_values:
    X_delayed = X_delayed.join(X.shift(delay, freq="D"), how="outer", rsuffix=f"_{delay}")
X_delayed = X_delayed.ffill()  # no .bfill(): do not fabricate pre-start values
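A toy pandas example (synthetic data, not the actual RegressorBase code) shows what the change fixes: .bfill() fabricates delayed indicator values for dates before the shifted series starts, while .ffill() alone leaves those cells missing so they can be dropped.

```python
# Toy illustration of why backward filling delayed columns is harmful.
import pandas as pd

X = pd.DataFrame({"indicator": [1.0, 2.0, 3.0, 4.0]},
                 index=pd.date_range("2021-06-01", periods=4, freq="D"))
# Join a 2-day-delayed copy, as in the RegressorBase._split() loop above
X_delayed = X.join(X.shift(2, freq="D"), how="outer", rsuffix="_2")
print(X_delayed.ffill().bfill())  # first rows get invented "indicator_2" values
print(X_delayed.ffill())          # those rows stay NaN and can be excluded
```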

With #827: (figure: ita_07_fit_plot)

{'best': 'Indicators -> Parameters with Decision Tree Regressor', 'converter__to_convert': False, 'pca__n_components': 0.9, 'regressor__max_depth': 6, 'converter': <class 'covsirphy.regression.reg_rate_converter._RateConverter'>, 'scaler': <class 'sklearn.preprocessing._data.MinMaxScaler'>, 'pca': <class 'sklearn.decomposition._pca.PCA'>, 'regressor': <class 'sklearn.tree._classes.DecisionTreeRegressor'>, 'score_metric': 'R2', 'score_train': 0.9835082445767591, 'score_test': -0.3647405411172536, 'delay': [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}

We need to continue updating, but the results are somewhat improved with this change.

lisphilar commented 3 years ago

With #837, example/scenario_analysis.py will be updated to include vaccine data as X.

lisphilar commented 3 years ago

I thought the test R-squared << 0 was due only to the low performance of the regression models, but that was not correct. We should change the default metric so that the performance of the regression models is evaluated more effectively.

Quoted from https://link.medium.com/yZsE4RoIdhb

The R-Squared formula compares our fitted regression line to a baseline model. This baseline model is considered the “worst” model. The baseline model is a flat-line that predicts every value of y will be the mean value of y. R-Squared checks to see if our fitted regression line will predict y better than the mean will.

The Y (ODE parameter) values of the train+test dataset may show a multimodal distribution. Suppose the Y values of the train dataset are multimodal with mean 0.3, but the Y values of the test dataset are unimodal (i.e. subsetted from one peak) with mean 0.1 and small variance.

Train: R-squared will be positive because the differences between the mean value 0.3 and the samples are large, so the baseline model is bad, as expected. This means MSE(model) will be smaller than MSE(baseline).

Test: R-squared will be negative because the differences between the mean value 0.1 and the samples are small, so the baseline model is unexpectedly good. This means MSE(model) will be much larger than MSE(baseline).
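This baseline argument can be checked numerically with sklearn.metrics.r2_score. The values below are made up purely for illustration (a bimodal train target around 0.1 and 0.5, a narrow unimodal test target around 0.1):

```python
# R2 compares the model's MSE against a flat line at the mean of y,
# so a narrow test distribution makes the baseline hard to beat.
from sklearn.metrics import r2_score

# Train: bimodal y (modes 0.1 and 0.5, overall mean 0.3); model tracks the modes
y_train = [0.1, 0.1, 0.5, 0.5]
pred_train = [0.12, 0.08, 0.48, 0.52]
print(r2_score(y_train, pred_train))  # positive: model beats the 0.3 baseline

# Test: unimodal y around 0.1; model still predicts values near both modes
y_test = [0.09, 0.10, 0.11, 0.10]
pred_test = [0.12, 0.08, 0.48, 0.52]
print(r2_score(y_test, pred_test))    # negative: the mean 0.1 predicts better
```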

Please refer to https://link.medium.com/yZsE4RoIdhb https://link.medium.com/iGKo8W4Ldhb

lisphilar commented 3 years ago

For a while, the MAPE (mean absolute percentage error) score will be used as the evaluation metric because it is scale-independent and easy to interpret.
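A small sketch of MAPE's scale independence, using sklearn.metrics.mean_absolute_percentage_error (the numbers are illustrative, not taken from the forecasts above):

```python
# MAPE averages |error| / |actual|, so rescaling y_true and y_pred by the
# same factor leaves the score unchanged, unlike MSE or MAE.
from sklearn.metrics import mean_absolute_percentage_error as mape

y_true = [0.10, 0.20, 0.40]
y_pred = [0.11, 0.18, 0.44]
base = mape(y_true, y_pred)                            # each row errs by 10%
scaled = mape([v * 1000 for v in y_true],
              [v * 1000 for v in y_pred])              # same score after rescaling
print(base, scaled)
```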

lisphilar commented 3 years ago

With #839,

(figure: ita_07_fit_plot)

lisphilar commented 3 years ago

The figure and scores (MAPE: 59.40% for training, 58.74% for test) indicate that overfitting is no longer the problem. Underfitting will be the next issue to address.