lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0
110 stars, 44 forks

[Fix] [forecasting] low accuracy when backtesting because correlation of ODE parameters are not considered #786

Closed lisphilar closed 3 years ago

lisphilar commented 3 years ago

Summary

In forecasting, accuracy is quite low (i.e. the predicted number of cases is far from the actual values) when backtesting with Italy data and today=24Apr2021, because the correlation between ODE parameters is not considered. As the failure in #778 indicated, ODE parameter values are correlated. This is related to #778 and #783.

Codes

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
oxcgrt_data = data_loader.oxcgrt()
# Scenario analysis
snl = cs.Scenario(country="Italy")
snl.register(jhu_data, extras=[oxcgrt_data])
snl.timepoints(today="24Apr2021")
snl.trend()
snl.estimate(cs.SIRF)
snl.fit(name="Forecast")
snl.predict(name="Forecast")
snl.adjust_end()
snl.history("Confirmed")
snl.history("Infected")
snl.history("Recovered")
snl.history("Fatal")

Outputs

[Figures: history plots for Italy: ita_10_history_Confirmed, ita_11_history_Infected, ita_12_history_Recovered, ita_13_history_Fatal]

Environment

lisphilar commented 3 years ago

In scikit-learn, DecisionTreeRegressor natively accepts multi-output targets (y = ODE parameter values), so a single tree takes the correlation between the y values into account. We can therefore remove MultiOutputRegressor from the decision tree regression. https://scikit-learn.org/stable/modules/tree.html#multi-output-problems
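The difference between the two approaches can be sketched as follows. This is a minimal illustration, not CovsirPhy's actual implementation: the data is synthetic, and the feature matrix `X` (standing in for indicator values) and the four target columns of `Y` (standing in for correlated ODE parameter values) are assumptions made for the example. Only the scikit-learn classes named in the comment above are used.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 200 samples, 3 indicator-like features.
X = rng.uniform(0, 1, size=(200, 3))

# Four hypothetical targets that all depend on the same hidden quantity,
# so they are strongly correlated, like ODE parameter values.
base = X[:, 0] + 0.5 * X[:, 1]
Y = np.column_stack([base, 2 * base, 1 - base, base ** 2])
Y += rng.normal(0, 0.01, size=Y.shape)

# Native multi-output: ONE tree; every split is chosen using all four
# targets jointly, and each leaf predicts a vector of four values.
joint = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, Y)

# Wrapper: FOUR independent trees, one per target column; each tree
# is fitted without seeing the other targets.
separate = MultiOutputRegressor(
    DecisionTreeRegressor(max_depth=4, random_state=0)
).fit(X, Y)

joint_pred = joint.predict(X[:5])
separate_pred = separate.predict(X[:5])

print(joint_pred.shape, separate_pred.shape)
print(joint.n_outputs_, len(separate.estimators_))
```

Both models return predictions of the same shape, but the native tree predicts all targets from the same leaf, so each predicted vector is an average of parameter sets that were actually observed together. The wrapper may combine values from unrelated leaves of different trees.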

lisphilar commented 3 years ago

With #787, accuracy improved, especially for Confirmed/Recovered, which leads to a major improvement in Infected.
[Figures: updated history plots for Italy: ita_10_history_Confirmed, ita_11_history_Infected, ita_12_history_Recovered, ita_13_history_Fatal]

lisphilar commented 3 years ago

Because the improvement is not yet complete for Confirmed/Fatal (and therefore Infected), we need to continue this work in new issues.