lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

Scenario.estimate(): low accuracy of parameter estimation with SIR-F model because of short timeout #291

Closed lisphilar closed 3 years ago

lisphilar commented 3 years ago

Summary

Parameter estimation with Scenario.estimate() and the SIR-F model has low accuracy.

(Optional) Related classes

Code and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
# Scenario analysis
snl = cs.Scenario(jhu_data, population_data, "Country name used")
snl.trend()
snl.estimate(cs.SIRF)
snl.summary()

Outputs are in #270, #271, #272, #273 and #274.

lisphilar commented 3 years ago

The call tree of parameter estimation is as follows.

  1. Call cs.analysis.Scenario.estimate(cs.SIRF)
  2. Scenario.estimate(cs.SIRF) calls cs.phase.phase_estimator.MPEstimator()
  3. Scenario.estimate(cs.SIRF) registers records of phases (as instances of cs.phase.phase_unit.PhaseUnit) to MPEstimator(cs.SIRF)
  4. Scenario.estimate(cs.SIRF) calls MPEstimator.run()
  5. MPEstimator.run() calls PhaseUnit.estimate() in parallel
  6. PhaseUnit.estimate() calls cs.simulation.Estimator.run(model=cs.SIRF)
  7. cs.simulation.Estimator.run(model=cs.SIRF) updates the parameter set (hyperparameter estimation with the optuna package) while runtime < timeout (default: 60 sec) and the max values of the simulated numbers of cases are not within the allowance (0.98-1.02) of the max values of the actual numbers of cases.

lisphilar commented 3 years ago

First, we need to shorten the runtime of each parameter estimation trial so that more trials can run within the time limit (default: 1 min).

lisphilar commented 3 years ago

The RMSLE score calculated with np.sqrt(sklearn.metrics.mean_squared_log_error(x1, x2)) will be used directly in Estimator.error_f.
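
For reference, a minimal standard-library sketch of the same score. It is not the code of Estimator.error_f; the function name rmsle and the sample numbers are made up for illustration. It follows the definition behind sklearn.metrics.mean_squared_log_error, i.e. the mean of squared differences of log(1 + x), so it should agree with np.sqrt(sklearn.metrics.mean_squared_log_error(x1, x2)) for non-negative inputs.

```python
import math


def rmsle(actual, simulated):
    """Root mean squared logarithmic error of two equal-length sequences.

    Hypothetical helper for illustration; values must be non-negative.
    """
    if len(actual) != len(simulated):
        raise ValueError("actual and simulated must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one value is required")
    # Squared difference of log(1 + x) for each pair of values
    squared_log_errors = [
        (math.log1p(a) - math.log1p(s)) ** 2
        for a, s in zip(actual, simulated)
    ]
    # Mean over all pairs, then square root
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up numbers of cases, only to show the call
actual_cases = [100, 200, 400]
simulated_cases = [110, 190, 400]
score = rmsle(actual_cases, simulated_cases)
print(f"RMSLE: {score:.4f}")
```

Because the error is taken on a log scale, a given relative gap between simulated and actual values contributes about the same amount whether the counts are small or large.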

lisphilar commented 3 years ago

With #320 and version 2.10.0-mu, the default value of timeout in Estimator.run() was changed from 60 to 180 seconds.

With the Estimator class, CovsirPhy continues estimation while the runtime does not exceed the timeout and the max simulated number of cases is outside the range of (max actual number * 0.98, max actual number * 1.02).
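
The stopping rule can be sketched as below. This is an illustration under assumptions, not the Estimator implementation: run_one_trial is a hypothetical callable that stands in for one optuna trial and returns the max simulated number of cases, run_trials and max_in_allowance are made-up names, and the range bounds are treated as inclusive.

```python
import time


def max_in_allowance(actual_max, simulated_max, allowance=(0.98, 1.02)):
    """True if simulated_max is within the allowance of actual_max.

    Assumption: both bounds are inclusive.
    """
    low, high = allowance
    return actual_max * low <= simulated_max <= actual_max * high


def run_trials(run_one_trial, actual_max, timeout=180, allowance=(0.98, 1.02)):
    """Repeat trials until the timeout passes or the max value is in allowance.

    run_one_trial is a hypothetical stand-in for one estimation trial.
    Returns the last max simulated value and the number of trials run.
    """
    start = time.monotonic()
    n_trials = 0
    simulated_max = None
    while time.monotonic() - start < timeout:
        simulated_max = run_one_trial()
        n_trials += 1
        if max_in_allowance(actual_max, simulated_max, allowance):
            break
    return simulated_max, n_trials


# Toy trial results that approach the actual max of 1000
toy_results = iter([500, 900, 1010, 1000])
result = run_trials(lambda: next(toy_results), actual_max=1000, timeout=5)
print(result)
```

In this toy run the third value, 1010, is the first one between 980 and 1020, so the loop stops after three trials. The number of trials that fit before the timeout depends on how long each trial takes, which is why shorter trials and a longer timeout both help.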