lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0
109 stars 44 forks

[Fix] Estimator for Spain, France throws runtime error #500

Closed Inglezos closed 3 years ago

Inglezos commented 3 years ago

Summary

Running the estimator for Spain and France ends with a runtime error: ValueError: Mean Squared Logarithmic Error cannot be used when targets contain negative values.
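For context, here is a minimal sketch of why a negative target breaks this metric. MSLE compares log(1 + y) terms, so the metric rejects negative inputs up front. The `msle` helper below is a hypothetical pure-Python stand-in written for illustration; it is not the function the library or its dependencies actually call.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error for two equal-length sequences.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("Inputs must be non-empty and of equal length.")
    # log(1 + y) is not meaningful for negative counts, so refuse them
    # before computing anything.
    if any(value < 0 for value in list(y_true) + list(y_pred)):
        raise ValueError("MSLE cannot be used when targets contain negative values.")
    squared = [
        (math.log1p(t) - math.log1p(p)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return sum(squared) / len(squared)


# Identical non-negative series give zero error.
print(msle([0, 1, 2], [0, 1, 2]))  # 0.0

# A single negative "infected" count is enough to trigger the error.
try:
    msle([1, -1, 2], [1, 0, 2])
except ValueError as error:
    print(f"ValueError: {error}")
```

So a single negative value in the records passed to the estimator would be enough to abort the run.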

Code

import covsirphy as cs

data_loader = cs.DataLoader(directory="kaggle/input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()

spa_scenario = cs.Scenario(jhu_data, population_data, "Spain")
spa_scenario.records()
_ = spa_scenario.trend()
spa_scenario.estimate(cs.SIRF)
spa_scenario.add(name="Main", days=7)
spa_sim_df = spa_scenario.simulate(name="Main")

fra_scenario = cs.Scenario(jhu_data, population_data, "France")
fra_scenario.records()
_ = fra_scenario.trend()
fra_scenario.estimate(cs.SIRF)
fra_scenario.add(name="Main", days=7)
fra_sim_df = fra_scenario.simulate(name="Main")

Traceback (for Spain):

Traceback (most recent call last):
  File "C:\Users\ingle\Anaconda3\envs\py38\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\ingle\Anaconda3\envs\py38\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "D:\COVID19\Python\lisphilar\covid19-sir\cInglezos\covid19-sir\covsirphy\phase\phase_estimator.py", line 116, in _run
    unit.estimate(record_df=record_df, **kwargs)
  File "D:\COVID19\Python\lisphilar\covid19-sir\cInglezos\covid19-sir\covsirphy\phase\phase_unit.py", line 442, in estimate
    estimator.run(**kwargs)
  File "D:\COVID19\Python\lisphilar\covid19-sir\cInglezos\covid19-sir\covsirphy\simulation\estimator.py", line 120, in run
    self.study.optimize(
  File "C:\Users\ingle\Anaconda3\envs\py38\lib\site-packages\optuna\study.py", line 333, in optimize
    self._optimize_sequential(...)
...

Environment

lisphilar commented 3 years ago

For Spain, jhu_data.subset("Spain") did not show negative values in the infected records, but jhu_data.subset_complement("Spain")[0] does show negative infected values. This may be the cause for Spain.

Inglezos commented 3 years ago

The Spain issue is resolved by pull request #523. The negative infected values were introduced during the monotonic complement of the Confirmed variable. For early low values (1-2), the complement changed these values to, for example, 0.99, and the int64 conversion then truncated that to 0 while Recovered remained 1, so the derived infected value became negative because of this rounding. I now apply ceil() before the conversion instead. The France issue was caused by the known raw data problem in which Jan 2021 records were written by mistake as Jan 2020 records.
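The rounding effect described above can be reproduced in a few lines. This is only a sketch under stated assumptions: it assumes Infected is derived as Confirmed - Fatal - Recovered, uses the 0.99 example value from this comment, and sets Fatal to 0 for illustration. The `infected` helper is hypothetical and is not code from the pull request.

```python
import math


def infected(confirmed, fatal, recovered):
    """Derived infected count (assumed: Confirmed - Fatal - Recovered)."""
    return confirmed - fatal - recovered


# Example value after the monotonic complement (taken from the comment above).
complemented_confirmed = 0.99
recovered = 1
fatal = 0  # assumption for illustration

# Integer conversion truncates toward zero: 0.99 -> 0.
truncated = int(complemented_confirmed)
# Applying ceil() first keeps the value at 1.
ceiled = math.ceil(complemented_confirmed)

print(infected(truncated, fatal, recovered))  # -1
print(infected(ceiled, fatal, recovered))     # 0
```

With truncation, Confirmed falls below Recovered and the derived value goes negative. With ceil(), it stays non-negative for this example.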

lisphilar commented 3 years ago

Thanks to your pull request, this bug was fixed!