lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0
109 stars 43 forks source link

[Bug] Error when the number of Fatal cases are not changing with SIR-D model #1486

Open karanjogi opened 11 months ago

karanjogi commented 11 months ago

Checkbox

Summary

Error while creating a subset for SIRFmodel using eng.subset() for a specific 'geo' I am trying to create a subset for Illinois. However the sample code for Japan also didn't work at province and city level.

Reproducible example script

import covsirphy as cs
cs.__version__

eng = cs.DataEngineer()
eng.download(country='USA', province='Illinois', databases=["japan", "covid19dh"]);

eng.clean()
eng.transform()
actual_df, status, _ = eng.subset(geo=("USA", "Illinois", "Cook"), variables="SIRF", complement=True)
print(status)
actual_df.tail()

The current outputs

SubsetNotFoundError                       Traceback (most recent call last)
Cell In[39], line 3
      1 eng.clean()
      2 eng.transform()
----> 3 actual_df, status, _ = eng.subset(geo=("USA","Illinois",), variables="SIRF", complement=True)
      4 print(status)
      5 actual_df.tail()

File ~/miniconda3/envs/covid/lib/python3.11/site-packages/covsirphy/engineering/engineer.py:432, in DataEngineer.subset(self, geo, start_date, end_date, variables, complement, get_dummies, **kwargs)
    376 """Return subset of the location and date range.
    377 
    378 Args:
   (...)
    429     Re-calculation of Susceptible and Infected will be done automatically.
    430 """
    431 v_converted = self._var_alias.find(name=variables, default=variables)
--> 432 subset_df = self._gis.subset(geo=geo, start_date=start_date, end_date=end_date, variables=None, errors="raise")
    433 if not complement:
    434     df = subset_df.set_index(self.DATE)

File ~/miniconda3/envs/covid/lib/python3.11/site-packages/covsirphy/gis/gis.py:293, in GIS.subset(self, geo, start_date, end_date, variables, errors)
    291 df = manager.filter(data=data, geo=geo_converted)
    292 if df.empty and errors == "raise":
--> 293     raise SubsetNotFoundError(geo=geo)
    294 # Filter with date
    295 series = df[self._date].copy()

SubsetNotFoundError: No records in Illinois/USA were found.

Expected outputs

No errors

Environment

- CovsirPhy version: 3.0.0
- Python version: 3.11.4

Package manager (required if installation issue)

pip

Platform (required if installation issue)

MacOS

Additional Context

No response

lisphilar commented 11 months ago

Thank you for your bug report! However, I failed in reporducing the error with Google Colab and the same codes. https://gist.github.com/lisphilar/e25b5ac4d1ac9dcea0f383ead0a4957e

Does the following scpript return empty dataframe? (If yes, error is in downloading. If no, error is in subsetting.)

import covsirphy as cs
cs.__version__

eng = cs.DataEngineer()
eng.download(country='USA', province='Illinois', databases=["japan", "covid19dh"]);
eng.clean()
eng.transform()

df = eng.all()
df.loc[df['City'] == 'Cook']
karanjogi commented 11 months ago

Thanks you for replying promptly! Yes it does return a dataframe. I think I found the issue with my code. I didn't set the layer for the dataengineer

import covsirphy as cs cs.version

eng = cs.DataEngineer(['ISO3', 'Province', 'City'], complement=True) eng.download(country='USA', province='Illinois', databases=["japan", "covid19dh", "owid"]) eng.clean() eng.transform() eng.layer(geo=("USA", "Illinois",))

actualdf, status, = eng.subset(geo=("USA", 'Illinois', 'Cook'), variables="SIRF", complement=True) print(status) actual_df.tail()

This gives a subset on city level. However the sample code doesn't add any layers. Therefore eng.subset is acting funny if you provide geo=('Japan', 'Tokyo', ) or any other combination of city and province.

lisphilar commented 11 months ago

I may not have understood the point correctly, but cs.DataEngineer() is the same as cs.DataEngineer(layers=None) and cs.DataEngineer(layers=[“ISO3”, “Province”, “City”]). Is this making you confused?

karanjogi commented 11 months ago

I think I was able to solve the confusion.

I am trying to create a phase dependent model (SIRD) for a city in USA.

It seems that value of kappa is 0 for some phases. I have tried different algo for segments and other arguments too. But still it is 0. Can you suggest a work around.

Colab link below: https://colab.research.google.com/drive/1dFKxUWd6L7ek_x1cognX49sChrE5142e#scrollTo=MD7lhKDtG6ZK

lisphilar commented 10 months ago

Hi @karanjogi I'm sorry for my very late reply.

I performed additional analysis and found that the number of fatal cases were not changed in the phases with kappa = 0. https://gist.github.com/lisphilar/d717d73bcb1f5d3e80e79db9bbee4796

If the number of fatal cases is correct, kappa = 0 should be accepted. i.e. We need to fix the ZeroDivisionError by returning NA when kappa = 0.

karanjogi commented 10 months ago

Hi @lisphilar ,

Thank you for clearing what the problem is!

Can you please suggest a workaround this error? Or where the code change needs to be done? I can give it a try and submit a PR if solved.

Thanks!

lisphilar commented 10 months ago

Thank you for your contribution!

Regarding SIR-D model, the error is raised here. Could you add if statements in "try" block? https://github.com/lisphilar/covid19-sir/blob/5eb55efc36db79f302c951878d8bd45bf7e6e6dc/covsirphy/dynamics/sird.py#L138-L158

The other models (sir.py, sirf.py, sewirf.py) in the directory also have the same issues.

Additionally, please add tests (with pytest) to https://github.com/lisphilar/covid19-sir/blob/5eb55efc36db79f302c951878d8bd45bf7e6e6dc/tests/test_dynamics/test_model.py so that we can avoid the similar issues.

I look forward to our discussion.

lisphilar commented 10 months ago

And could you add a title to this issue?

karanjogi commented 10 months ago

Sure! I have made the changes in a new branch named develop and will be creating a PR for it. Could you please review it and let me know if there are any changes needed?

lisphilar commented 10 months ago

Thank you got your pull request! As I commented on that, rho_i = 0 and sigma = 0 could be acceptable as well as kappa = 0.