lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

Make input.sh compatible with all OSes (re-write input.sh using python) #26

Closed ilyasst closed 4 years ago

ilyasst commented 4 years ago

Is your feature request related to a problem? Please describe. Currently, input.sh works on Ubuntu (it might work on macOS if SVN is available, but I have not tested it); however, it cannot be used on Windows at all.

Describe the solution you'd like. input.sh could be rewritten in Python, which would make it possible to run it on any OS as long as the Python environment is properly set up.
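As a rough illustration of why Python removes the OS dependency, here is a minimal sketch (not the actual input.py) that downloads a file using only the standard library. `pathlib` builds paths that are valid on Windows, macOS and Linux, and `urllib.request.urlretrieve` replaces tools such as SVN or wget. The function name `download_file` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_file(url: str, directory: str | Path, filename: str | None = None) -> Path:
    """Download `url` into `directory` and return the local path (hypothetical helper).

    Only the standard library is used, so no shell, SVN or wget is required.
    """
    target_dir = Path(directory)
    # Create the target folder (and parents) if needed; no error if it exists
    target_dir.mkdir(parents=True, exist_ok=True)
    # Default file name: last component of the URL path, e.g. "data.csv"
    name = filename or Path(urlparse(url).path).name
    target = target_dir / name
    urlretrieve(url, target)
    return target
```

For example, `download_file("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv", "input")` would save the JHU confirmed-cases time series into an `input` folder on any OS.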

lisphilar commented 4 years ago

Dear @ilyasst, thank you very much for your proposal and pull request! input.py is very useful, and the script was successfully merged into the master branch!

lisphilar commented 4 years ago

Dear @ilyasst, as the next step, I plan to create a Python class, CovsirPhy.DataLoader. It will download the datasets automatically and show their citations.

For users who are not Kagglers:

  1. The number of cases (global): download JHU data directly
  2. The number of cases in Japan: will be discussed in #17
  3. Total population: will be discussed in #29
  4. OxCGRT: GitHub repository, as in previous versions

(Kaggle users can download them manually with input.py.)
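The plan above (download automatically, keep a local copy, show each dataset's citation) can be sketched roughly as follows. This is a hypothetical design, not covsirphy's DataLoader: the names `RemoteDataset`, `SimpleDataLoader` and `update_interval_hours` are assumptions, and the "freshness" rule (re-download when the local file is older than the interval) is just one reasonable caching policy.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from pathlib import Path
from urllib.request import urlretrieve


@dataclass
class RemoteDataset:
    """Hypothetical record: one remote file plus the citation to show users."""
    url: str
    filename: str
    citation: str


class SimpleDataLoader:
    """Hypothetical sketch: download on demand and reuse a recent local copy."""

    def __init__(self, directory: str | Path = "input", update_interval_hours: float = 12):
        self.directory = Path(directory)
        self.update_interval_sec = update_interval_hours * 3600

    def _is_fresh(self, path: Path) -> bool:
        # Fresh = the file exists and was modified within the update interval
        if not path.exists():
            return False
        return (time.time() - path.stat().st_mtime) < self.update_interval_sec

    def load(self, dataset: RemoteDataset) -> Path:
        """Return the local path, downloading only when the cached copy is stale."""
        self.directory.mkdir(parents=True, exist_ok=True)
        path = self.directory / dataset.filename
        if not self._is_fresh(path):
            urlretrieve(dataset.url, path)
        return path
```

A caller would then do `path = loader.load(dataset)` followed by `print(dataset.citation)`, so the citation always travels with the data.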

I will create a pull request for "1. The number of cases" later.

lisphilar commented 4 years ago

Dear @ilyasst, covsirphy.cleaning.data_loader.DataLoader was created to download the JHU/Japan/OxCGRT data automatically. (The data loader for the population dataset is still pending: #29.)

Please confirm it with the default branch (version 2.2.5). Example code is as follows.

```python
import covsirphy as cs
# Set the directory to save the datasets
data_loader = cs.DataLoader("input")
# JHU dataset
jhu_data = data_loader.jhu()
print(jhu_data.citation)
jhu_data.cleaned()
# The number of cases in Japan
japan_data = data_loader.japan()
print(japan_data.citation)
jhu_data.replace(japan_data)
ncov_df = jhu_data.cleaned()
# OxCGRT dataset
oxcgrt_data = data_loader.oxcgrt()
print(oxcgrt_data.citation)
oxcgrt_df = oxcgrt_data.cleaned()
jpn_oxcgrt_df = oxcgrt_data.subset(iso3="JPN")
```

input.py was also updated.

ilyasst commented 4 years ago

I pulled the code from master and followed the "For developers" guide. I had no problems with the installation, and I was able to download the JHU and OxCGRT datasets with the method you provided above.

I was also able to use input.py to download all the datasets, but only when the kaggle.json file was stored in ~/.kaggle. Because of a modification in https://github.com/lisphilar/covid19-sir/commit/f83738672e9721ab211a1ab93b84c8a387630040, it is no longer possible to simply put the kaggle.json file in the same folder as input.py. The OS environment variable "KAGGLE_CONFIG_DIR" must be set before the KaggleApi library is loaded; otherwise, the library fails to detect the kaggle.json file.
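The ordering problem described above can be illustrated with a short sketch. The helper names `use_kaggle_config_dir` and `get_kaggle_api` are hypothetical. The key point is that the environment variable is set before anything from the `kaggle` package is imported: in the kaggle package versions I am aware of, importing `kaggle` authenticates immediately, so the import is deferred into a function that runs only after the variable is set.

```python
import os
from pathlib import Path


def use_kaggle_config_dir(directory) -> Path:
    """Point the Kaggle client at a folder containing kaggle.json (hypothetical helper).

    Must be called before the `kaggle` package is imported anywhere.
    """
    config_dir = Path(directory).expanduser().resolve()
    if not (config_dir / "kaggle.json").is_file():
        raise FileNotFoundError(f"kaggle.json not found in {config_dir}")
    # The Kaggle client reads this variable when it looks for kaggle.json
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_dir)
    return config_dir


def get_kaggle_api():
    # Deferred import: by the time this runs, KAGGLE_CONFIG_DIR is already set
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return api
```

In a script such as input.py, one would call `use_kaggle_config_dir(Path(__file__).parent)` first and only then `get_kaggle_api()`.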

I will submit a PR for this shortly.

lisphilar commented 4 years ago

Dear @ilyasst, thank you for your pull request. I merged it.

However, I don't recommend keeping kaggle.json in your working directory for security reasons: it may accidentally leak your API keys. I plan to stop using the Kaggle API because we can replace the Kaggle datasets (secondary data) with datasets provided by the primary sources. Kagglers can add Kaggle datasets to their Kaggle notebooks via the GUI and load them with the local_file argument.

```python
import covsirphy as cs
# Kaggle notebooks mount attached datasets under /kaggle/input
data_loader = cs.DataLoader(directory=None)
jhu_data = data_loader.jhu(local_file="/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv")
japan_data = data_loader.japan(local_file="/kaggle/input/covid19-dataset-in-japan/covid_jpn_total.csv")
```

Currently, the OxCGRT dataset on Kaggle is provided as an Excel file, so we need to convert it to a CSV file: https://www.kaggle.com/paultimothymooney/oxford-covid19-government-response-tracker
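The Excel-to-CSV conversion mentioned above could look roughly like this with pandas. It is a sketch, not part of covsirphy: the function name `excel_to_csv` is hypothetical, and reading `.xlsx` files requires an Excel engine such as openpyxl to be installed alongside pandas.

```python
from pathlib import Path

import pandas as pd


def excel_to_csv(excel_path, csv_path=None, sheet_name=0) -> Path:
    """Convert one sheet of an Excel workbook to CSV (hypothetical helper).

    Returns the path of the written CSV file.
    """
    excel_path = Path(excel_path)
    # Default output: same name as the workbook, with a .csv extension
    csv_path = Path(csv_path) if csv_path is not None else excel_path.with_suffix(".csv")
    df = pd.read_excel(excel_path, sheet_name=sheet_name)
    # index=False keeps pandas' row index out of the CSV
    df.to_csv(csv_path, index=False)
    return csv_path
```

The resulting CSV path could then be passed to a loader that expects CSV input, e.g. via a `local_file`-style argument.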

The differences between the primary and Kaggle datasets will be adjusted in the covsirphy.cleaning sub-module. What do you think about this?
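To give a concrete idea of what "adjusting the differences" can mean, here is a hedged pandas sketch that maps the Kaggle `covid_19_data.csv` columns onto a common schema and derives the number of currently infected cases. The target column names, the `"-"` placeholder for a missing province, and the helper `harmonize` are assumptions for illustration, not covsirphy's actual cleaning code.

```python
import pandas as pd

# Hypothetical mapping from Kaggle "covid_19_data.csv" columns to a common schema
KAGGLE_TO_COMMON = {
    "ObservationDate": "Date",
    "Country/Region": "Country",
    "Province/State": "Province",
    "Confirmed": "Confirmed",
    "Deaths": "Fatal",
    "Recovered": "Recovered",
}
COMMON_COLUMNS = ["Date", "Country", "Province", "Confirmed", "Infected", "Fatal", "Recovered"]


def harmonize(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Rename, type-convert and complete a raw dataset (hypothetical helper)."""
    # Keep only mapped columns (drops e.g. "SNo" and "Last Update")
    out = df.rename(columns=mapping)[list(mapping.values())].copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Assumption: a missing province marks a country-level record
    out["Province"] = out["Province"].fillna("-")
    counts = ["Confirmed", "Fatal", "Recovered"]
    out[counts] = out[counts].fillna(0).astype("int64")
    # Currently infected = confirmed - fatal - recovered
    out["Infected"] = out["Confirmed"] - out["Fatal"] - out["Recovered"]
    return out[COMMON_COLUMNS]
```

Keeping the mapping as data rather than code means a second source (for example, the primary JHU files) would only need its own mapping dictionary.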

I will add a DataLoader.population method and update README.md in the next few days.

lisphilar commented 4 years ago

Dear @ilyasst, please confirm that the data loader for the population dataset has been included in the default branch.

```python
import covsirphy as cs
# Set the directory to save the datasets
data_loader = cs.DataLoader("input")
# Population in each country
population_data = data_loader.population()
```

README.md was also updated. Thank you.

lisphilar commented 4 years ago

Since this change has been applied, I will close this issue. Thank you.