lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0
109 stars 44 forks

[New] Specify strftime format with CountryData.cleaned(date_format) when we use local dataset (Fix: Using Own Dataset Not Work Anymore) #856

Closed subi10 closed 3 years ago

subi10 commented 3 years ago

Hi, I'm Subi from Malaysia. Thank you very much for this outstanding package. For the last month I have been using it to load a dataset for a province in Malaysia, and it worked like a charm. Right now I am trying a similar step, but the "scenario" instance returns an error; it looks like it didn't read my dataset properly.

This is the output now: image

This is the output from back then: image

Am I doing something wrong? This is how I do it:

image image

lisphilar commented 3 years ago

Thank you for reaching out to us! Could you check that country_data.cleaned() has all the data you had in the CSV file? .head() is not used in In[10], but only five rows are shown in Out[10].

Additionally, please try auto_complement=False (skip automatic data complement) when creating the Scenario instance, i.e. please replace

my_scenario = cs.Scenario(jhu_data, population_data, "Malaysia", "Selangor")

with

my_scenario = cs.Scenario(jhu_data, population_data, "Malaysia", "Selangor", auto_complement=False)

If these do not work, could you share the CSV file and the version numbers of Python and CovsirPhy? (Kindly use the "Request fixing a bug" issue template next time.)

subi10 commented 3 years ago

Hi, Thank you very much for your fast response. I am using the latest version, as I updated the package recently. I forget which version gave me the desired output last time. I will try your suggestion and get back to you. Attached is the CSV containing the dataset stored locally. Again, thank you very much for the fast reply.

Best regards, Subi


subi10 commented 3 years ago

Hi, I just tried the suggestion, and I get the output below.

country_data has all the data required. Best regards, Subi


lisphilar commented 3 years ago

Dear @subi10, thank you for trying, but I could not see the images and CSV files because attachments are removed when we reply to GitHub notification e-mails. Please return to GitHub Issues with your browser and attach them :-) https://github.com/lisphilar/covid19-sir/issues/856

lisphilar commented 3 years ago

You can move to GitHub Issues by clicking the "view it on GitHub" link at the bottom of the notification e-mails.

Screenshot

subi10 commented 3 years ago

Hi,

Sorry for sending it via email. I did try the suggestion and I get this:

image

Attached is the dataset I'm working with: Selangor.xlsx

subi10 commented 3 years ago

Hi, Thank you for your guidance. I have commented on GitHub and attached the dataset. I hope you received it. Best regards, Subi

lisphilar commented 3 years ago

Thank you for uploading! Hmm... I tried it with CovsirPhy 2.21.0 (the latest stable version), a CSV file converted from the Excel file you attached, and Google Colab. Actually, it worked. https://gist.github.com/lisphilar/e7697ae512bdb7220c4bccbf6c2beeb7

I noticed that the first date of the records you showed in the first comment was 2020-01-05 and the column names of the CSV file were "Confirmed", "Recovered" and "Death". However, the Excel file I received has 2020-04-20 as the first record, and its column names are "confirmed", "recovered" and "fatal".

subi10 commented 3 years ago

Hi, Thank you for trying. However, when I ran it on my PC it still gave the same result. Maybe it is due to an environment problem? I will try it on another PC and get back to you. Best regards, Subi

lisphilar commented 3 years ago

I'm not sure, but do you have both the CSV file and the Excel file in the directory where the code was executed? If so, please confirm that the files have the same first date, 2020-04-20, and that the column names are "confirmed", "recovered" and "fatal".
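The check requested above could be sketched with plain pandas as follows. This is only an illustration: the helper name check_local_csv is hypothetical, the inline CSV text stands in for the real file, and the day/month/year date format is an assumption based on the dates shown later in this thread.

```python
import io

import pandas as pd

# Hypothetical stand-in for the local file; pass the real path instead.
csv_text = """Date,confirmed,recovered,fatal
20/4/2020,1,0,0
21/4/2020,2,0,0
"""

REQUIRED = {"Date", "confirmed", "recovered", "fatal"}


def check_local_csv(source, first_date, date_format="%d/%m/%Y"):
    """Return a list of problems found in the CSV (empty if none)."""
    df = pd.read_csv(source)
    problems = []
    # Column names are case-sensitive, so "Confirmed" does not count.
    missing = REQUIRED - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if "Date" in df.columns:
        # Parse with an explicit format so day and month cannot be swapped.
        dates = pd.to_datetime(df["Date"], format=date_format)
        if dates.min() != pd.Timestamp(first_date):
            problems.append(
                f"first date is {dates.min().date()}, expected {first_date}")
    return problems


problems = check_local_csv(io.StringIO(csv_text), "2020-04-20")
print(problems)
```

An empty list means the file matches the expected column names and first date; any entry in the list names the mismatch.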

subi10 commented 3 years ago

Hi there, Thank you very much again. However, I tried running it on a MacBook using Google Colab and the problem still persists. Best regards, Subi

subi10 commented 3 years ago

Hi,

These are the files in my directory. I did change the names to lowercase, thinking that it might fix my issue.

image

subi10 commented 3 years ago

Yes,

The first date is indeed 20-4-2020

image

subi10 commented 3 years ago

I still get this; it gives me the number -1.

image

lisphilar commented 3 years ago

Could you share Selangor_S-R.ipynb?

subi10 commented 3 years ago

Hi, Thank you very much. Sure, please find attached the requested file. However, when I try to upload it on GitHub, the file cannot be loaded in the comment section. Best regards, Subi

subi10 commented 3 years ago

Hi,

I uploaded it here:

https://github.com/subi10/Selangor

subi10 commented 3 years ago

Also, I passed the link to the Colab notebook you gave me to my colleague and asked them to run it; they also got the same error.

lisphilar commented 3 years ago

Thank you for creating the repository! Sorry for the trouble.

I noticed that the last date was "2021-06-14" in the CSV but "2021-12-06" (a future date!) in Out[4] (country_data.cleaned().tail()). I will investigate it in the source code.

Could you add the following lines to the script?

import covsirphy
print(covsirphy.__version__)
country_data._raw.tail()

If the output of country_data._raw.tail() is not the same as Out[4], something is wrong with data cleaning.
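A quick way to spot this symptom in any parsed date column is sketched below. The helper suspicious_dates is hypothetical and works on plain pandas data, not on CovsirPhy objects; it assumes the records were stored in chronological order, so swapped day/month values show up as future dates or as a broken sort order.

```python
import pandas as pd


def suspicious_dates(dates, today=None):
    """Flag symptoms of swapped day/month in an already-parsed date column."""
    dates = pd.Series(pd.to_datetime(dates))
    today = (pd.Timestamp.today().normalize()
             if today is None else pd.Timestamp(today))
    return {
        # Records cannot come from the future.
        "has_future_dates": bool((dates > today).any()),
        # Chronological records should stay sorted after parsing.
        "is_sorted": bool(dates.is_monotonic_increasing),
    }


# 11-14 June 2021, with day and month swapped wherever the day is <= 12
misparsed = ["2021-11-06", "2021-12-06", "2021-06-13", "2021-06-14"]
correct = ["2021-06-11", "2021-06-12", "2021-06-13", "2021-06-14"]

report_bad = suspicious_dates(misparsed, today="2021-06-27")
report_ok = suspicious_dates(correct, today="2021-06-27")
print(report_bad)
print(report_ok)
```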

subi10 commented 3 years ago

This is head and tail in the country_data

image

subi10 commented 3 years ago

This is the version I am currently on

image

subi10 commented 3 years ago

Yes, it looks like the tail here has an issue with the last date.

lisphilar commented 3 years ago

Thank you for sharing. It appears that "12/6/2021" is converted to "2021-12-06" (=06Dec2021) on your PC. Apart from CovsirPhy, please share the output of the following code.

import pandas as pd
pd.to_datetime("12/6/2021")

My PC (in Japan) returns Timestamp('2021-12-06 00:00:00').
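For reference, the ambiguity can be reproduced with pandas alone. The sketch below assumes the string is meant as day/month/year (12 June 2021) and uses the documented dayfirst argument of pd.to_datetime as a hint; the variable names are illustrative only.

```python
import pandas as pd

raw = "12/6/2021"  # assumed to mean 12 June 2021 (day/month/year)

# Without a hint, an ambiguous string is read month-first.
month_first = pd.to_datetime(raw)
# With dayfirst=True, the first number is taken as the day.
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first)  # December 6
print(day_first)    # June 12
```

Only dates whose day is 12 or less are ambiguous in this way, which is why most of a daily series can look correct while a few records are shifted by months.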

subi10 commented 3 years ago

Same goes here, my start date is April 20th 2020

image

subi10 commented 3 years ago

Oh, this is the timestamp

image

lisphilar commented 3 years ago

This is expected to be Timestamp('2021-06-12 00:00:00')... To fix this issue, we may need to set the time format appropriately.

import pandas as pd
pd.to_datetime("12/6/2021", format="%d/%m/%Y")

subi10 commented 3 years ago

I try to run it

image

lisphilar commented 3 years ago

The reason Google Colab succeeded is not clear... but, to test it, could you try the following?

import pandas as pd
# Remove cleaned data with wrong time format
country_data._cleaned_df = pd.DataFrame()
# Update raw dataframe with appropriate time format
country_data._raw["Date"] = pd.to_datetime(country_data._raw["Date"], format="%d/%m/%Y")
# Data cleaning
country_data.cleaned()

subi10 commented 3 years ago

Yes, now it works. Thank you so much!! Best regards, Subi

subi10 commented 3 years ago

Great stuff!! Thank you so much. !!

image image

lisphilar commented 3 years ago

Thank you for your cooperation!!! I will add a date_format argument to CountryData.cleaned() later.

subi10 commented 3 years ago

Thank you very much to you as well. You are a genius and great person.

lisphilar commented 3 years ago

With #857, CountryData.cleaned(date_format=None) (default) was implemented at development version 2.21.0-delta. This will be included in the next stable version 2.22.0 (planned for July 2021). Because we only use "date", the argument name is date_format, not time_format.

For now, please use the code (country_data._raw["Date"] = pd.to_datetime(country_data._raw["Date"], format="%d/%m/%Y")) with the latest stable version. Or, use country_data.cleaned(date_format="%d/%m/%Y") with the development version.
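The effect of the explicit format can be illustrated on a plain DataFrame. In this sketch the frame named raw is a hypothetical stand-in for the raw records with a "Date" column of day/month/year strings; it is not a CovsirPhy object, and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the raw records (illustrative values only)
raw = pd.DataFrame({
    "Date": ["20/4/2020", "12/6/2021", "14/6/2021"],
    "confirmed": [1, 100, 120],
})

# Parse every value as day/month/year, leaving nothing to inference.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")

last_date = raw["Date"].max()
print(last_date)
```

With the format fixed, "12/6/2021" becomes 12 June 2021 and the last record is 14 June 2021, matching the CSV.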

New documentation will be deployed in a few hours. https://lisphilar.github.io/covid19-sir/markdown/INSTALLATION.html#use-a-local-csv-file-which-has-the-number-of-cases

I will close this issue, thank you.

FYI: With issue #851, LocalDataLoader may be created to read local datasets more easily. Date format should be considered there.

subi10 commented 3 years ago

Hi, Thank you very much for your effort! Best regards, Subhi

lisphilar commented 3 years ago

Dear @subi10, Hello again. Stable version 2.22.0 was released and DataLoader class was improved. https://lisphilar.github.io/covid19-sir/markdown/LOADING.html

With 2.22.0, we can use DataLoader to read local CSV files without CountryData.

import covsirphy as cs
loader = cs.DataLoader(update_interval=None)
loader.read_csv("Selangor.csv", parse_dates=["Date"], dayfirst=True)
# loader.local
loader.assign(country="Malaysia", state="Selangor", population=6_530_000)
loader.lock(
    date="Date", country="country", province="state",
    confirmed="confirmed", fatal="fatal", recovered="recovered", population="population")
# loader.locked
jhu_data = loader.jhu()
snl = cs.Scenario(country="Malaysia", province="Selangor")
snl.register(jhu_data)
snl.records()

subi10 commented 3 years ago

Dear Hirokazu Takaya, I just got access to my mail and saw this! Thank you very much for your effort to solve the issue; I really appreciate it. I have been trying to find out how to use the package to model vaccination and reinfection, and I would be grateful if you could show me a quick walkthrough. The Malaysian government releases a dataset on vaccination; how may I incorporate it into the model? I saw that two models could fit my needs, but is there any way to use SIRD with vaccination, and could you show the simplest walkthrough (example Python code)? I tried before with no luck.

Thank you so much for everything.

Best wishes, Subhi

covid19-public/epidemic at main · MoH-Malaysia/covid19-public
