lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

[Question] new database #1048

Closed elmedianikhadija closed 2 years ago

elmedianikhadija commented 2 years ago

Hello, I would like to use a new database for my analysis. How can I do that?

lisphilar commented 2 years ago

Hello @elmedianikhadija, did you try DataLoader.read_csv() and DataLoader.read_dataframe()? See https://lisphilar.github.io/covid19-sir/markdown/LOADING.html

geeky-programer commented 2 years ago

I am using DataLoader.read_csv(). With this procedure, there is no cleaning function that can be applied to the pandas object, the Infected column is not included at the data-loading phase, and there is a problem using ExampleData().

lisphilar commented 2 years ago

Dear @geeky-programer, for now, please use df = pd.read_csv(); (data cleaning); DataLoader().read_dataframe(df).

Note that the JHUData class (parent class of ExampleData) calculates Infected = Confirmed - Recovered - Fatal automatically and internally. Could you provide me with the details of the problem?
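The formula above can be reproduced with plain pandas during the data-cleaning step. This is only a minimal sketch: the function name `add_infected` and the sample records are hypothetical, and it does not show how JHUData implements the calculation internally.

```python
import pandas as pd


def add_infected(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Infected = Confirmed - Recovered - Fatal."""
    out = df.copy()  # leave the caller's dataframe untouched
    out["Infected"] = out["Confirmed"] - out["Recovered"] - out["Fatal"]
    return out


# Hypothetical cumulative records, for illustration only
records = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
        "Confirmed": [100, 150, 210],
        "Recovered": [10, 20, 45],
        "Fatal": [1, 2, 5],
    }
)
result = add_infected(records)
print(result[["Date", "Infected"]])
```

The column names follow the ones used in the formula; a dataset with other names (for example Deaths or Recoveries) would need to be renamed first.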

With #1064, I plan to create a DataEngineer class, which will handle data reading, data cleaning, and the calculation of Infected.

geeky-programer commented 2 years ago

The output of both of these steps, df = pd.read_csv(); (data cleaning); DataLoader().read_dataframe(df), does not contain the Infected column, and the data is not cleaned properly.

I have followed the steps described in https://lisphilar.github.io/covid19-sir/markdown/LOADING.html, including the steps for loading from CSV files. There may be a bug; please check the workflow.

Are there any other functions or classes I can make use of to create good data to feed the models at this point in time?
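Independently of the library, a few checks in plain pandas can show whether a dataframe is ready to feed to a model. The sketch below is an illustration only: `check_records` and the `REQUIRED` list are hypothetical names, and the required columns are an assumption, not the library's definition.

```python
import pandas as pd

# Assumed minimal set of columns; adjust to what the model actually needs
REQUIRED = ["Date", "Confirmed", "Fatal", "Recovered", "Population"]


def check_records(df: pd.DataFrame, required=REQUIRED) -> dict:
    """Report missing columns, NaN counts, and decreasing cumulative columns."""
    missing = [c for c in required if c not in df.columns]
    present = [c for c in required if c in df.columns]
    nan_counts = {c: int(df[c].isna().sum()) for c in present}
    # Cumulative counts should never decrease over time
    cumulative = [c for c in ("Confirmed", "Fatal", "Recovered") if c in df.columns]
    decreasing = [
        c for c in cumulative if not df[c].dropna().is_monotonic_increasing
    ]
    return {"missing": missing, "nan_counts": nan_counts, "decreasing": decreasing}


# Hypothetical records with three deliberate problems
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
        "Confirmed": [10, 12, 11],  # decreases on the last day
        "Fatal": [0, None, 1],      # one missing value
        "Recovered": [1, 2, 3],
    }                               # no Population column
)
report = check_records(sample)
print(report)
```

A report with empty `missing` and `decreasing` lists and all-zero `nan_counts` suggests the basic structure is sound; it does not validate the values themselves.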

Thank you very much for your response. And great effort on the compartmental models.

lisphilar commented 2 years ago

Thank you for your response. Could you share the code and the CSV file (or a few lines of the data, including the column names)?

geeky-programer commented 2 years ago

    import pandas as pd
    pip install --upgrade "git+https://github.com/lisphilar/covid19-sir.git#egg=covsirphy"
    import covsirphy as cs

    N_germany = 83240000
    URL = 'https://gitlab.uni-koblenz.de/akshaygs/ann/-/raw/main/Data%20set/SIR_data.csv'

    # reading the csv into a dataframe
    df_new = pd.read_csv(URL, delimiter=";", index_col=False)

    df_new['Confirmed'] = df_new['Infected'] + df_new['Deaths'] + df_new['Recoveries']
    df_new['Susceptible'] = N_germany - df_new['Confirmed']

    # R means recovered + fatal
    df_new['Recovered-new'] = df_new['Deaths'] + df_new['Recoveries']
    df_new['Population'] = int(83240000)
    df_new.drop("Entry", axis=1, inplace=True)
    df_new.drop("Recoveries", axis=1, inplace=True)

    loader = cs.DataLoader(update_interval=None)

    loader.read_dataframe(df_new, parse_dates=["Date"], dayfirst="25Feb2020")

    loader.assign(country="Germany", province="Germany")
    new_data = loader.lock(
        # Always required
        date="Date", country="country", province="province", confirmed="Confirmed", fatal="Deaths", population="Population",
        #date="Date",confirmed="Confirmed", fatal="Deaths", population="population",
        # Optional
        recovered="Recovered-new",
    )
    data = loader.locked

    # if you print 'data' dataframe there is no column for infections and there are many NaN values which is not part of data cleaning i guess

    # this is the line that is throwing error
    cis_data = cs.ExampleData(data, tau=1440, start_date="25Feb2020")

    KeyError: 'Expected columns were not included in clean_df with ISO3, Province, Date, Country, Confirmed, Fatal, Recovered, Population, Tests, Product, Vaccinations, Vaccinations_boosters, Vaccinated_once, Vaccinated_full, School_closing, Workplace_closing, Cancel_events, Gatherings_restrictions, Transport_closing, Stay_home_restrictions, Internal_movement_restrictions, International_movement_restrictions, Information_campaigns, Testing_policy, Contact_tracing, Stringency_index, Mobility_grocery_and_pharmacy, Mobility_parks, Mobility_transit_stations, Mobility_retail_and_recreation, Mobility_residential, Mobility_workplaces. Infected must be included.'
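One detail in the code above may be worth checking: `dayfirst` is given a date string. In pandas, `dayfirst` is a boolean flag that resolves day/month ambiguity, not a start date. Whether DataLoader.read_dataframe() forwards the argument to pandas unchanged is an assumption here. The sketch below shows, with pandas only, how strings written like 25Feb2020 can be parsed with an explicit format; the actual format of the Date column in the CSV is not shown in the thread, so the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical date strings in the same style as start_date="25Feb2020"
raw = pd.Series(["25Feb2020", "26Feb2020", "01Mar2020"])

# An explicit format avoids guessing: %d = day, %b = abbreviated month, %Y = year
parsed = pd.to_datetime(raw, format="%d%b%Y")

# dayfirst is a boolean: with True, "03/02/2020" is read as 3 February
ambiguous = pd.to_datetime("03/02/2020", dayfirst=True)

print(parsed)
print(ambiguous)
```

Parsing the Date column this way before passing the dataframe on means no date-parsing arguments need to be guessed later. Note that `%b` depends on the locale; the sketch assumes English month abbreviations.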

geeky-programer commented 2 years ago

[Screenshot attached: 2022-06-29 at 09:06:28]

lisphilar commented 2 years ago

Thank you for the details! Just to confirm, did you try DataLoader().jhu() instead of ExampleData(data)? I ask because you have actual COVID-19 records, and the JHUData instance returned by DataLoader().jhu() calculates Infected internally.

geeky-programer commented 2 years ago

Yes, I have tried DataLoader().jhu(). The data gathered is quite impressive. I am a data science student, so I am loading different data for academic purposes. May I ask when you plan to implement the DataEngineer class?

lisphilar commented 2 years ago

With #1090, I'm writing some new classes, including DataEngineer. I haven't had much time to update the pull request recently, but I plan to merge it this week or next Saturday/Sunday.

geeky-programer commented 2 years ago

OK. I much appreciate the work. Thank you.

lisphilar commented 2 years ago

Dear @geeky-programer, sorry for the delay. I'm preparing the documentation for the DataEngineer class (available only in development versions at this time). After I finish writing the notebooks, I will release the next stable version and update the documentation (GitHub page).