Closed AnujTiwari closed 3 years ago
Thank you for creating issues!
My aim is to index all 3142 US counties as per the impact of COVID-19. I can request the local health department for clinical datasets.
"COVID-19 Hub" (our main data source for JHUData
etc.) provides state-level records for US. You can confirm this as follows.
import covsirphy as cs
loader = cs.DataLoader()
jhu_data = loader.jhu()
jhu_data.layer(country="US").Province.unique()
If you have the county-level dataset as a CSV file in your local environment, you can create JHUData
instance via CountryData
class. Please refer to Use a local CSV file which has the number of cases and regard "county" as "province" when you are using covsirphy library.
- What are the minimum parameters required to perform the county-level COVID-19 impact assessment using CovsirPhy.
If I understand this question correctly, you are asking about the required columns of the county-level dataset. You need the following data.
- If I can collect county-level vaccination and variant details, is it possible to utilize them with the current model.
At the latest version 2.21.0, it is not implemented. We need some updates of classes.
- What is the best output parameter (like R0) we can use for comparative evaluation of COIVD-19 impact and indexing counties.
Yes, we can use R0 as the index, but R0 is calculated with parameter values of ODE models. Please refer to Usage: SIR-derived models and I recommend to use the parameter values directly as the index.
Thank you lisphilar for clearing this up for me. Regarding vaccination and variant details, whether you are working on the updates? If I can include at least the vaccination dataset that will be a great success for me.
Could you provide the details of the dataset you have? I'd like to write a new class as the handler, testing it with the dataset, if available.
I plan to create new class LocalDataLoader
tentative name) to handle datasets saved in local environment. Usage may be similar to DataLoader
class.
Note for updating codes:
When we create LocalDataLoader
class, we should add date_format
argument because dates are sometimes parsed incorrectly as discussed in #856.
Could you provide the details of the dataset you have? I'd like to write a new class as the handler, testing it with the dataset, if available.
- column names
- some rows
- file extension (CSV?)
- if opened, URL
I plan to create new class
LocalDataLoader
tentative name) to handle datasets saved in local environment. Usage may be similar toDataLoader
class.
Hi lisphilar I am working on collecting all the required datasets. Soon update them here for further processing.. Thanks.
Hi lisphilar, I am in the process of getting the dataset from local health departments. I apologize for the long delay. Meanwhile, please let me know how to export the intermediate maps (snl.trend().summary()) in high resolution .tif image format. Thank you
@AnujTiwari ,
Thank you for letting me know. With some pull requests linked to this issue, I'm preparing to add new features to DataLoader
class (LocalDataLoader
will not be created).
Meanwhile, please let me know how to export the intermediate maps (snl.trend().summary()) in high resolution .tif image format. Thank you
When we use CovsirPhy with Jupyter Notebook, you can export TIFF files as follows.
snl.interactive = False
snl.trend(filename="image_filename.tiff", dpi=300)
When we use it with Python scripts, snl.interactive = False
is un-necessary.
snl.trend()
(and all methods which create figures) calls matplotlib.pyplot.savefig()
internally. Keyword arguments of matplotlib.pyplot.savefig()
, including dpi=300
, can be used as the keyword arguments of snl.trend()
. If not, that may be a bug.
Hi lisphilar, As per your suggestions, I have implemented cook county datasets (confirmed, fatal, population and recovered=0). Everything went smooth and I was capable of generating results for cook county with S-R trend analysis phases. But I have some doubts:
Again, thanks a lot for your help. I am still waiting for getting health data from local health departments. Keep you updated with the developments, But definitely, I am getting vaccination information (%population received 1st and 2nd dose) and looking forward to incorporating vaccination in the reproduction estimation.
Dataset: date | province | confirmed | fatal | recovered 1/24/2020 | Cook | 1 | 0 | 0 1/25/2020 | Cook | 1 | 0 | 0 . . 2/18/2021 | Cook | 467986 | 9775 | 0 2/19/2021 | Cook | 468690 | 9790 | 0 2/20/2021 | Cook | 469379 | 9800 | 0
Code: country_data = cs.CountryData("cook county.csv", country="Cook") country_data.set_variables( date="date", confirmed="confirmed", recovered="recovered", fatal="fatal")
country_data.register_total() country_data.cleaned()
jhu_data_cook = cs.JHUData.from_dataframe(country_data.cleaned()) population_data_cook = cs.PopulationData() population_data_cook.update(5150233, country="Cook")
snl = cs.Scenario(country="Cook") snl.register(jhu_data_cook, population_data_cook) snl.records()
snl.trend(algo="Pelt-rbf").summary() snl.estimate(cs.SIRF)
S-R Phases: 0th phase (15Mar2020 - 24Apr2020): finished 281 trials in 0 min 9 sec 1st phase (25Apr2020 - 18May2020): finished 320 trials in 0 min 10 sec 2nd phase (19May2020 - 08Jul2020): finished 385 trials in 0 min 13 sec 3rd phase (09Jul2020 - 17Aug2020): finished 437 trials in 0 min 15 sec 4th phase (18Aug2020 - 23Sep2020): finished 321 trials in 0 min 10 sec 5th phase (24Sep2020 - 20Oct2020): finished 180 trials in 0 min 5 sec 6th phase (21Oct2020 - 04Nov2020): finished 239 trials in 0 min 7 sec 7th phase (05Nov2020 - 14Nov2020): finished 270 trials in 0 min 8 sec 8th phase (15Nov2020 - 24Nov2020): finished 389 trials in 0 min 13 sec 9th phase (25Nov2020 - 05Dec2020): finished 322 trials in 0 min 10 sec 12th phase (05Jan2021 - 20Jan2021): finished 181 trials in 0 min 5 sec 10th phase (06Dec2020 - 17Dec2020): finished 564 trials in 0 min 22 sec 11th phase (18Dec2020 - 04Jan2021): finished 348 trials in 0 min 11 sec 13th phase (21Jan2021 - 20Feb2021): finished 831 trials in 0 min 28 sec
For New Change Points: snl.clear(include_past=True).summary() snl.add(start_date="28Apr2020")
Error: ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied.
- My dataset is starting from 24Jan2020 but my S-R trend first phases was starting from 15Mar2020?
Yes. We cannot apply S-R trend analysis to records with R = 0. This is because we cannot calculate parameter "a" of log(S(R)) = -a*R + logN when R is always 0 e.g. (R, S) = (0, 1000), (0, 999), (0, 998)...(0, 980).
So, all records with R = 0 should be ignored when S-R trend analysis. However, zeros in your dataset are not actual values. Because you did not have actual values of R, zeros were registered as I proposed.
When R values appears not to be actual values, covsirphy.Scenario
complements R values via covsirphy.JHUDataComplementHandler
class internally. We can check complemented R values with snl.records(variables="R")
. This returns a dataframe with dates and complemented R values except for zeros. I guess, minimum date of the returned dataframe was 15Mar2020.
- When I am trying to replace S-R trend analysis phases with some new change points following instructions at usage:phases, I am getting an error: ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied. Am I missing something here?
Please replace snl.add(start_date="28Apr2020")
with snl.add(end_date="28Apr2020")
to specify the end date, not start date, of the new phase.
code: country_data = cs.CountryData("cook county.csv", country="Cook") country_data.set_variables( date="date", confirmed="confirmed", recovered="recovered", fatal="fatal")
country_data.register_total() country_data.cleaned()
jhu_data_cook = cs.JHUData.from_dataframe(country_data.cleaned()) population_data_cook = cs.PopulationData() population_data_cook.update(5150233, country="Cook")
snl = cs.Scenario(country="Cook") snl.register(jhu_data_cook, population_data_cook) snl.records()
snl.trend(algo="Pelt-rbf").summary() snl.estimate(cs.SIRF)
snl.clear(include_past=True).summary() snl.add(end_date="28Apr2020") ValueError: @start must be the same as/over 15Mar2020, but 27Feb2020 was applied.
Hmm...This seems an internal error. The first date might not be changed correctly when zeros were removed.
snl.first_date
?snl.timepoint(first_date="15Mar2020"); snl.add(end_date="28Apr2020")
solve the problem?Output of snl.first_date - 27Feb2020 - I am confused with this date as confirmed cases data is starting from 24th Jan 2020 and fatal cases data is starting from 17th Mar 2020..
snl.timepoints(first_date="15Mar2020"); snl.add(end_date="28Apr2020") is showing the error: ValueError: @start must be the same as/over 01Apr2020, but 15Mar2020 was applied.
Cook County Data.zip Please find the data for your reference.
Thank you for providing the file and I will try to find the cause.
With pull request #881, I updated codes to fix the problem. Could you retry your script with the latest development version 2.21.0-kappa?
Please replace the current stable version with the development version in your environent or try development version in new environment.
pip install --upgrade "git+https://github.com/lisphilar/covid19-sir.git#egg=covsirphy"
At the latest development verion 2.21.0-kappa-fu2, we can use local datasets with DataLoader
class as follows. "recovered" column will not be required in the CSV file if you do not know the actual number of recovered cases.
import covsirphy as cs
# Create DataLoader instance
# Set update_interval=None to skip downloading datasets from remote servers
loader = cs.DataLoader(update_interval=None)
# Read local CSV files: use pandas.read_csv() internally
loader.read_csv("input/cook county.csv", parse_dates=["date"], dayfirst=False)
# Check the dataframe
print(loader.local)
# Set necessary columns: country name and population
# with pandas.DataFrame.assign() intenally
loader.assign(country="US", population=5150233)
# Check the dataframe
print(loader.local)
# Lock data by specifing column names to use
loader.lock(date="date", country="country", province="province", confirmed="confirmed", fatal="fatal", population="population")
# Check locked data
print(loader.locked)
# Create JHUData instance
jhu_data = loader.jhu()
# Start scenario analysis
snl = cs.Scenario(country="US", province="Cook")
snl.register(jhu_data)
snl.records()
Kindly let me know if you need to read multiple CSV files and they have vaccination data.
Hi lisphilar, I am still getting the same error when I am trying to add a new change point using snl.add(end_date="28Apr2020") Dataset (an instance of dataset showing initial values):
Code and Errror:
I also checked with the standard code and it is working fine (below) but with the extract CSV file (local dataset) it is showing the same error.
As far as I can understand it is the problem in the 0th phase. Please chk
I request you to look into this strange thing happening in the code:
Thank you for your notice and I also confirmed there is something error with first date setting after full complement of the number of recovered cases. https://gist.github.com/lisphilar/1063a31147d50dff252e682673b1e600
I will investigate it and try to fix the problem.
By msitake, full-complement was done recurrently when recovered = 0 and timepoint settings were updated. With #884, I tried to fix this problem. The correct first date was 10Feb2020, not 15Mar2020. https://gist.github.com/lisphilar/8f05eee0249571a5a27b2af64b5b2ce2
Please use the latest development version 2.21.0-kappa-fu3.
That's great - Thanks lisphilar for the immediate fix and all the hard work.
Hi lisphilar, Is it possible to predict COVID-19 prevalence (symptomatic patients + asymptomatic patients) using the current model?
Also, Today I computed R0 for Vietnam and when I compared my results (below) with the epiforecasts Reproduction number [https://epiforecasts.io/covid/posts/national/vietnam/] I found a significant difference between the outcomes of these models. May be you can better differentiate the modeling approaches here.
They will be discussed with #887 and #888.
Sure, thanks
Oh Sorry, by mistake I clicked on the closed issue.
Hi Lisphilar,
Currently, I am using loader.read_csv("Cook_County.csv", parse_dates=["date"], dayfirst=False) for reading the individual csv files. As, I am fetching cases, fatal, recovered, and population data records from a master CSV file that has the entry for hundreds of counties I would like to replace Pandas dataframe with the CSV file above.
My DataFrame is same as the earlier CSV files I discussed above (cook county.csv). Thanks
No worries, thank you for reopening this.
Could you try the followings and share the output of loader.local
?
loader.read_csv("cook_county.csv")
loader.read_csv("master CSV filename", how_combine="merge", how="left", on="province")
print(loader.local)
This merges dataframes with pandas.merge()` internally.
Oops, I might misunderstood the situation. Please share column names of the master file. The master file has county/date/confirmed/fatal/recovered/population for all counties and the indivisual files has country/date/confirmed?
Is the difference between cook.csv and the master file whether populaton data is included or not?
(Updated)
No there is no difference between the master file and data frame. Data frame also has same attributes (county/date/confirmed/fatal/recovered/population).
If column name of population data is "population" and recovered data is "recovered", we can do as follows. We can skip loader.assign(population=5150233)
. Please replace "master file name" with the master filename,
import covsirphy as cs
loader = cs.DataLoader(update_interval=None)
loader.read_csv("master file name", parse_dates=["date"], dayfirst=False)
loader.assign(country="US")
loader.lock(date="date", country="country", province="province", confirmed="confirmed", fatal="fatal", recovered="recovered", population="population")
Sorry for the miscommunication..I thought you wanted to know about the structure of the DataFrame.
No, actually I am using multiple CSVs to fetch these parameters into the data frame. so I would like to just replace this read_CSV function with something like readDataFrame?
Thank you. OK, we have a large pandas.DataFrame
which includes many counties' data.
We can call loader.read_csv(filename, how_combine="concat")
multiple times to concat many CSV file data, but loader.read_dataframe()
will be more useful as you metioned. Please wait for a while.
After merging #893, please update covsirphy
to version 2.21.0-lambda-fu1 and please relace loader.read_csv("master file name", parse_dates=["date"], dayfirst=False)
with loader.read_dataframe(dataframe, parse_dates=["date"], dayfirst=False)
if the pandas.DataFrame
is assigned to dataframe
variable.
Great - I think it is one of the important updates that will be helpful for the generalisation of the library. Thanks, Lisphilar.. It is pleasure working with you ..
Hi Lisphilar,
Which trend analysis technique S-R trend is using? Is it possible to access the trend type, p-value, z-value, slope, and other parameters related to the trend results? Also, is it possible to collect the resultant parameters for homogeneity analysis?
This will be discussed in #894.
@AnujTiwari ,
Thank you for your continuous supports and discussion.
I'm writing documentation of these new features (DataLoader.read_csv()
, DataLoader.read_dataframe()
and so on) at docs/markdown/INSTALLATION.md of this repository. "Data loading (To Be Released)" chapter will be used. After deployment, this will be shown at https://lisphilar.github.io/covid19-sir/markdown/INSTALLATION.html
Noe that "Dataset preparation" chapter will be integrated to the new chapter after the next stable release 2.22.0.
Please let me know (or create pull requests to edit it) if you have some advice.
thank you Lisphilar.. Sure I ll look into the Chapter and provide my inputs.
Hi Lisphilar,
Today I was executing the model for Barbour County and I found a "-" value in the output R0. So far I have implemented the model for less than 10 counties and this is the first time getting such strange reproduction no value.
Please find the Dataset: Barbour.zip
While if I run snl.trend() rather than manually adding the change points everything works fine..
Please share the out put of snl.estimate_accuracy("1st")
. This is a normal behavior when the records in the phase fit to SIR-F model in-sufficiently.
while executing snl.estimate_accuracy("1st") I am receiving error message: Please execute ODEHandler.add() in advance. I think I am missing Somthing.
Was it called after snl.add()
and snl.estimate()
?
Yes - It was called after snl.add() and snl.estimate()
Could you share the all script via gist or GitHub repository?
I tried codes with records as-of 18Jul2021. https://gist.github.com/lisphilar/a4396dc91a25af30ccda0155be2c0214
None value of Rt:
The records in the phases fit to SIR-F model in-sufficiently and kappa was 0 in the two phases and sigma was 0 in the 1st phase. Because Rt is equal to "rho * (1 - theta) / (kappa + sigma)", None was returned. None was converted to "-" with snl.summary()
and snl.get()
.
Error with "snl.estimate_accuracy(): I also confirmed the error when we run it without
snl.trend()`. When some parameter values are 0, this error may be raised. Could you create a new issue for this with "Request fixing a bug" template?
In case when the records in the phase fit to SIR-F model in-sufficient and R0 is None '-', what do you suggest for getting some significant R0 value for the county - whether I should increase the time period and chk for 21 days instead of 14 days or something else ?
Also I am a little concerned about the very high R0 values like 13.59 (for Barbour county) after adding a new record (19th Day records in case of Barbour county). It seems like a small increase in COVID-19 cases/fatal (cases increased by count 2 in Barbour county) is causing a major shift in R0 value.
Because the phase does not fit to the model, I recommend to shorten the phase e.g. 7 days.
However, actually I need the outputs of snl.estimate_accuracy()
for confirmation and deciding the number of days. This method compares simulated number with estimated parameter values and actual number of cases.
Thank you for creating #895 and I will investigate it with high priority after work on 21Jul2021 JST.
thank you for the suggestion. Yes- it will be great if it is possible to compute 'minimum number of the days' phase (from the last day) for which records in the phase sufficiently fit to SIR-F model and provide a significant R0 value.. Thanks
there are many counties where the number of cases and deaths is comparatively less model throws an error. I request you to please test the model with the Wilcox county dataset. Similar to many other counties, I am not able to get the outcome for this county.
import pandas as pd loader = cs.DataLoader(update_interval=None) loader.read_csv("Wilcox1.csv", parse_dates=["date"], dayfirst=False)
print(loader.local)
loader.assign(country="US", province="Wilcox", population=10373)
print(loader.local) loader.lock(date="date", country="country", province="province", confirmed="confirmed", fatal="fatal", population="population") print(loader.locked)
jhu_data = loader.jhu()
snl = cs.Scenario(country="US", province="Wilcox") snl.register(jhu_data) snl.records()
snl.clear(include_past=True) snl.add(end_date="12Jul2021") snl.add(end_date=snl.today) snl.summary()
snl.estimate(cs.SIRF) snl.summary()
ValueError: When the targets have multiple columns, we cannot select RMSLE.
Hi All, Please help me to utilize the CovsirPhy for US County Level COVID-19 Impact Assessment. My aim is to index all 3142 US counties as per the impact of COVID-19.
I can request the local health department for clinical datasets. Please help me to understand the following points:
I am very much new to the domain of compartmental modeling so, please help me to understand if I am leaving some important details required to extend the model for the county level.
Thanks