joachim-gassen / tidycovid19

{tidycovid19}: An R Package to Download, Tidy and Visualize Covid-19 Related Data
https://joachim-gassen.github.io/tidycovid19/

Function download_owid_data downloads just OWID's COVID testing data, not all OWID COVID data #36

Open DannyQuah opened 3 years ago

DannyQuah commented 3 years ago

Small issue with the write-up and maybe the naming, not necessarily the underlying code: in Readme.md, the description of the new function download_owid_data() begins "Downloads and tidies data collected by the ‘Our World in Data’ team". The second sentence says specifically "This team systematically collects data on hospitalizations, testing and vaccinations from multiple national sources." The first statement might slightly mislead, as this function is really confined to downloading just COVID testing data, not the entire range of other COVID data that OWID reports. (Scanning the R code confirms this.) A small suggestion: a fix might either (a) remove the select and filter statements and edit the description, or (b) rename the function to make clearer that it is confined to just COVID testing data.

Or perhaps, is the function still work in progress to something larger?

Many thanks for all the excellent and valuable work this REPO represents.

joachim-gassen commented 3 years ago

Sorry for the delay and thank you for your kind words! Could it be the case that your version of {tidycovid19} trails the current version? I updated the code a while ago so that it imports OWID data on vaccinations, testing and hospitalizations. Maybe you are using an interim version that had a documentation update but no code updates. Can you do:

library(tidycovid19)
df <- devtools::package_info()
df[df$package == "tidycovid19", ]

to see which version you are on? The current version is joachim-gassen/tidycovid19@4f65abc. Using it, the code downloads the above-mentioned data items.

df <- download_owid_data()
names(df)
[1] "iso3c"              "date"               "total_tests"        "tests_units"        "positive_rate"     
[6] "hosp_patients"      "icu_patients"       "total_vaccinations" "timestamp" 

If this is your problem, then updating via remotes::install_github("joachim-gassen/tidycovid19") would do the trick. If this does not solve your problem, then I would very much like to hear more about it.
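
A quick way to check whether a downloaded frame already carries all nine columns is sketched below. The helper missing_owid_cols() is hypothetical and not part of {tidycovid19}, and the toy data frame only stands in for an outdated download with invented values.

```r
# Columns listed in the output above for the current download_owid_data().
expected_owid_cols <- c(
  "iso3c", "date", "total_tests", "tests_units", "positive_rate",
  "hosp_patients", "icu_patients", "total_vaccinations", "timestamp"
)

# Hypothetical helper (not part of {tidycovid19}): returns the expected
# column names that are absent from `df`.
missing_owid_cols <- function(df, expected = expected_owid_cols) {
  setdiff(expected, names(df))
}

# Toy stand-in for an outdated download that only carries testing data.
# All values are invented for illustration.
old_df <- data.frame(
  iso3c = "DEU",
  date = as.Date("2021-01-15"),
  total_tests = 1000,
  tests_units = "tests performed",
  positive_rate = 0.1,
  timestamp = Sys.time()
)

# Lists the hospitalization and vaccination columns the toy frame lacks.
missing_owid_cols(old_df)
```

With the package installed, calling missing_owid_cols(download_owid_data()) should return an empty character vector if the installed version matches the output shown above.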

Thank you!

DannyQuah commented 3 years ago

Joachim - Thanks for your very informative reply. I'm not on exactly the latest version (but will update ASAP); however, my version does reproduce the same variables you listed, and the blurb that comes on is exactly accurate:

The Our World in Data team systematically collects data on Covid-19 testing, hospitalizations, and vaccinations from multiple national sources. Data points are collected with varying frequency across countries. The definition on what constitutes a 'test' varies, reflected by the variable 'tests_units' in the data frame. The vaccination data is currently only available based on ad hoc disclosures by a small set of countries. The column 'timestamp' reports the time the data was downloaded from its authoritative source.

What I'm looking for with OWID, however, is broader than these 9 variables. Instead what I want is what I see on the website itself, i.e. including a whole load of other variables. I reckon that since no one else has raised this issue with you, for other users, it really is just the "testing, hospitalisations, vaccinations" data they're after as well. In my own clumsily hacked-together function for my own use, here's what I looked for and was able to get from OWID:

theOWID.dt <- dl_owid_covid_data(cached = FALSE, silent = FALSE, readOnline = TRUE)
Downloading Our World in Data COVID data...Done downloading Our World in Data COVID data

names(theOWID.dt)
 [1] "iso_code"                              "continent"                            
 [3] "location"                              "date"                                 
 [5] "total_cases"                           "new_cases"                            
 [7] "new_cases_smoothed"                    "total_deaths"                         
 [9] "new_deaths"                            "new_deaths_smoothed"                  
[11] "total_cases_per_million"               "new_cases_per_million"                
[13] "new_cases_smoothed_per_million"        "total_deaths_per_million"             
[15] "new_deaths_per_million"                "new_deaths_smoothed_per_million"      
[17] "reproduction_rate"                     "icu_patients"                         
[19] "icu_patients_per_million"              "hosp_patients"                        
[21] "hosp_patients_per_million"             "weekly_icu_admissions"                
[23] "weekly_icu_admissions_per_million"     "weekly_hosp_admissions"               
[25] "weekly_hosp_admissions_per_million"    "new_tests"                            
[27] "total_tests"                           "total_tests_per_thousand"             
[29] "new_tests_per_thousand"                "new_tests_smoothed"                   
[31] "new_tests_smoothed_per_thousand"       "positive_rate"                        
[33] "tests_per_case"                        "tests_units"                          
[35] "total_vaccinations"                    "people_vaccinated"                    
[37] "people_fully_vaccinated"               "new_vaccinations"                     
[39] "new_vaccinations_smoothed"             "total_vaccinations_per_hundred"       
[41] "people_vaccinated_per_hundred"         "people_fully_vaccinated_per_hundred"  
[43] "new_vaccinations_smoothed_per_million" "stringency_index"                     
[45] "population"                            "population_density"                   
[47] "median_age"                            "aged_65_older"                        
[49] "aged_70_older"                         "gdp_per_capita"                       
[51] "extreme_poverty"                       "cardiovasc_death_rate"                
[53] "diabetes_prevalence"                   "female_smokers"                       
[55] "male_smokers"                          "handwashing_facilities"               
[57] "hospital_beds_per_thousand"            "life_expectancy"                      
[59] "human_development_index"               "timestamp"                            

Again, it seems like I'm the rare user interested in using all these other OWID variables. No one else seems to have an issue with getting just the "testing, hospitalisation, vaccination" data that your download_owid_data recovers.
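
To make the difference between the two frames concrete, here is a minimal sketch of how the wide frame could be cut down to the nine variables. It runs on a toy data frame with invented values. The row filter on three-letter codes and the rename from iso_code to iso3c are assumptions for illustration, not necessarily what the package's own select and filter statements do.

```r
# Toy stand-in for a few columns of the full OWID frame listed above.
# All values are invented for illustration only.
full_owid <- data.frame(
  iso_code = c("DEU", "DEU", "OWID_WRL"),
  location = c("Germany", "Germany", "World"),
  date = as.Date(c("2021-01-14", "2021-01-15", "2021-01-15")),
  total_cases = c(100, 120, 5000),
  total_tests = c(1000, 1100, NA),
  tests_units = c("tests performed", "tests performed", NA),
  positive_rate = c(0.10, 0.11, NA),
  hosp_patients = c(50, 55, NA),
  icu_patients = c(10, 11, NA),
  total_vaccinations = c(200, 260, 9000),
  timestamp = as.POSIXct("2021-01-16 08:00:00", tz = "UTC"),
  stringsAsFactors = FALSE
)

# The testing, hospitalization and vaccination columns to keep.
keep <- c(
  "iso_code", "date", "total_tests", "tests_units", "positive_rate",
  "hosp_patients", "icu_patients", "total_vaccinations", "timestamp"
)

# Assumed filter: keep only rows whose code is a three-letter country code.
slim <- full_owid[nchar(full_owid$iso_code) == 3, keep]

# Assumed rename: treat OWID's iso_code as the package's iso3c.
names(slim)[names(slim) == "iso_code"] <- "iso3c"
rownames(slim) <- NULL

names(slim)
```

Dropping the two subsetting steps leaves the full frame untouched, which is roughly what option (a) from the first comment would amount to.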

Once more, many many thanks for the amazing work you've done with this.

joachim-gassen commented 3 years ago

Ah, I see. Thanks for clarifying this. The reason why I decided to limit the OWID data to hospitalization, testing and vaccination is that, on principle, I would like to use authoritative data providers whenever possible. OWID itself is more of a data aggregator/distributor.

Many of the data items that you list above are derived from original data (e.g., new_tests_per_thousand etc.) or are imported from other sources (for example, the World Bank data). Thus, importing the full OWID data frame into the data provided by the package would to some extent create duplicate data items.
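
As an illustration of what "derivative" means here, a per-thousand series can be rebuilt from a raw count and a population figure. This is a minimal sketch on invented toy values. The column names follow the OWID listing above, but the formula is an assumed definition, and OWID's own calculation or rounding may differ.

```r
# Toy rows with invented values; AAA and BBB are placeholder country codes.
owid <- data.frame(
  iso_code = c("AAA", "BBB"),
  new_tests = c(5000, 12000),
  population = c(1e6, 4e6),
  stringsAsFactors = FALSE
)

# Assumed definition: new tests per 1,000 inhabitants.
owid$new_tests_per_thousand_calc <- owid$new_tests / owid$population * 1000

owid
```

Any item that can be recomputed like this from series the package already provides would add little new information if imported as well.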

I will leave this issue open for a while to see whether there is a general appetite to import more data items from the OWID data.

Thanks!

DannyQuah commented 3 years ago

You're right. I too have wondered about the duplication across aggregators (and of course others besides OWID do that). In a way, of course, it's inevitable. There are only so many different series to go around. Also, mostly it's a matter of taste and direction. For what I'm doing, OWID happens to aggregate the specific data I find most useful, and so if I have to go to one source, it's OWID I go to, even though I know a lot of its data are found elsewhere. Going to the original sources instead would mean a lot more pulling from different URLs, whereas OWID (and again this is specific to what I want to do, others will differ) has most of what I need in one place.

No worries, if this discussion helps anyone else out there, they'll note what's going on. Thanks again.