joachim-gassen / tidycovid19

{tidycovid19}: An R Package to Download, Tidy and Visualize Covid-19 Related Data
https://joachim-gassen.github.io/tidycovid19/

China data cannot be plotted #13

Closed AndreaPi closed 4 years ago

AndreaPi commented 4 years ago

First of all, let me compliment you on this excellent package, which deserves more publicity. Secondly, as per the subject line:

remotes::install_github("joachim-gassen/tidycovid19")
#> Skipping install of 'tidycovid19' from a github remote, the SHA1 (04fc58e9) has not changed since last install.
#>   Use `force = TRUE` to force installation

if (!require("pacman")) install.packages("pacman")
#> Loading required package: pacman
pacman::p_load(dplyr,
               tidycovid19)

# Download latest data
updates <- download_merged_data(cached = TRUE)
#> Downloading cached version of merged data...
#> done. Timestamp is 2020-04-24 10:11:16
# Countries to highlight
countries <- "CHN"
print(plot_covid19_spread(updates,
                          highlight = countries,
                          type = "deaths",
                          per_capita = TRUE,
                          exclude_others = TRUE))
#> Warning in plot_covid19_spread(updates, highlight = countries, type =
#> "deaths", : Non-NULL 'highlight' value but no countries matched in data (Did you
#> specify correct ISO3c codes?)

Created on 2020-04-24 by the reprex package (v0.3.0)

NOTE: plot_covid19_spread complains that CHN doesn't match any country: "Non-NULL 'highlight' value but no countries matched in data (Did you specify correct ISO3c codes?)". However, this is wrong: CHN is indeed the correct ISO3c code for China, see:

https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3

https://unstats.un.org/unsd/tradekb/knowledgebase/country-code

joachim-gassen commented 4 years ago

Hi there. Thanks! About your point: the reason you are getting this warning is that the default data screen imposed by min_cases (deaths per 100,000 inhabitants exceeding 5.0) is, luckily, never met for China. Given China's population of ~1.4 billion, that would require a death toll of around 70,000 people, whereas the reported deaths are below 5,000. See:

library(tidyverse)
library(tidycovid19)

merged_data <- download_merged_data(cached = TRUE, silent = TRUE)
chn_deaths <- max(merged_data$deaths[merged_data$iso3c == "CHN"])
chn_population <- unique(merged_data$population[merged_data$iso3c == "CHN"])

c(chn_deaths, chn_population, 1e5*chn_deaths/chn_population)

# [1] 4.636000e+03 1.392730e+09 3.328714e-01

5*chn_population/1e5

# [1] 69636.5

When you set a lower value for min_cases the graph is plotted as expected.

countries <- "CHN"
print(plot_covid19_spread(merged_data,
                          highlight = countries,
                          type = "deaths",
                          per_capita = TRUE,
                          exclude_others = TRUE,
                          min_cases = 0.1))

But you have a good point regardless. The warning message could be more informative. I will add something like Non-NULL 'highlight' value but no countries matched in data (Did you specify correct ISO3c codes or do values for 'min_cases', 'min_by_ctry_obs' and/or 'edate_cutoff' lead to the exclusion of your selected countries' data?).
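
A rough sketch of how such a check might look, assuming the screened data is available as a vector of ISO3c codes. The function `warn_if_unmatched()` and its arguments are hypothetical and only illustrate the idea; they are not the code in the package.

```r
# Hypothetical sketch, not the package's code: warn when none of the
# highlighted countries survive the data screen.
warn_if_unmatched <- function(highlight, screened_iso3c) {
  if (!is.null(highlight) && !any(highlight %in% screened_iso3c)) {
    warning(
      "Non-NULL 'highlight' value but no countries matched in data ",
      "(Did you specify correct ISO3c codes or do values for 'min_cases', ",
      "'min_by_ctry_obs' and/or 'edate_cutoff' lead to the exclusion of ",
      "your selected countries' data?)",
      call. = FALSE
    )
  }
  invisible(NULL)
}

# Capture the warning text instead of printing it
msg <- tryCatch(
  warn_if_unmatched("CHN", c("ITA", "ESP", "USA")),
  warning = function(w) conditionMessage(w)
)
```

Here `msg` holds the warning text because "CHN" is absent from the screened codes; a call with a matching code returns silently.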

Would that help?

AndreaPi commented 4 years ago

Ah, good point! I did try lowering min_cases to 1 before opening the issue, but I didn't realize that China's population is so large that I would need to go below 1! I somehow thought that min_cases<1 wouldn't make sense, but that of course is wrong. Yes, I think making the warning message more informative would be great. Thanks!

joachim-gassen commented 4 years ago

Thanks. The more informative warning is now included in the code. Closing this.