joachim-gassen / tidycovid19

{tidycovid19}: An R Package to Download, Tidy and Visualize Covid-19 Related Data
https://joachim-gassen.github.io/tidycovid19/

I'm not able to plot Taiwan data #20

Closed AndreaPi closed 4 years ago

AndreaPi commented 4 years ago

This may just be me making some mistake, but:

remotes::install_github("joachim-gassen/tidycovid19")
#> Skipping install of 'tidycovid19' from a github remote, the SHA1 (56a8bd32) has not changed since last install.
#>   Use `force = TRUE` to force installation

if (!require("pacman")) install.packages("pacman")
#> Loading required package: pacman
pacman::p_load(dplyr,
               tidycovid19)

# Download latest data
updates <- download_merged_data(cached = TRUE)
#> Downloading cached version of merged data...
#> done. Timestamp is 2020-05-10 08:06:23
#> 
#> Data Info:
#> This data frame contains Covid-19 related data from multiple sources in
#> a country-day structure. Data sources are JHU CSSE data on confirmed
#> cases, deaths and recoveries
#> (https://github.com/CSSEGISandData/COVID-19), 'Our World in Data' data
#> on testing (https://ourworldindata.org/covid-testing), ACAPS data on
#> governmental measures
#> (https://www.acaps.org/covid19-government-measures-dataset), Apple's
#> Mobility Trend Reports on Apple Map usage
#> (https://www.apple.com/covid19/mobility), Google's Community Mobility
#> Reports on individual movement trends
#> (https://www.google.com/covid19/mobility/), Google Trends data on
#> relative Google search volumes for the term 'coronavirus'
#> (https://trends.google.com/) and country-level World Bank data on
#> population (density), life expectancy and national income
#> (https://data.worldbank.org). The data frame
#> 'tidycovid19_variable_definitions' holds definitions for each variable
#> in this data frame. The data frame 'tidycovid19_data_sources' contains
#> more information on the data sources included in this package. The
#> column 'timestamp' reports the time the data was downloaded from its
#> authoritative source.
#> 
#> For further information refer to:
#> https://github.com/joachim-gassen/tidycovid19.
#> 
# Countries to highlight
Taiwan <- "TWN"

print(plot_covid19_spread(updates,
                          highlight = Taiwan,
                          type = "confirmed",
                          cumulative = TRUE,
                          min_cases = 1,
                          min_by_ctry_obs = 1,
                          edate_cutoff = 50,
                          per_capita = FALSE,
                          log_scale = FALSE,
                          exclude_others = TRUE))
#> Warning in plot_covid19_spread(updates, highlight = Taiwan, type =
#> "confirmed", : Non-NULL 'highlight' value but no countries matched in data (Did
#> you specify correct ISO3c codes or do values for 'min_cases', 'min_by_ctry_obs'
#> and/or 'edate_cutoff' lead to the exclusion of your selected countries' data?)

Created on 2020-05-11 by the reprex package (v0.3.0)

Could this be related to the fact that the first confirmed value for Taiwan is NA?

> filter(updates, iso3c == "TWN")
# A tibble: 110 x 35
   iso3c country date       confirmed deaths recovered ecdc_cases ecdc_deaths total_tests tests_units soc_dist mov_rest
   <chr> <chr>   <date>         <dbl>  <dbl>     <dbl>      <dbl>       <dbl>       <dbl> <chr>          <dbl>    <dbl>
 1 TWN   Taiwan  2020-01-21        NA     NA        NA          1           0          NA NA                NA       NA
 2 TWN   Taiwan  2020-01-22         1      0         0          1           0          NA NA                NA       NA
 3 TWN   Taiwan  2020-01-23         1      0         0          1           0          NA NA                NA       NA
 4 TWN   Taiwan  2020-01-24         3      0         0          1           0          NA NA                NA       NA
 5 TWN   Taiwan  2020-01-25         3      0         0          3           0          NA NA                NA       NA
 6 TWN   Taiwan  2020-01-26         4      0         0          3           0          NA NA                NA       NA
 7 TWN   Taiwan  2020-01-27         5      0         0          5           0          NA NA                NA       NA
 8 TWN   Taiwan  2020-01-28         8      0         0          7           0          NA NA                NA       NA
 9 TWN   Taiwan  2020-01-29         8      0         0          8           0          NA NA                NA       NA
10 TWN   Taiwan  2020-01-30         9      0         0          8           0          NA NA                NA       NA
# … with 100 more rows, and 23 more variables: pub_health <dbl>, gov_soc_econ <dbl>, lockdown <dbl>,
#   apple_mtr_driving <dbl>, apple_mtr_walking <dbl>, apple_mtr_transit <dbl>, gcmr_retail_recreation <dbl>,
#   gcmr_grocery_pharmacy <dbl>, gcmr_parks <dbl>, gcmr_transit_stations <dbl>, gcmr_workplaces <dbl>,
#   gcmr_residential <dbl>, gtrends_score <dbl>, gtrends_country_score <int>, region <chr>, income <chr>,
#   population <dbl>, land_area_skm <dbl>, pop_density <dbl>, pop_largest_city <dbl>, life_expectancy <dbl>,
#   gdp_capita <dbl>, timestamp <dttm>

joachim-gassen commented 4 years ago

Good catch! That was a side effect of Taiwan not having population data: it is a jurisdiction, not a "country" as defined by the World Bank. I was checking for population by default, even if population_cutoff was set to zero. I changed this and added a message that is shown if the options require population data to be present. Taiwan should now plot with your code, and it will be excluded if and only if you set parameters that require population data to be present.
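
To illustrate the mechanism described above, here is a minimal base R sketch. It is not the package's actual code: the `toy` data frame, its values, and the names `always_filtered` and `conditionally_filtered` are made up for illustration. It only shows how a population filter that always runs can silently drop a row whose population is NA, even with a cutoff of zero, and how running the filter only when a positive cutoff is requested avoids that.

```r
# Toy stand-in for the merged data; all values are invented for illustration.
toy <- data.frame(
  iso3c      = c("DEU", "TWN", "ITA"),
  confirmed  = c(100, 50, 200),
  population = c(83e6, NA, 60e6),
  stringsAsFactors = FALSE
)

population_cutoff <- 0

# NA >= 0 evaluates to NA, and subset() drops rows whose condition is NA,
# so TWN disappears even though the cutoff is zero.
always_filtered <- subset(toy, population >= population_cutoff)

# Filtering only when a positive cutoff is requested keeps TWN.
conditionally_filtered <- if (population_cutoff > 0) {
  subset(toy, population >= population_cutoff)
} else {
  toy
}

print(always_filtered$iso3c)
print(conditionally_filtered$iso3c)
```

The same NA-dropping behavior applies to dplyr's `filter()`, which also discards rows where the condition evaluates to NA.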

See whether it works for you.
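
To see which entries would be affected by options that require population data, one can list the codes whose population is missing on every row. The sketch below uses a small invented `toy` data frame so that it runs on its own; with the real data, `toy` would presumably be replaced by the `updates` data frame from the reprex, which has `iso3c` and `population` columns.

```r
# Toy country-day data; values are invented for illustration.
toy <- data.frame(
  iso3c      = c("DEU", "DEU", "TWN", "TWN"),
  date       = as.Date(c("2020-01-22", "2020-01-23",
                         "2020-01-22", "2020-01-23")),
  population = c(83e6, 83e6, NA, NA),
  stringsAsFactors = FALSE
)

# For each code, check whether population is NA on every row.
no_pop <- tapply(is.na(toy$population), toy$iso3c, all)

# Keep only the codes where that is the case.
missing_pop <- names(no_pop)[no_pop]
print(missing_pop)
```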