joachim-gassen / tidycovid19

{tidycovid19}: An R Package to Download, Tidy and Visualize Covid-19 Related Data
https://joachim-gassen.github.io/tidycovid19/

Add the possibility to visualize epidemic curves #19

Closed AndreaPi closed 4 years ago

AndreaPi commented 4 years ago

Hi Joachim,

this is more of a suggestion than a real issue, so feel free to close it if you think it's out of scope for the package.

A visualization that is extremely helpful for estimating when an epidemic will die out is the epidemic curve, i.e., a barplot of daily new confirmed cases:

image

or a barplot of cases by date of symptom onset:

image

Would it be possible to add it to the possible plots? Especially when trying to nowcast or forecast R_t, it's a real life-saver.
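To make the idea concrete, here is a minimal base R sketch of how such a curve can be derived from a cumulative case series. The numbers are made up for illustration, and the 7-day trailing window is just one common smoothing choice, not something the package prescribes.

```r
# Toy cumulative confirmed counts (hypothetical numbers, for illustration only)
confirmed <- c(0, 2, 5, 9, 15, 24, 30, 41, 55, 60)
dates <- seq(as.Date("2020-02-01"), by = "day", length.out = length(confirmed))

# Daily new cases are the first differences of the cumulative series;
# the first day has no predecessor, so it is NA
new_cases <- c(NA, diff(confirmed))

# Trailing 7-day mean: sides = 1 averages the current day and the six before it,
# so any window that still contains an NA yields NA
ave_new_cases <- as.numeric(stats::filter(new_cases, rep(1 / 7, 7), sides = 1))

# Epidemic curve: thick vertical bars for daily counts, red line for the smoothed series
plot(dates, new_cases, type = "h", lwd = 8, lend = 1, col = "lightblue",
     xlab = "date", ylab = "new cases")
lines(dates, ave_new_cases, col = "red")
```

With real data the smoothed line becomes informative once a few weeks of observations are available; on a series this short it only covers the last three days.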

PS you can already do something quite similar with the following settings:

updates <- download_merged_data(cached = TRUE)
# Countries to highlight
countries <- "DEU"

print(plot_covid19_spread(updates,
                          highlight = countries,
                          type = "confirmed",
                          cumulative = FALSE,
                          min_cases = 1,
                          edate_cutoff = 1000,
                          per_capita = FALSE,
                          change_ave = 1,
                          log_scale = FALSE,
                          exclude_others = TRUE,
                          intervention = "lockdown"))

image

However, being able to use a barplot rather than a lineplot would give a graph that is more familiar to epidemiologists. Since barplots quickly become cumbersome for multiple countries, rather than adding this as an option in plot_covid19_spread, I'd suggest adding a new function, maybe plot_epidemic_curve, which by default:

joachim-gassen commented 4 years ago

Hi Andrea,

apologies for the delay. These are certainly informative graphs. Instead of including a function to generate them in the package, I suggest including them in the code file example_code.R so that others can use the code directly in their analyses.

I included the following code in example_code.R

# Suggestion by AndreaPi (issue #19)

library(tidyverse)
library(tidycovid19)
library(zoo)

df <- download_merged_data(cached = TRUE)

df %>%
  filter(iso3c == "ITA",
         date > "2020-01-31") %>%
  mutate(
    new_cases = confirmed - lag(confirmed),
    ave_new_cases = rollmean(new_cases, 7, fill = NA, align = "right")
  ) %>%
  ggplot(aes(x = date)) + 
  geom_bar(aes(y = new_cases), stat = "identity", fill = "lightblue") +
  geom_line(aes(y = ave_new_cases), color ="red") +
  theme_minimal()

Rplot

Would that work for you?

AndreaPi commented 4 years ago

Nice! I wrote an equivalent script using tsibble and without importing the full tidyverse. It makes no difference as far as the final result is concerned, but it might be of interest to people who are used to the forecast package in R (Rob Hyndman now recommends using tsibble and related packages, such as feasts, fable, etc.).

Also, I changed the script so that it shows a plot even if it's run from RStudio and not from the console. Again, just a cosmetic difference. Your original script is just as good.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr,
               ggplot2,
               tsibble,
               slider,
               tidycovid19)

df <- download_merged_data(cached = TRUE)

Italy <- df %>%
  filter(iso3c == "ITA",
         date > "2020-01-31") %>%
  as_tsibble(index = date) %>%
  mutate(
    new_cases = confirmed - lag(confirmed),
    # Trailing 7-day mean; slider::slide_dbl() stands in for the sliding-window
    # helper that tsibble used to export
    ave_new_cases = slider::slide_dbl(new_cases, mean, .before = 6, .complete = TRUE)
  )

p <- ggplot(Italy, aes(x = date)) +
  geom_bar(aes(y = new_cases), stat = "identity", fill = "lightblue") +
  geom_line(aes(y = ave_new_cases), color ="red") +
  theme_minimal()

print(p)

image

Feel free to close the issue and include whichever version of the script you like best 🙂

joachim-gassen commented 4 years ago

Thank you! I am lazy and often use library(tidyverse) even if I only require a limited set of packages. My guess is that my script simply relies on dplyr, ggplot2, zoo and tidycovid19. So it is sort of a zoo versus tsibble question. As I already included my script version in the repo and zoo is a dependency of tidycovid19, I will be rolling with this one.

AndreaPi commented 4 years ago

Sounds nice! Anyway, I went to the trouble of actually writing a function for this, if someone wants reusable code to plot epidemic curves for multiple countries.

plot_epidemic_curve <- function(df, country_code, start_date, window = 7){

  epidemic_curve_data <- df %>%
    filter(iso3c == country_code,
           date > start_date) %>%
    mutate(
      new_cases = confirmed - lag(confirmed),
      ave_new_cases = rollmean(new_cases, window, fill = NA, align = "right")
    ) 

  p <- ggplot(epidemic_curve_data, aes(x = date)) + 
    geom_bar(aes(y = new_cases), stat = "identity", fill = "lightblue") +
    geom_line(aes(y = ave_new_cases), color ="red") +
    theme_minimal() 

  print(p)

  return(list(epidemic_curve_data = epidemic_curve_data, plot = p))
}

# test
plot_epidemic_curve(df, "ITA", "2020-01-31")
plot_epidemic_curve(df, "SWE", "2020-01-31")

image image
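
For more than a couple of countries, the repeated calls can be replaced by a loop that collects one result per country code. The sketch below shows that pattern in base R on a toy data frame. Both the numbers and the helper daily_new_cases are hypothetical stand-ins so that the example runs without downloading anything; with the real data one would loop over plot_epidemic_curve in the same way.

```r
# Toy long-format data shaped like the merged data (hypothetical numbers)
toy <- data.frame(
  iso3c = rep(c("ITA", "SWE"), each = 5),
  date = rep(seq(as.Date("2020-02-01"), by = "day", length.out = 5), times = 2),
  confirmed = c(0, 3, 7, 12, 20, 0, 1, 1, 4, 6)
)

# Hypothetical helper: daily new cases for a single country, base R only
daily_new_cases <- function(data, country_code, start_date) {
  keep <- data$iso3c == country_code & data$date > as.Date(start_date)
  out <- data[keep, c("iso3c", "date", "confirmed")]
  out <- out[order(out$date), ]
  # First differences of the cumulative counts; the first day has no predecessor
  out$new_cases <- c(NA, diff(out$confirmed))
  out
}

# One list element per country, named by its ISO code
country_codes <- c("ITA", "SWE")
curves <- lapply(setNames(country_codes, country_codes),
                 daily_new_cases, data = toy, start_date = "2020-01-31")

curves$SWE
```

Naming the list by country code keeps each result easy to retrieve afterwards, e.g. curves$ITA.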