microsoft / datamations

https://microsoft.github.io/datamations/

Create a datamation for covid efficacy #97

Open jhofman opened 2 years ago

jhofman commented 2 years ago

@dggoldst caught this tweet, which shows a possible case of Simpson's paradox among vaccinated people in Israel. the news media is starting to pick up on this as well.

it'd be really nice to datamate this if we can find a way. it's a bit different from the salary example because the outcomes are binary, so it could challenge us in a nice way, too.

sharlagelfand commented 2 years ago

Oooh thanks @jhofman @dggoldst! I'll take a look and see how we might be able to visualize this - definitely good to think about binary outcomes and how we can show those.

jhofman commented 2 years ago

Btw, this was super hard for @dggoldst and me to understand. Reading this post helped:

https://web.archive.org/web/20211109015334/https://www.covid-datascience.com/post/israeli-data-how-can-efficacy-vs-severe-disease-be-strong-when-60-of-hospitalized-are-vaccinated

jhofman commented 2 years ago

We looked at this and realized that it's extra challenging because the base rates are low (only a handful of people out of 100,000 have severe cases), so we created #98 as a simpler case. We'll look at Simpson's paradox in batting averages to brainstorm binary outcomes, then circle back to this.
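
To make the low base rates and the paradox concrete, here's a rough sketch of the arithmetic in R. The counts are the ones I understand the covid-datascience post linked above to report, so treat them as an assumption and double-check them against the source. `efficacy_by()` is a made-up helper name, and efficacy here is simply 1 minus the ratio of severe-case rates (vaccinated over unvaccinated).

```r
library(tidyverse)

# Assumed counts (people and severe cases) by age group and vaccination status;
# verify against the linked post before relying on them.
counts <- tribble(
  ~age, ~vax_status, ~num_ppl, ~num_severe_cases,
  "Below 50", "Not vaccinated", 1116834, 43,
  "Below 50", "Fully vaccinated", 3501118, 11,
  "Above 50", "Not vaccinated", 186078, 171,
  "Above 50", "Fully vaccinated", 2133516, 290
)

# Hypothetical helper: severe cases per 100,000 for each vaccination status,
# optionally within extra grouping columns, plus the implied efficacy.
efficacy_by <- function(df, ...) {
  df %>%
    group_by(..., vax_status) %>%
    summarize(
      rate_per_100k = 1e5 * sum(num_severe_cases) / sum(num_ppl),
      .groups = "drop"
    ) %>%
    pivot_wider(names_from = vax_status, values_from = rate_per_100k) %>%
    mutate(efficacy = 1 - `Fully vaccinated` / `Not vaccinated`)
}

by_age <- efficacy_by(counts, age)  # one row per age group
overall <- efficacy_by(counts)      # both age groups pooled together

print(by_age)
print(overall)
```

Under these assumed counts, efficacy comes out at roughly 92% below 50 and 85% above 50, but only about 67% when the age groups are pooled, even though no group has more than about 92 severe cases per 100,000. The pooled number is dragged down because the older, higher-risk group is also the more heavily vaccinated one.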

jhofman commented 2 years ago

Adding some dangling code I had for creating dataframes to use for playing around with a COVID efficacy visualization:

library(tidyverse)

theme_set(theme_bw())

# Each plotted dot stands for this many people
ppl_per_dot <- 10000

counts <- tribble(
  ~age, ~vax_status, ~num_ppl, ~num_severe_cases,
  "Below 50", "Not vaccinated", 1116834, 43,
  "Below 50", "Fully vaccinated", 3501118, 11,
  "Above 50", "Not vaccinated", 186078, 171,
  "Above 50", "Fully vaccinated", 2133516, 290
) %>%
  mutate(num_ok = num_ppl - num_severe_cases)

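# Expand the counts into one row per dot, with outcome "severe_cases" or "ok"
individual_rows <- counts %>%
  # Name the arguments: positional "variable"/"value" are not matched to
  # names_to/values_to in current tidyr
  pivot_longer(c(num_severe_cases, num_ok), names_to = "variable", values_to = "value") %>%
  # ceiling() gives every non-zero count at least one dot, including small severe-case counts
  mutate(num_points = as.integer(ceiling(value / ppl_per_dot))) %>%
  rowwise() %>%
  do(data.frame(age = .$age, vax_status = .$vax_status, outcome = rep(gsub('num_', '', .$variable), times = .$num_points))) %>%
  ungroup()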
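
As a possible next step, here's a minimal sketch of turning a data frame shaped like `individual_rows` (columns `age`, `vax_status`, `outcome`) into a static icon array with ggplot2. The helper names `add_grid_positions()` and `icon_array_plot()` and the `dots_per_row` parameter are hypothetical, and the toy input is made up purely to keep the example self-contained; this is not how datamations itself lays out points.

```r
library(tidyverse)

# Hypothetical helper: give each row an (x, y) cell on a grid within its
# age / vaccination-status panel, filling left to right, top to bottom.
add_grid_positions <- function(rows, dots_per_row = 20) {
  rows %>%
    group_by(age, vax_status) %>%
    arrange(outcome, .by_group = TRUE) %>%  # keep same-outcome dots together
    mutate(
      idx = row_number() - 1,
      x = idx %% dots_per_row,
      y = idx %/% dots_per_row
    ) %>%
    ungroup()
}

# Hypothetical helper: one panel per age group and vaccination status,
# one point per row, colored by outcome.
icon_array_plot <- function(rows, dots_per_row = 20) {
  ggplot(add_grid_positions(rows, dots_per_row), aes(x = x, y = y, color = outcome)) +
    geom_point(size = 1) +
    facet_grid(age ~ vax_status) +
    scale_y_reverse() +
    coord_equal()
}

# Toy input with the same columns as individual_rows (made-up values, illustration only)
toy_rows <- tibble(
  age = "Below 50",
  vax_status = rep(c("Not vaccinated", "Fully vaccinated"), times = c(13, 36)),
  outcome = c(rep("ok", 12), "severe_cases", rep("ok", 35), "severe_cases")
)

grid <- add_grid_positions(toy_rows)
p <- icon_array_plot(toy_rows)
```

With the data frame built above, the call would be `icon_array_plot(individual_rows)`. Because every non-zero count is rounded up to at least one dot, a severe-case dot carries the same visual weight as a dot of 10,000 people, which is one way the low base rates make this hard to show honestly.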

jhofman commented 2 years ago

Also, two related articles that could be interesting to datamate.

First viz: https://messaging-custom-newsletters.nytimes.com/template/oakv2?campaign_id=9&emc=edit_nn_20211012&instance_id=42622&nl=the-morning&productCode%3DNN=&regi_id=151685017&segment_id=71397&te=1&uri=nyt%3A%2F%2Fnewsletter%2F3920aed4-2adc-5e7a-8901-0271ca47d214&user_id=3cfe556826e1420e5572984864db2469

Paragraphs starting with "Agency research has estimated": https://www.nytimes.com/2021/10/06/health/covid-vaccine-children-dose.html?campaign_id=9&emc=edit_nn_20211012&instance_id=42622&nl=the-morning&regi_id=151685017&segment_id=71397&te=1&user_id=3cfe556826e1420e5572984864db2469

(Let me know if you hit paywall problems.)

jhofman commented 2 years ago

Relevant icon array!

https://twitter.com/xruiztru/status/1452180847088517131


sharlagelfand commented 2 years ago

@jhofman can you link some of the other articles we were looking at in our meeting today? especially the ones that break things down by age. thank you!

jhofman commented 2 years ago

yes!

the first chart here could be interesting to visualize as an icon array / animate: https://messaging-custom-newsletters.nytimes.com/template/oakv2?campaign_id=9&emc=edit_nn_20211012&instance_id=42622&nl=the-morning&productCode%3DNN=&regi_id=151685017&segment_id=71397&te=1&uri=nyt%3A%2F%2Fnewsletter%2F3920aed4-2adc-5e7a-8901-0271ca47d214&user_id=3cfe556826e1420e5572984864db2469

here are the myocarditis numbers from the cdc's nov 2-3 meeting notes

see also slides 3 and 4 here: https://www.cdc.gov/vaccines/acip/meetings/downloads/slides-2021-11-2-3/04-COVID-Oster-508.pdf

jhofman commented 2 years ago

More interesting efficacy numbers here: https://www.cdc.gov/mmwr/volumes/70/wr/mm7032e3.htm?s_cid=mm7032e3_w