wpinvestigative / arcos

https://wpinvestigative.github.io/arcos/

Numbers from package don't match numbers in the article #8

Closed mkiang closed 4 years ago

mkiang commented 4 years ago

I just wanted to do a quick sanity check to make sure I understood what the data are. When I pull all of Kentucky for 2006 to 2012 and sum DOSAGE_UNIT, I get 1,900,570,057 but according to the associated WaPo article, I should be getting 1,901,662,933. It appears I'm about 1 million pills short. Other states also seem quite off (e.g., South Carolina).

Any insight into the discrepancy?

library(arcos)  
library(tidyverse)

ky_data <- summarized_county_annual(county = NA,
                                    state = "KY",
                                    key = "WaPo")
ky_data %>% 
    filter(year %in% 2006:2012) %>% 
    pull(DOSAGE_UNIT) %>% 
    sum()

#  1900570057

jeffcsauer commented 4 years ago

Hi @mkiang, welcome! 👋 Validated the issue in arcospy as well.

Perhaps @andrewbtran might be able to shed some light on the discrepancy?

andrewbtran commented 4 years ago

Thanks for checking in. This article was written before we had looked at the data more thoroughly and identified at least 333 buyers listed as pharmacies in ARCOS that were actually hospitals, clinics, or mislabeled distributors. You can see the list here. We did this by going through every pharmacy that ordered at least 1 million pills and verifying, as best we could, that it was actually a pharmacy. Since then, we have filtered those buyers' orders out of the aggregate tallies, which is why the package totals come out lower than the figures in the article.
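
For anyone who wants to see how this kind of filtering moves a total, here is a minimal sketch. The first part only restates the two Kentucky figures quoted earlier in this thread. The second part uses made-up toy data: the `orders` and `flagged` tables, the `buyer_id` column, and all of their values are hypothetical and are not the real ARCOS schema or the actual list of reclassified buyers.

```r
library(dplyr)

# Kentucky totals for 2006 to 2012, as quoted in this thread
article_total <- 1901662933  # figure from the WaPo article
package_total <- 1900570057  # sum of DOSAGE_UNIT from the package

# Size of the discrepancy in pills, and as a share of the article total
gap <- article_total - package_total
gap_share <- gap / article_total

# Hypothetical toy data: one row per buyer (NOT real ARCOS records)
orders <- tibble(
  buyer_id    = c("A1", "B2", "C3", "D4"),
  DOSAGE_UNIT = c(500000, 1200000, 300000, 1500000)
)

# Hypothetical list of buyers later found not to be pharmacies
flagged <- tibble(buyer_id = "D4")

# Total before filtering (analogous to the article-era tally)
unfiltered_total <- sum(orders$DOSAGE_UNIT)

# Total after dropping flagged buyers (analogous to the current tallies)
filtered_total <- orders %>%
    anti_join(flagged, by = "buyer_id") %>%
    pull(DOSAGE_UNIT) %>%
    sum()

unfiltered_total - filtered_total
```

Here `gap` works out to 1,092,876 pills, a small fraction of one percent of the article total. The toy example just shows the mechanism: removing even one large flagged buyer lowers the aggregate, so a filtered total will not reproduce a figure computed before the filter was applied.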