ramanshah opened 4 years ago
`<5` data into this exactly?

Reviewing what I've done since creating this issue: I have an informative Beta prior that keeps the results from being quite so ridiculous. I'd need to thread this back through Tableau Public and pull a fresh screenshot that includes all towns.
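For reference, a minimal sketch of the current Beta approach, assuming a conjugate Beta-Binomial update with the `<5` symbol imputed to 2.5; the prior parameters and town numbers here are hypothetical placeholders, not the repo's actual values:

```python
from scipy import stats

# Hypothetical informative Beta prior on the per-capita death rate
alpha0, beta0 = 2.0, 20000.0
deaths, population = 2.5, 1000  # "<5" imputed as 2.5, per the current approach

# Conjugate Beta-Binomial update
posterior = stats.beta(alpha0 + deaths, beta0 + population - deaths)
print(posterior.interval(0.9))  # 90% credible interval for the rate
```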
The idea of using a Poisson likelihood (either with a Gamma prior or some non-conjugate prior solved numerically) remains sensible, even after reading much of McElreath. It remains attractive to be able to use the `<5` symbol directly. A key test case is New Shoreham/Block Island, which is tiny and shows `<5`. I'd guess that in such a tiny population the number of deaths was probably 1. Imputing the symbol to 2.5, as we do now, overstates the overdose problem on Block Island.
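To make "use the `<5` symbol directly" concrete: under a Poisson likelihood, `<5` can be treated as a censored observation whose likelihood is P(X ≤ 4). A minimal sketch, with all numbers hypothetical:

```python
import numpy as np
from scipy import stats

population = 900                      # hypothetical Block Island-sized town
rate = np.linspace(1e-6, 0.02, 2000)  # grid of per-capita yearly death rates

# A "<5" observation contributes P(X <= 4) under Poisson(rate * population),
# with no imputation needed
lik_censored = stats.poisson.cdf(4, rate * population)

# For comparison, the likelihood of an exactly observed count, e.g. X = 1
lik_exact = stats.poisson.pmf(1, rate * population)
```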
The data is now much better: a tidy table including year-over-year history, versus scraping a single year from PDFs. Pushed to branch `use_history`.
Using this to the fullest would involve a Poisson model with an assumption of slow time variation in the event-rate parameter, possibly even a constant event-rate parameter, or some treatment of over-dispersion.
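In the simplest (constant-rate) case, Gamma-Poisson conjugacy pools all fully observed years into a single update; slow time variation would instead call for something like down-weighting older years or a random walk on the log rate. A sketch of the constant-rate case, with hypothetical numbers:

```python
from scipy import stats

# Hypothetical counts for one town across fully observed years
yearly_deaths = [3, 1, 0, 2]
a0, b0 = 1.0, 1.0  # hypothetical Gamma(shape, rate) prior on the yearly rate

# Constant-rate posterior: Gamma(a0 + total deaths, b0 + number of years)
post = stats.gamma(a=a0 + sum(yearly_deaths),
                   scale=1.0 / (b0 + len(yearly_deaths)))
print(post.interval(0.9))  # 90% credible interval for the yearly rate
```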
The posterior changes dramatically from `0` deaths to `<5` deaths, particularly for the tiny towns that are most likely to have those counts. When there are zero deaths, the posterior looks wildly different from any of the others. I cut off the smallest towns to make the dashboard for the README, but avoiding such ad-hocery is exactly why one goes Bayesian in the first place. The current dashboard snapshot is incomplete and unsatisfying.

Research the canonical way to judge the quality of these intervals (likely through cross-validation and coverage testing). Use that work to drive further model development for the interval construction. I may have to let go of the simplistic Beta paradigm to deal well with the data suppression. For example, a Poisson process, unlike the Beta, would give the likelihood of a `<5` directly.
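A coverage test could look something like the sketch below: simulate counts from a known rate, rebuild the interval each time, and check how often it contains the truth. (Bayesian intervals with a fixed prior aren't guaranteed frequentist coverage, so this is a diagnostic, not a proof.) Numbers are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a0, b0 = 1.0, 1.0  # hypothetical Gamma prior
true_rate = 1.5    # known truth for the simulation
n_sims, hits = 2000, 0

for _ in range(n_sims):
    y = rng.poisson(true_rate)                         # one simulated year
    post = stats.gamma(a=a0 + y, scale=1.0 / (b0 + 1.0))
    lo, hi = post.interval(0.9)
    hits += lo <= true_rate <= hi

print(hits / n_sims)  # calibrated 90% intervals should land near 0.9
```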
Consider borrowing strength among years (these are in the dashboards, but I'd have to change the scraping/ETL) or among cities (such as with an Empirical Bayes prior specification) to improve the intervals.
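One way an Empirical Bayes prior could work here (ignoring population offsets for brevity): fit shared Gamma hyperparameters by maximizing the marginal likelihood across towns, which for a Gamma-Poisson mixture is negative binomial. A sketch with hypothetical counts:

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical counts, one per town, for a single year
town_deaths = np.array([0, 1, 0, 3, 12, 2, 0, 5])

def neg_marginal_loglik(log_params):
    a, b = np.exp(log_params)  # optimize on the log scale to keep a, b > 0
    # Poisson counts with a Gamma(a, rate=b) prior marginalize to a
    # negative binomial with n = a and p = b / (b + 1)
    return -stats.nbinom.logpmf(town_deaths, a, b / (b + 1.0)).sum()

res = optimize.minimize(neg_marginal_loglik, x0=np.log([1.0, 1.0]))
a_hat, b_hat = np.exp(res.x)  # shared prior fitted from all towns
```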
The goal should be that going up from `0` to `<5` doesn't "violently" change the posterior interval.