naobservatory / p2ra

Fit hierarchical model to most viruses #158

Closed by dp-rice 1 year ago

dp-rice commented 1 year ago

This is a big PR that does a bunch of different things. It:

  1. Replaces the log–negative binomial model of viral read counts with a logit–binomial model. Resolves #62. (This also removes one of the two non-identifiable overdispersion parameters.)
  2. Fits rather than specifies the variance in the "true" value of the predictor for each sample.
  3. Adds a hierarchical model of P2RA coefficients, in which each fine_location has its own coefficient. Items (2) and (3) resolve #116.
  4. Fixes a bug in handling viruses with no observed reads.
  5. Fits all viruses (except for hepatitis B and C, which are missing some data). Resolves #115.
  6. Adds a generator for all the fits we want to do: every virus, every taxid, every predictor type. This is used in fitting and for tests.
  7. Adds a text file summarizing the fits, fit_summary.txt. (I plan to make this summary machine-readable in a future PR.)
  8. Converts the model's coefficients (which are on a logit scale) to the more interpretable "relative abundance at 1 in 1,000 {prevalence,incidence}".
  9. Converts annual incidence to weekly incidence so that it is on a scale more comparable to prevalence.
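
To illustrate the conversions in (8) and (9), here is a minimal sketch, not the code in this PR. It assumes a simplified model of the form logit(RA) = b + log(predictor), where the predictor is a prevalence or weekly incidence expressed as a fraction, and it assumes 52 weeks per year. The function names, the model form, and the divisor are all hypothetical and chosen only for illustration.

```python
import math

# Assumption: the PR does not state the exact divisor used for the conversion.
WEEKS_PER_YEAR = 52


def expit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def annual_to_weekly_incidence(annual: float) -> float:
    """Rescale an annual incidence to a weekly incidence (hypothetical helper)."""
    return annual / WEEKS_PER_YEAR


def relative_abundance_at(b: float, predictor: float = 1e-3) -> float:
    """Relative abundance implied by a logit-scale coefficient b.

    Assumed model (hypothetical): logit(RA) = b + log(predictor).
    The default predictor of 1e-3 corresponds to "1 in 1,000".
    """
    return expit(b + math.log(predictor))


if __name__ == "__main__":
    weekly = annual_to_weekly_incidence(0.52)
    print(f"weekly incidence: {weekly:.4f}")
    print(f"RA at 1 in 1,000 for b = 0: {relative_abundance_at(0.0):.3e}")
```

Reporting the relative abundance at a fixed reference value of the predictor keeps the summary on the same scale as the read data, which is easier to compare across viruses than a raw logit-scale coefficient.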