idem-lab / epiwave

lowest-level functional interface for GPreff
2 stars 5 forks source link

pmf #8

Closed smwindecker closed 3 months ago

smwindecker commented 9 months ago

implement pmf instead of cdfs for distributions

@goldingn 's notes on the topic:

# notes for implementation
#' ## only deal with ecdfs as our dists.
#' ## no output options
#' ## ecmf name for distribution data type
#' ## remove option for only one input dist.
#' ## delay pmf object 2 col df. delay (int) and prob mass (sums to 1)
#' ##

lognormal_as_delay_pmf <- function(meanlog = 1, sdlog = 2, max_days = NULL) {

  if (is.null(max_days)) {
    max_days <- ceiling(qlnorm(0.99, meanlog = meanlog, sdlog = sdlog))
  }

  delay_pmf <- data.frame(
    days = seq(0, max_days)
  ) %>%
    mutate(
      upper = plnorm(days + 1,
                     meanlog = meanlog,
                     sdlog = sdlog),
      lower = plnorm(days,
                     meanlog = meanlog,
                     sdlog = sdlog),
      mass = upper - lower,
      correction = plnorm(max_days + 1,
                          meanlog = meanlog,
                          sdlog = sdlog),
      mass = mass / correction
    ) %>%
    select(
      days, mass
    )

  class(delay_pmf) <- c(class(delay_pmf), "delay_pmf")
  delay_pmf
}

plot.delay_pmf <- function(x, y, ...) {
  barplot(x$mass, width = 1, names.arg = x$days,
          xlab = "delay (days)",
          ylab = "probability")
}

# delay ~ lognormal(meanlog = 1, sdlog = 0.5)
delay <- lognormal_as_delay_pmf(1, 0.5)
delay
plot.delay_pmf(delay)
smwindecker commented 9 months ago

potentially as pmf utility package check what packages already exist

smwindecker commented 9 months ago

remembering that model wants delay dist input as long format that can be fed to data_to_matrix

AugustHao commented 8 months ago

note we might want to do another pass at naming of these things, a seropositivity curve would be used in the exact same way but it is not a pmf, so maybe we should choose a more generic term for it

smwindecker commented 8 months ago

@njtierney notes it would perhaps be better to save our class first:

class(delay_pmf) <- c("delay_pmf", class(delay_pmf))

AugustHao commented 8 months ago

class attribute for the pmf should include a note on how the pmf is begotten (ie parameteric or empirical), and what the columns mean

SeniorKate commented 7 months ago

Need to run make_massfun on NSW linelist to get new delay for symptom-notification in correct format. Kate to do once code finalized.

SeniorKate commented 7 months ago

Note - current make_massfun doesn't work if you're converting an ecdf to a pmf if the ecdf has already had the tailed clipped or been normalised because it will potentially clip off a large amount of the weight.

If we want to this into a generic ecdf- pmf function then need to have warnings or options for this

SeniorKate commented 7 months ago

Notes for August - old incubation period function sets the range of days from 0:28 - the new code with quantiles cuts this same distribution off at 0:12 days which captures 98% of the mass. Is this ok and just simpler?

SeniorKate commented 7 months ago

Not always adding up to 1 when normalise is used. Seems to be when data is cut off - weight doesn't seem to be reassigned

SeniorKate commented 7 months ago

Not always adding up to 1 when normalise is used. Seems to be when data is cut off - weight doesn't seem to be reassigned

Fixed with the following code:

if (normalise) { delay_massfun <- delay_massfun %>% dplyr::mutate( correction = sum(mass), mass = mass / correction ) }