idem-lab / epiwave

lowest-level functional interface for GPreff

Min and max delay defaults for delay distributions #17

Closed SeniorKate closed 4 days ago

SeniorKate commented 4 months ago

The current implementation of our new method for making delay data uses quantile() to cut off the head and tail of the delay distribution at low probabilities (currently the 0 and 0.99 proportions), which sets the minimum and maximum of the delay range. However, it seems that quantile() automatically treats the lowest value as the 0% quantile and the highest value as the 100% quantile, such that it will cut off the last value if the upper proportion is set at anything below 100%, even if that value is not above the specified threshold. This is a problem if you have a short, discrete delay (which is possible, considering incubation periods etc.), because you could lose a reasonably likely delay length. Is there a better option for working with discrete data (e.g. for our delay-from-data function), or should we consider implementing a different default for delays that span only a small number of days?

Reprex:

# Note: the seed is unquoted, so R evaluates it as 2024 - 2 - 15 = 2007.
set.seed(2024-02-15)
cdf_fun <- ecdf(rpois(1e5, 1e-2))

quantile(cdf_fun, 0.00)
#> 0%
#>  0
quantile(cdf_fun, 0.99)
#> 99%
#>   0
quantile(cdf_fun, 1)
#> 100%
#>    2
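
To see which delay values the 0.99 cutoff removes without depending on a random draw, here is a small sketch on made-up counts. The `delays` vector is hypothetical data chosen for illustration, not output from the package. It lists the values above the cutoff and their combined empirical mass.

```r
# Hypothetical delay data: 10,000 observations on a short discrete support.
delays <- rep(c(0, 1, 2), times = c(9905, 90, 5))
cdf_fun <- ecdf(delays)

# Upper cutoff at the 0.99 quantile, matching the current default proportion.
upper <- unname(quantile(cdf_fun, 0.99))

# Observed delay values that fall above the cutoff, and their total mass.
dropped <- knots(cdf_fun)[knots(cdf_fun) > upper]
dropped_mass <- 1 - cdf_fun(upper)

print(upper)
print(dropped)
print(dropped_mass)
```

Here the cutoff lands on 0, so both 1 and 2 are dropped and the delay range collapses to a single day.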

Nick - I think it might be worth implementing some sort of heuristic for what the upper limit is, based on the maximum value in the data (the quantile at 1). If it's less than some reasonable amount (20?) then switch to the 99% quantile. That could be implemented in a little helper function, which could notify the user about the upper limit it chose (and that they can specify max_delay and min_delay if they don't like it).
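
A rough sketch of what such a helper might look like follows. Everything here is hypothetical: `choose_delay_range()`, its argument names and the threshold default are illustrative and not part of epiwave. The comment above can be read more than one way; this sketch assumes the reading that addresses the problem in this issue, namely keeping the observed maximum when it is below the threshold and trimming at the 99% quantile otherwise.

```r
# Hypothetical helper, not part of epiwave: pick delay bounds from an ecdf.
# Assumed rule: keep the observed maximum when it is below `max_threshold`,
# otherwise trim the upper tail at `upper_prob`.
choose_delay_range <- function(cdf_fun,
                               min_delay = NULL,
                               max_delay = NULL,
                               lower_prob = 0,
                               upper_prob = 0.99,
                               max_threshold = 20) {
  # Only compute a bound when the user has not supplied one.
  if (is.null(min_delay)) {
    min_delay <- unname(quantile(cdf_fun, lower_prob))
  }
  if (is.null(max_delay)) {
    observed_max <- unname(quantile(cdf_fun, 1))
    if (observed_max < max_threshold) {
      max_delay <- observed_max
      rule <- "observed maximum"
    } else {
      max_delay <- unname(quantile(cdf_fun, upper_prob))
      rule <- paste0(100 * upper_prob, "% quantile")
    }
    # Tell the user which upper limit was chosen and how to override it.
    message(
      "Using max_delay = ", max_delay, " (", rule, "); ",
      "specify min_delay and max_delay to override."
    )
  }
  c(min_delay = min_delay, max_delay = max_delay)
}

# Short discrete delay: the observed maximum (2) is kept.
short_range <- choose_delay_range(ecdf(rep(c(0, 1, 2), times = c(9905, 90, 5))))

# Long delay: the upper tail is trimmed at the 99% quantile instead.
long_range <- choose_delay_range(ecdf(0:100))
```

Using `message()` rather than `warning()` keeps the notice informational, and supplying `max_delay` explicitly skips both the heuristic and the notice.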