parksw3 / epidist-paper


Debugging fitting methods based on exponential simulation results #13

Closed parksw3 closed 1 year ago

parksw3 commented 1 year ago

figure

parksw3 commented 1 year ago

Found one problem. Previously, we had delay_daily = floor(delay), but this is different from what we actually observe, stime_daily - ptime_daily.

For example, if ptime = 1.9 and stime = 2, then the true delay is 0.1 and the daily delay (stime_daily - ptime_daily = 2 - 1) is 1, whereas floor(delay) = 0. Using the daily delay might allow some of the discrete methods to work better.
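The mismatch can be checked in a few lines. This is a minimal sketch in Python with the example values above; the variable names mirror the R code but the numbers are just illustrative:

```python
import math

# Hypothetical event times: a primary event at ptime = 1.9 and a
# secondary event at stime = 2.0.
ptime, stime = 1.9, 2.0

# Continuous (true) delay.
delay = stime - ptime  # ~0.1

# Daily-censored event times, as actually observed.
ptime_daily = math.floor(ptime)  # 1
stime_daily = math.floor(stime)  # 2

# What we observe is the difference of the censored times...
observed_delay = stime_daily - ptime_daily  # 1

# ...which is not the same as flooring the continuous delay.
floored_delay = math.floor(delay)  # 0

print(observed_delay, floored_delay)
```

The two quantities differ whenever the primary event falls late in its day and the secondary event falls early in its day, so the daily delay can exceed floor(delay) by one.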

parksw3 commented 1 year ago

stime_daily-ptime_daily is breaking something now... need to fix... figured it out...!

Previously, we had

truncated_linelist <- linelist |>
    data.table::copy() |>
    # Update observation time by when we are looking
    DT(, obs_at := obs_time) |>
    DT(, obs_time := obs_time - ptime) |>
    # I've assumed truncation in the middle of the censoring window.
    # For discussion.
    DT(, censored_obs_time := obs_time - (ptime_daily + 0.5)) |>
    DT(, censored := "interval") |>
    DT(stime <= obs_time)

But obs_time in the definition of censored_obs_time was picking up the already-updated obs_time (i.e. obs_time - ptime from the previous step) rather than the function argument. So I changed it to DT(, censored_obs_time := obs_at - (ptime_daily + 0.5)). Running again now.
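A minimal numeric sketch of the bug (in Python, with made-up values): within the chained pipeline, obs_time had already been overwritten before censored_obs_time was computed, so the later step silently used the updated value.

```python
obs_at = 10.0    # observation time passed in as the function argument (hypothetical)
ptime = 1.9      # primary event time (hypothetical)
ptime_daily = 1.0  # floor(ptime)

obs_time = obs_at            # step 1: obs_at := obs_time
obs_time = obs_time - ptime  # step 2: obs_time := obs_time - ptime, ~8.1

# Buggy: uses the already-updated obs_time (~8.1 - 1.5 = ~6.6).
buggy = obs_time - (ptime_daily + 0.5)

# Fixed: uses the preserved obs_at (10.0 - 1.5 = 8.5).
fixed = obs_at - (ptime_daily + 0.5)

print(buggy, fixed)
```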

parksw3 commented 1 year ago

Changing delay = stime_daily - ptime_daily and DT(, censored_obs_time := obs_at - (ptime_daily + 0.5)) fixed something, because we're now getting good estimates during the decay phase, when there should be barely any truncation. But it looks like the other methods aren't working (they're all giving the same estimates?). Two conclusions:

parksw3 commented 1 year ago

Fixed it. DT(stime <= obs_time) was problematic for the same reason as above. Changing to DT(stime <= obs_at) fixed it.
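The same shadowing issue applied to the truncation filter: comparing stime against the rescaled obs_time wrongly drops observed events. A small Python sketch with hypothetical values:

```python
obs_at = 10.0   # true observation cut-off (hypothetical values)
ptime = 3.0     # primary event time
stime = 8.0     # secondary event time, observed before the cut-off

obs_time = obs_at - ptime   # 7.0: obs_time relative to the primary event

buggy_kept = stime <= obs_time   # False: event wrongly truncated out
fixed_kept = stime <= obs_at     # True: event correctly retained

print(buggy_kept, fixed_kept)
```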

parksw3 commented 1 year ago

figure

parksw3 commented 1 year ago

figure

New results with new parameters.

parksw3 commented 1 year ago

figure

Ran a new simulation with a larger sample size. Also changed the labels on the x-axis and put the parameter names in the title.

seabbs commented 1 year ago

Nice work. This is all looking good.

Compare sdlog estimates for naive vs censoring for the stable scenario. Censoring helps get better sdlog estimates. This might explain why naive truncation is giving bad estimates for sdlog.

I think this is plausible though on the face of it I would have also expected a bias in the meanlog.

The latent model looks like it slightly underestimates the sdlog - plausibly the issue with a uniform prior showing up?

parksw3 commented 1 year ago

The latent model looks like it slightly underestimates the sdlog - plausibly the issue with a uniform prior showing up?

I was wondering about this as well. I think the problem is that we can't tell from this one particular simulation. Is this just a bad sample (95% intervals should contain the truth 95% of the time, so are we just in the 1-in-20 case)? Or is the method actually doing something wrong?
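The point about not being able to tell from one run can be illustrated with a standard-library Python sketch (a toy known-sd normal model, not the paper's fitting methods): even a perfectly calibrated 95% interval misses the truth in roughly 1 in 20 replications, so a single miss is uninformative.

```python
import random, statistics

random.seed(1)
truth, sd, n = 0.0, 1.0, 25
reps = 2000
misses = 0
for _ in range(reps):
    sample = [random.gauss(truth, sd) for _ in range(n)]
    mean = statistics.fmean(sample)
    half_width = 1.96 * sd / n ** 0.5   # exact 95% interval with known sd
    if not (mean - half_width <= truth <= mean + half_width):
        misses += 1

print(misses / reps)  # close to 0.05 despite perfect calibration
```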

Either

seabbs commented 1 year ago

I've run the full pipeline on more samples etc here: https://github.com/parksw3/dynamicaltruncation/pull/20

It has some issues with the filtering models, as they can end up with no samples and therefore error out.

seabbs commented 1 year ago

As I said elsewhere, we might want to do some more formal simulation-based calibration (SBC) to get a handle on this. But ideally, if we can use what we are doing anyway, that seems like a good idea, as SBC adds quite a lot of overhead.
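For reference, the core SBC idea can be sketched with a toy conjugate normal model in Python (illustrative only, not the paper's models): draw the truth from the prior, fit, and check that the rank of the truth among posterior draws is uniform across replications.

```python
import random, statistics

random.seed(2)
reps, draws_per_fit = 1000, 99
ranks = []
for _ in range(reps):
    theta = random.gauss(0, 1)    # draw the "true" parameter from the prior
    y = random.gauss(theta, 1)    # simulate one observation
    # Exact posterior for this toy model: N(y/2, sqrt(1/2)).
    post = [random.gauss(y / 2, 0.5 ** 0.5) for _ in range(draws_per_fit)]
    ranks.append(sum(d < theta for d in post))  # rank of truth in 0..99

# Under calibration, ranks ~ Uniform{0..99}, so the mean rank is ~49.5;
# systematic under- or over-estimation would skew the rank histogram.
print(statistics.fmean(ranks))
```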

parksw3 commented 1 year ago

Going to close this for now. Don't think we need to look at these again.