Case study data - GitHub issues

seabbs commented 2 years ago

Suggestion from @sbfnk to use this Ebola example: https://www.pnas.org/doi/full/10.1073/pnas.1518587113#supplementary-materials

seabbs commented 2 years ago

(Sorry for the slight radio silence; will catch back up elsewhere shortly.)

I have added this data and wrangled it into the right format for our pipeline (everything except model fitting). I've also flagged some points in time to evaluate as a test (see below). This outbreak is interesting because of its delayed take-off. I think potentially you were right and we do want to compare time-varying delay distributions. It may need a little thought on how we want to do this, if we want to.

We may also want to compare the untruncated distribution you would estimate once that cohort is fully observed vs. the truncated data we see at the time. I am not sure if we would want to do this for all models, as that leaves open what to do about censoring.

(Attached figures: cases, dist120, dist160)
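The comparison proposed above (the truncated data seen at an estimation time vs. the same cohort once fully observed) can be sketched in Python. This is a minimal illustration under assumptions not in the thread: simulated primary times uniform over 100 days and lognormal delays (meanlog 1.6, sdlog 0.5) stand in for the Ebola line list, and `simulate_cases` and `compare_at` are hypothetical helpers, not pipeline code.

```python
import random
import statistics


def simulate_cases(n, t_max, mean_log, sd_log, seed=1):
    """Hypothetical data: primary times uniform on [0, t_max),
    lognormal delays; returns (ptime, stime) pairs."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        ptime = rng.uniform(0, t_max)
        delay = rng.lognormvariate(mean_log, sd_log)
        cases.append((ptime, ptime + delay))
    return cases


def compare_at(cases, t_est):
    """Mean delay in the real-time (right-truncated) data vs. the
    retrospective cohort: same primary events, full follow-up."""
    cohort = [s - p for p, s in cases if p <= t_est]
    # Only delays whose secondary event has happened by t_est are visible.
    truncated = [s - p for p, s in cases if p <= t_est and s <= t_est]
    return statistics.mean(truncated), statistics.mean(cohort)


cases = simulate_cases(5000, t_max=100, mean_log=1.6, sd_log=0.5)
for t_est in (40, 60, 100):
    truncated_mean, cohort_mean = compare_at(cases, t_est)
    print(f"t={t_est}: truncated {truncated_mean:.2f} vs cohort {cohort_mean:.2f}")
```

Long delays from recent primary events are exactly the ones missing at the estimation time, so the truncated mean sits below the cohort mean; in a growing epidemic, where recent cases dominate, the gap would be larger.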

parksw3 commented 2 years ago

> (Sorry for the slight radio silence; will catch back up elsewhere shortly.)

No worries at all. I've also been getting a bit busy with other work.

> I think potentially you were right and we do want to compare time-varying delay distributions. It may need a little thought on how we want to do this, if we want to.

Something like mean ~ s(ptime) (though this is an extreme simplification). I think this is where using discrete-time variables can be useful too. Not sure how we would do time-varying delays plus censoring without making our heads explode (and is it worth the effort?).
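One crude discrete-time reading of mean ~ s(ptime), sketched in Python: bin primary event times into fixed-width windows and estimate the mean log delay in each, giving a step function rather than a smooth. Everything here is an assumption for illustration: the simulated delays (lognormal, with meanlog drifting linearly in ptime) and the helpers `simulate_time_varying` and `binned_log_mean` are hypothetical, and censoring and truncation are ignored entirely.

```python
import math
import random
import statistics
from collections import defaultdict


def simulate_time_varying(n, t_max, seed=2):
    """Hypothetical data: lognormal delays whose meanlog drifts
    linearly with primary event time (ptime)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        ptime = rng.uniform(0, t_max)
        mean_log = 1.0 + 0.01 * ptime  # assumed linear drift
        out.append((ptime, rng.lognormvariate(mean_log, 0.4)))
    return out


def binned_log_mean(data, width):
    """Step-function estimate of E[log delay] per ptime bin:
    a crude discrete-time stand-in for a smooth in ptime."""
    bins = defaultdict(list)
    for ptime, delay in data:
        bins[math.floor(ptime / width)].append(math.log(delay))
    return {b: statistics.mean(v) for b, v in sorted(bins.items())}


data = simulate_time_varying(20000, 100)
estimates = binned_log_mean(data, width=10)
```

A smooth (e.g. a spline in ptime) would instead share information across neighbouring windows rather than estimating each one independently.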

> We may also want to compare the untruncated distribution you would estimate once that cohort is fully observed vs. the truncated data we see at the time. I am not sure if we would want to do this for all models, as that leaves open what to do about censoring.

Good point. I've also done some comparisons of methods across untruncated distributions here https://github.com/parksw3/dynamicaltruncation/issues/27.

seabbs commented 2 years ago

We can do time-varying delays with censoring without any extra effort, can't we? Though we might have to assume that the delays themselves are discrete but the observations aren't. If we want to relax that, then yes, it becomes pretty insane and likely not worth doing.

I am just adding the retrospective counterfactual for each estimation time. I'd suggest we just compare to this (unlike the simulations, which use the ground truth as the baseline) and leave time-varying delays for another time?

parksw3 commented 2 years ago

> We can do time-varying delays with censoring without any extra effort, can't we?

I'm not sure. Ideally, we want to allow the mean to vary across ptime, which is a censored variable. If we assume that the mean delay doesn't change within a day, we can do it. Otherwise, using something smooth is probably a very good approximation, but technically not correct. It becomes more of a problem as the censoring window widens.
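A minimal Python sketch of why the within-day assumption helps, assuming daily censoring windows for both events, a primary event time uniform within its day, and lognormal delays whose parameters are looked up by primary day. `params_by_day`, `lognormal_cdf` and `censored_prob` are hypothetical names, not epidist code, and truncation is ignored. The probability of a secondary event on day s given a primary event on day p averages F(s + 1 - x) - F(s - x) over the primary time x in [p, p + 1); because the parameters are fixed for the whole day, they sit outside that integral.

```python
import math


def lognormal_cdf(x, mean_log, sd_log):
    """CDF of a lognormal delay; zero for non-positive x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mean_log) / (sd_log * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))


def censored_prob(p_day, s_day, params_by_day, n_steps=200):
    """P(secondary in day s_day | primary in day p_day), with the primary
    time uniform within its day and delay parameters constant within that
    day. Integrates over the primary time using the midpoint rule."""
    mean_log, sd_log = params_by_day[p_day]  # fixed for the whole day
    total = 0.0
    for i in range(n_steps):
        x = p_day + (i + 0.5) / n_steps  # primary time within [p, p + 1)
        total += (lognormal_cdf(s_day + 1 - x, mean_log, sd_log)
                  - lognormal_cdf(s_day - x, mean_log, sd_log))
    return total / n_steps


# Assumed (meanlog, sdlog) per primary day, purely illustrative.
params_by_day = {0: (1.0, 0.5), 5: (1.5, 0.5)}
pmf = [censored_prob(0, s, params_by_day) for s in range(30)]
```

If the parameters instead varied continuously with x, they would move inside the integral, which is where a smooth in ptime becomes an approximation; the wider the censoring window, the further x ranges and the more that matters.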

seabbs commented 2 years ago

Yes, agreed.

Here, though, shall we just stick with the simple fixed case and compare to the cohort distribution (i.e. the fully observed one)? It means we are leaving something on the table, but perhaps that should be the subject of more work (e.g. time-varying and strata-varying delays)?

seabbs commented 1 year ago

This has been added, so closing. Opening a separate issue to discuss time-varying delays.

parksw3 / epidist-paper

Case study data #9