parksw3 / epidist-paper

Correlation between estimated stime and ptime #17

Closed parksw3 closed 1 year ago

parksw3 commented 1 year ago

See scripts/test_inftime_exp.R. I started thinking about whether a uniform distribution is a reasonable prior.

[Image: test_inftime_exp]

Red points are true values. Black points are estimates. We don't have any other information to constrain ptime or stime, so it would make sense to (basically) get the prior back. But we're actually not getting the priors back: the estimated ptime is slightly biased towards the left and the estimated stime is slightly biased towards the right.

[Image: Rplot]

More importantly, we find a negative correlation between stime and ptime (why???). Note that the true ptime and stime should be uncorrelated:

[Image: Rplot02]

This also means that the estimated stime - ptime is typically > 0.

[Image: Rplot01]

~~I simulated these with r=0.2, meanlog = 1.8, sdlog = 0.5 and am getting meanlog = 1.85 (1.75--1.98) and sdlog = 0.45 (0.39--0.52). Slight bias in the estimate of meanlog caused by the correlation. I'm guessing the bias in sdlog is also caused by the same issue.~~ Actually, I have no idea what I simulated these with. I tried running this again and am now not getting bias in sdlog. Different seed maybe?? meanlog is still biased though.

More to learn...

parksw3 commented 1 year ago

Tried an exponential simulation without truncation. Getting very strong correlations:

[Image: Rplot04]

Do we need to think about reparameterizing? I don't know if it will improve estimates... I also tried reparameterizing in terms of ptime and delay, but that still gives strong negative correlations.

seabbs commented 1 year ago

Interesting. First plot is very Joy Division-esque. I think we might need to use some simulation-based calibration (i.e., simulating multiple samples from the prior and checking coverage) to get a more robust grip on this.
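The coverage check mentioned above can be sketched on a toy problem. This is only an illustration under stated assumptions: it uses a conjugate normal model (prior N(0, 1), unit observation variance) rather than the censored delay model in this repository, and `coverage_check` and its arguments are hypothetical names. The idea is to draw a true parameter from the prior, simulate data from it, compute the posterior, and record how often the 90% interval contains the truth.

```python
import random
import statistics


def coverage_check(n_reps=2000, n_obs=20, seed=1):
    """Fraction of 90% posterior intervals that contain the true mean."""
    rng = random.Random(seed)
    z90 = 1.6448536269514722  # 95th percentile of the standard normal
    hits = 0
    for _ in range(n_reps):
        # Draw the "truth" from the prior N(0, 1) (toy assumption).
        mu = rng.gauss(0.0, 1.0)
        # Simulate data given that truth, with unit observation variance.
        ys = [rng.gauss(mu, 1.0) for _ in range(n_obs)]
        # Conjugate posterior for mu: N(n * ybar / (n + 1), 1 / (n + 1)).
        post_mean = n_obs * statistics.fmean(ys) / (n_obs + 1)
        post_sd = (1.0 / (n_obs + 1)) ** 0.5
        if abs(mu - post_mean) <= z90 * post_sd:
            hits += 1
    return hits / n_reps


coverage = coverage_check()
print(f"90% interval coverage: {coverage:.3f}")
```

If the fitting procedure is well calibrated, the reported coverage should sit close to the nominal 0.9. For the model discussed here, the closed-form posterior would have to be replaced by an actual model fit per replicate, which is where the computational cost comes from.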

The correlation doesn't seem ideal, but I guess it makes sense given we are trying to fit a fixed distribution to a population in which each individual has latent parameters. If we keep that distribution fixed and change ptime, then the easiest change overall is to change stime?
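A toy forward simulation can be used to probe this intuition. It is a sketch under assumptions, not the model in scripts/test_inftime_exp.R: zero growth (ptime uniform within its day), a lognormal delay with illustrative parameters meanlog = 1.8 and sdlog = 0.5, and hypothetical helper names `simulate` and `mean_offsets`. It groups individuals by the observed delay in whole days and compares the mean within-day offsets of ptime and stime in each group.

```python
import math
import random
from collections import defaultdict


def simulate(n=200_000, meanlog=1.8, sdlog=0.5, seed=1):
    """Group within-day offsets of ptime and stime by observed delay in days."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for _ in range(n):
        ptime = rng.random()  # zero growth: uniform within day 0 (assumption)
        stime = ptime + rng.lognormvariate(meanlog, sdlog)
        sday = math.floor(stime)  # observed secondary day; primary day is 0
        groups[sday].append((ptime, stime - sday))
    return groups


def mean_offsets(pairs):
    """Mean within-day offset of ptime and of stime for one group."""
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(s for _, s in pairs) / n


groups = simulate()
# The lognormal mode is exp(meanlog - sdlog**2), roughly 4.7 days here.
short_p, short_s = mean_offsets(groups[2])   # observed delay below the mode
long_p, long_s = mean_offsets(groups[10])    # observed delay above the mode
print(f"delay  2 days: mean ptime offset {short_p:.3f}, stime offset {short_s:.3f}")
print(f"delay 10 days: mean ptime offset {long_p:.3f}, stime offset {long_s:.3f}")
```

In this toy setting, conditioning on a short observed delay pulls ptime towards the start of its day and stime towards the end of its day, and a long observed delay does the reverse. Across individuals that would look like a negative association between the two offsets. Whether this is what drives the plots in this issue is not established here.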

seabbs commented 1 year ago

I think reparameterising could make sense (though ideally we want to keep this method as it is widely used in the literature...). The reparameterisation you have tried seems like the obvious one, but I think it doesn't change the shape of the posterior enough to prevent the negative correlation from happening (well, obviously, given it didn't work).

A potential solution would be to sample from the uniform priors and then fit the model for each sample as a truncated but continuous model? That will be very computationally expensive and not ideal.

I think the better solution to suggest is to provide more information on when an event is likely to occur (i.e., by having a transmission process to inform the prior). I'm not sure we should attempt to solve that here vs just pointing it out.

seabbs commented 1 year ago

Have you explored what happens in a zero-growth setting where the uniform prior is correct?
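As a small sketch of what the growth rate does to the within-day prior for the primary event, assume incidence proportional to exp(r * t). Marginally (before conditioning on anything about the secondary event), the within-day offset u then has density r * exp(r * u) / (exp(r) - 1) on [0, 1), which reduces to the uniform distribution as r goes to 0. The function name `within_day_offsets` and the value r = 0.2 are illustrative assumptions.

```python
import math
import random


def within_day_offsets(r, n=100_000, seed=1):
    """Sample within-day event offsets when incidence grows as exp(r * t)."""
    rng = random.Random(seed)
    if r == 0:
        # Zero growth: the offset is exactly uniform on [0, 1).
        return [rng.random() for _ in range(n)]
    # Inverse-CDF draw from the density r * exp(r * u) / (exp(r) - 1).
    return [math.log1p(rng.random() * math.expm1(r)) / r for _ in range(n)]


flat = within_day_offsets(0.0)
growing = within_day_offsets(0.2)
mean_flat = sum(flat) / len(flat)
mean_growth = sum(growing) / len(growing)
print(f"mean offset, r = 0.0: {mean_flat:.4f}")
print(f"mean offset, r = 0.2: {mean_growth:.4f}")
```

For r = 0.2 the analytic mean offset is about 0.517, so the departure from uniform is small at daily resolution. This says nothing about the distribution of ptime once the secondary event day is conditioned on, which is a separate question.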

> But we're actually not getting the priors back. You might notice that the estimated ptime is slightly biased towards left and the estimated stime is slightly biased towards right.

For your first figure, how does this match up with your simulated data? Is the bias in the same direction as the bias in the data, or unrelated?

parksw3 commented 1 year ago

> I think we might need to use some simulation-based calibration (i.e., simulating multiple samples from the prior and checking coverage) to get a more robust grip on this.

Agreed.

> A potential solution would be to sample from the uniform priors and then fit the model for each sample as a truncated but continuous model? That will be very computationally expensive and not ideal.

Definitely not ideal and computationally expensive. Also, posterior samples for each sample would be associated with different posterior distribution probabilities, which we need to account for. And the current method is already doing that. So maybe this correlation is intrinsic to the problem.

> I think the better solution to suggest is to provide more information on when an event is likely to occur (i.e., by having a transmission process to inform the prior). I'm not sure we should attempt to solve that here vs just pointing it out.

This is possible but difficult. I don't think we should be too worried about not being able to estimate each event time accurately. As long as we're doing OK on average, we should be OK.

> For your first figure how does this match up with your simulated data?

Doesn't match up as far as I remember. But more simulations coming soon.

> Have you explored what happens in a zero-growth setting where the uniform prior is correct?

I think the uniform prior might not actually be correct in this setting. More coming soon.

seabbs commented 1 year ago

I just updated and reran this to get the following:

[Images: test_inftime_exp, plot, plot (1)]

So it seems like recent updates have reduced but not entirely mitigated this. Oddly, there now appears to be "banding" in the correlation plot.

seabbs commented 1 year ago

Where are we with this? I think this is at the "add as a discussion piece" stage?

parksw3 commented 1 year ago

This is partly covered in https://github.com/parksw3/dynamicaltruncation/issues/21 and https://github.com/parksw3/dynamicaltruncation/issues/27.

Otherwise, it doesn't seem like there's a way to get rid of this. Definitely should be discussed in the paper. But could also be included in the main multi-panel figure explaining issues with censoring.

seabbs commented 1 year ago

This has been included in the paper as a figure, so closing.