lilywang1988 / eSIR

Extended state-space SIR epidemiological models
Creative Commons Attribution 4.0 International
73 stars 36 forks source link

Facing issues while running prediction #17

Closed sridharan1302 closed 4 years ago

sridharan1302 commented 4 years ago

Hi Lilywang1988, Thankyou for your work.It is an excellent prediction framework for COVID.I am currently running predictions for various countries as part of a simulation .While running sometimes i am getting Error in if (sum(thetaI_mat[l, first_tp_vec[l]:T_fin] <= eps) > 0) { : missing value where TRUE/FALSE needed

I am running at M=5e3 and nburnin=2e2 and nadapt=10000. Sometimes i dont get this issue when i am using different TFin for prediction .Can you tell me why this happens.

Thankyou in Advance.

lilywang1988 commented 4 years ago

Hi Sidharan1302,

I changed the if() a little bit so that it can tolerate any missingness in the matrix. I am not sure whether that is the case. Please try again using the new version (0.3.2). If it still does not work, please send me the codes you used and the data (if possible).

Best, Lili

sridharan1302 commented 4 years ago

Hi Lilywang1988, Thank you for the prompt reply..I will try the same.I am using jhu data for making predictions for NY in this case. But it fails for older version giving the above error .I will install 0.3.2 and try it. Regards, Sridharan

sridharan1302 commented 4 years ago

Hi Lilywang1988, Thank you for the versioning. It worked without the mentioned error. However i have a couple of queries.All using JHU data

(Settings M-5e3,nburnin-2e3 and nadapt-1e4-Populations at original from wiki and data from JHU)

Please guide me in this regards. Thank you for the help.

With regards, Sridharan1302

lilywang1988 commented 4 years ago

Hi sridharan1302,

May I ask how you define the accuracy? Our next version, which I am testing now, is on the way. The newer version will be faster and output the Gelman-Rubin statistic to check the convergence. Therefore, you may be able to track the convergence of MCMC and possibly reduce the length of the chains. The newer version will be up before this Friday.

Best, Lili

sridharan1302 commented 4 years ago

Hi Lilywang,

               We define the accuracy by basically comparing MAPE from the original infections .We are converting the prevalence back into infection count and comparing them to the day to day counts (cumulative).

The growth observed on day to day basis is either very high or very low.If there are some ways to adjust the day to day growth into the equations somehow (atleast after it has come down to 5% or something would be great in predicting ) What happens is that for first few days (after last input date) its underpredicting and then it starts overpredicting after a particular number of days (Basically day on day growth observed by the model is more) I will wait for the Friday version. Thank you for your patience and guidance.

With regards, Sridharan V

lilywang1988 commented 4 years ago

Hi Sridharan,

I have updated the package to version 0.3.3. You can now check the convergence of the MCMC chains using the Gelman-Rubin statistic. This can help you check the quality of posterior draws. Moreover, I would like to remind you that eSIR is more like an intervention forecast tool. The real prediction can be affected by some other factors that are not estimated from the model. For example, the pi(t) function is not estimated from the data, which can affect your prediction a lot. You may try different pi(t) functions and find the optimal one. There are some other methods to estimate the time-dependent function. For example, this one: https://www.medrxiv.org/content/10.1101/2020.02.17.20024257v1.full.pdf. We are not sure which solution would be the best, but pi(t) for t before your last observation date would affect the estimation, and thus the prediction. Let me know if you have any other questions.

Best, Lili

sridharan1302 commented 4 years ago

Hi Lili,

      Great input.Thank you for the update in package.I will check the alternate paper suggested.The Gelman-Rubin statistic would also help a lot.Its true that determining the pi(t) is the hardest part.We are assuming it to be 0 to 1 based on the severity of the measure taken. Once again thanks for taking time to answer the questions   with patience.

With regards, Sridharan V

sridharan1302 commented 4 years ago

Hi Lili,

Please note that the Gelman Diag list is not getting created .Instead i am getting a notebook created with multiple error of the following kind: <simpleError in gelman.diag(as.mcmc.list(jags_sample[[name]])): could not find function "gelman.diag">

One more question i have is that even when i give DIC(Deviance Info Criterion) as TRUE its not giving that value in the output file(Stepsummary file).Is this normal or is there something that needs to be changed?

With regards, Sridharan V

lilywang1988 commented 4 years ago

Hi Sridharan,

I forgot to import the package coda. I apologize. Now it should work. Please let me know if it does not. Thank you for checking it for us!

Best, Lili

sridharan1302 commented 4 years ago

Hi Lili, Thanks for the update. I just ran the updated version.I am able to get the gelman diag list without any problem but even at very low setting(M=5e2,nburnin=2e2,nadapt=1e4) they have R value below 1.1 for all the variables.Is it then fine that i can use these settings itself?? Also i saw the vSIR paper with varying state.Is there a github repository for these authors?This seems like good approach to try.As you suggested the last pi value is having huge influence on the predictions.

With regards, Sridharan V

lilywang1988 commented 4 years ago

Hi Sridharan,

You are right. I got questions from the reviewer on this too. I am not very familiar with this statistic, so I looked up the papers and checked the R function gelman.diag() from coda. I don't think I found any obvious mistake in the function. But I notice that when those close to 1, e.g. 1.03, the function will trim it to 1 instead of putting the exact value. Still, 1.03 is a small value. I personally feel that longer chains gave me more stable/convergent results in my sensitivity analyses. So I don't have a concrete conclusion now. I will correct it if I see some mistakes in the function.

As for the vSIR, you can contact them for the codes. I don't know whether they have a github repository. If not, I have some codes from them. I don't know whether I can distribute their codes without their permission. But let me know if you fail to hear from them.

Best, Lili

lilywang1988 commented 4 years ago

Hi Sridharan,

I also notice that if the chains are too short, the G-R stat cannot be successfully computed (giving an error) especially for the prevalence parameters. R0 seems to be easy to converge, but the variance controllers are less convergent.

BTW, just to remind that the package also allows users to export MCMC chains so people can check convergence using their own ways directly.

Best, Lili

sridharan1302 commented 4 years ago

Hi Lili, Thank you for the detailed response.I will see how to work out bigger chains.I too prefer longer chains as thats how its recommended in many papers.(Only problem is the execution time).. I will also try to contact the authors for the repo access. As for prevalence param they also dont seem to vary much for the few countries i have run.Let me see for more countries.

With regards, Sridharan V

lilywang1988 commented 4 years ago

Hi Sridharan,

I talked with someone who is way more experienced in Bayes (than I) and learned that it was possible that the parameters were trapped at some local values. We have so many parameters in the model including all the daily prevalence values. The Gelman-Rubin statistic is doing its job but possibly cannot distinguish this situation from the true convergence. I think still people should run longer chains if they can. Sensitivity analyses might help a lot to check convergence too.

Best, Lili

sridharan1302 commented 4 years ago

Hi Lili,

Thank you for your Guidance Lili.I will take that into account and run the chains longer. I have a question which is out of scope and unrelated to this model. But is there a method to estimate second wave occurrence of COVID19 or number of waves that may occur in a particular country

With regards, Sridharan

lilywang1988 commented 4 years ago

Hi Sridharan,

I have done that by increasing pi(t) in the tvt.eSIR function after a peak has been observed.

Best, Lili

sridharan1302 commented 4 years ago

Hi Lili, Thank you for the help.I will test this for further waves.