Questions draft (I will edit and trim these questions soon)

Prior predictive distribution vs. prior distribution:

The distribution of y values vs. the distribution of the parameters, both before any data are given.
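
To make the distinction concrete for myself, here is a minimal sketch, not the model from the repo. It assumes a toy Beta-Binomial setup: a Beta(2, 2) prior on a success probability p and 20 Bernoulli trials per dataset. The names N_TRIALS, N_DRAWS and draw_binomial are made up for illustration. Draws of p form the prior; draws of y generated from those p values form the prior predictive distribution.

```python
import random

random.seed(0)

N_TRIALS = 20    # hypothetical number of Bernoulli trials per dataset
N_DRAWS = 5000   # number of Monte Carlo draws


def draw_binomial(n, p):
    """Draw one Binomial(n, p) count as a sum of n Bernoulli trials."""
    return sum(random.random() < p for _ in range(n))


# Prior distribution: draws of the parameter p, before any data are seen.
# Beta(2, 2) is an assumed prior, chosen only for illustration.
prior_p = [random.betavariate(2, 2) for _ in range(N_DRAWS)]

# Prior predictive distribution: draws of the outcome y,
# one simulated dataset per prior draw of p.
prior_pred_y = [draw_binomial(N_TRIALS, p) for p in prior_p]

print("first prior draws of p:", [round(p, 2) for p in prior_p[:5]])
print("first prior predictive draws of y:", prior_pred_y[:5])
```

The first list lives on the parameter scale (values between 0 and 1), the second on the data scale (counts between 0 and 20).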
Posterior predictive distribution (PPD) vs. posterior distribution:

The distribution of y values vs. the distribution of the parameters, both after the data are given. Does the PPD have a wider spread of samples because it accounts for uncertainty in the parameters?
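
A small sketch to check the "wider" intuition, again under assumed toy numbers rather than the repo's model: a Beta(2, 2) prior, 7 successes observed in 20 trials, and the conjugate Beta posterior. It compares the spread of posterior draws of p, of PPD draws (expressed as proportions y_new / 20), and of "plug-in" draws that fix p at its posterior mean and so ignore parameter uncertainty. All names here are hypothetical.

```python
import random
import statistics

random.seed(1)

N_TRIALS = 20            # hypothetical number of trials
N_DRAWS = 5000           # number of Monte Carlo draws
A_PRIOR, B_PRIOR = 2, 2  # assumed Beta prior
Y_OBS = 7                # made-up observed number of successes


def draw_binomial(n, p):
    """Draw one Binomial(n, p) count as a sum of n Bernoulli trials."""
    return sum(random.random() < p for _ in range(n))


# Conjugate update: Beta prior + binomial data gives a Beta posterior.
a_post = A_PRIOR + Y_OBS
b_post = B_PRIOR + (N_TRIALS - Y_OBS)

# Posterior distribution: draws of the parameter p given the data.
post_p = [random.betavariate(a_post, b_post) for _ in range(N_DRAWS)]

# Posterior predictive distribution: new data simulated from each posterior draw.
ppd_prop = [draw_binomial(N_TRIALS, p) / N_TRIALS for p in post_p]

# Plug-in predictive: new data simulated with p fixed at the posterior mean.
p_hat = a_post / (a_post + b_post)
plugin_prop = [draw_binomial(N_TRIALS, p_hat) / N_TRIALS for _ in range(N_DRAWS)]

var_post = statistics.pvariance(post_p)
var_ppd = statistics.pvariance(ppd_prop)
var_plugin = statistics.pvariance(plugin_prop)

print("variance of posterior p:        ", var_post)
print("variance of PPD proportion:     ", var_ppd)
print("variance of plug-in proportion: ", var_plugin)
```

In this toy setup the PPD is wider than the posterior because it adds sampling noise in y on top of the uncertainty in p, and wider than the plug-in predictive because it carries the uncertainty in p.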
A posterior predictive check (PPC) compares the given data with new samples from the PPD. Are we using the data twice: once to obtain the posterior distribution, and again to get the PPD?
Mike's variational Bayes (VB) run gives us a trace and a PPC, while MCMC gives me an error: 'Not enough samples to build a trace.' Is that an advantage of VB, i.e. being able to get samples in situations where MCMC cannot? What does 'Not enough samples to build a trace.' mean? Does 'samples' refer to the original data set? (I doubled the sample size but got the same error.)
Why is a trace called a "trace"? (trivial)
Sometimes the sampling process (variational Bayes) stalls partway through and hangs; other times it works. I am confused. (a) Is it a convergence issue, i.e. no convergence leading to an infinite loop? (b) Is there some other reason? (c) Is this a common thing to deal with in Bayesian analysis?
After adding time variation, the overdose probability p increases in Mike's plot, while in mine it does not. I am wondering whether this is just a coding issue on my side (maybe something I did wrong) or a difference that can arise between MCMC and VB.

(link for questions problem)

tom-hc-park / MSc-RA-Bayesian-evidence-synthesis