tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

How to build a Bayesian hierarchical/multilevel model with tf-keras? #1826

Closed by yqNLP 1 month ago

yqNLP commented 1 month ago

I'm new to tfp and recently read a lot about Bayesian hierarchical/multilevel modeling in the tfp tutorials. However, I don't know how to build a Bayesian hierarchical/multilevel model with tf-keras. Could anyone who's familiar with tfp & keras provide a code snippet example? Thanks a lot!

csuter commented 1 month ago

There are some examples on the main TFP site demonstrating hierarchical and linear mixed-effects models. These examples don't use keras, and I don't think it's essential to do so, but if you study and understand these examples and separately study the keras API, you may be able to combine them successfully (e.g., for some sort of non-linear modeling).

I'll close this issue; feel free to ping with questions or open another issue for further inquiry.

yqNLP commented 1 month ago

Well, the reason I'm trying to use keras is that I know how to predict with a built tf probability model, using

yhat = model(x_tst)

from https://www.tensorflow.org/probability/examples/Probabilistic_Layers_Regression#case_3_epistemic_uncertainty
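For reference, here's a minimal sketch of the kind of model that tutorial builds (I'm paraphrasing from memory, so treat the helper definitions as illustrative, and `x`, `y`, `x_tst` as placeholder data):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    # Trainable mean-field normal surrogate posterior over the layer's weights.
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(0.05 * t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

def prior_trainable(kernel_size, bias_size=0, dtype=None):
    # Trainable normal prior over the layer's weights.
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.), reinterpreted_batch_ndims=1)),
    ])

model = tf.keras.Sequential([
    tfp.layers.DenseVariational(1, posterior_mean_field, prior_trainable,
                                kl_weight=1 / x.shape[0]),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1.)),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=lambda y_true, rv_y: -rv_y.log_prob(y_true))
model.fit(x, y, epochs=1000, verbose=False)

yhat = model(x_tst)  # a tfd.Normal; new weights are sampled on every call
```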

However, in https://www.tensorflow.org/probability/examples/Linear_Mixed_Effects_Model_Variational_Inference, we build a tf probability model with a surrogate posterior and train it with VI. I'm confused about how to predict with such a model. Could you give me more suggestions on this stupid question? @csuter

csuter commented 1 month ago

Not a stupid question! These things are non-obvious.

When we call model(x) as in that example, I believe a sample is drawn from the surrogate posterior over model weights, and the predicted distribution is just the noise model on top of the linear (er, affine) map that the pointwise sample induces. This is kind of a hack, imo, for the purpose of making that demo succinct and flashy (I helped write it, so I can say this 😂). What we really want as Bayesians is to average over weights drawn from the posterior -- the posterior predictive distribution. If you just do what the VI demo does, i.e. sample weights from your trained surrogate posterior and push new data through the linear map, you end up in the same place. But in either the keras version or the tfp VI version, if you draw a bunch of posterior samples and average, then you're really doing the Bayesian thing. Hope this helps.
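Concretely, something like this (a rough sketch -- `surrogate_posterior`, `x_new`, `sigma`, and the weight names are placeholders for whatever your model actually has):

```python
import tensorflow as tf

num_samples = 200
# Draw weight samples from the trained surrogate posterior. Here I'm assuming
# samples come back as a dict with 'w' (slope, shape [S, D]) and 'b'
# (intercept, shape [S]) entries, standing in for your model's real weights.
weights = surrogate_posterior.sample(num_samples)

# Push the new data (shape [N, D]) through the affine map once per posterior
# sample, giving one predictive location per (sample, data point) pair...
locs = tf.einsum('sd,nd->sn', weights['w'], x_new) + weights['b'][:, tf.newaxis]

# ...and average over samples: a Monte Carlo estimate of the posterior
# predictive mean. (For a full predictive *distribution*, mix the per-sample
# noise models, e.g. with tfd.MixtureSameFamily, instead of averaging means.)
posterior_predictive_mean = tf.reduce_mean(locs, axis=0)
```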

yqNLP commented 1 month ago

I would really appreciate it if the tfp team could share some tutorials about prediction and serving of tfp models in future work.

csuter commented 1 month ago

Search for "posterior predictive" in the existing tutorials -- in a Bayesian context, this is what amounts to prediction. Serving models is very problem/context dependent, and outside the scope of tfp itself.
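For instance, with a keras-style probabilistic model like the one above, a crude Monte Carlo posterior predictive mean is just repeated stochastic forward passes (a sketch, assuming `model` and `x_tst` from the earlier snippet):

```python
import tensorflow as tf

# Each call to model(x_tst) resamples weights from the surrogate posterior,
# so averaging over many calls approximates the posterior predictive mean.
yhats = [model(x_tst) for _ in range(100)]
predictive_mean = tf.reduce_mean([y.mean() for y in yhats], axis=0)
```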