vitutorial / VITutorial

This repository stores slides for a tutorial on variational inference for NLP audiences.

M0 #22

Closed: philschulz closed this issue 6 years ago

philschulz commented 6 years ago

Thanks for doing this. It's looking really good! Easing people into the topic is definitely a good idea. Please create a PR next time. I will now refer to slide numbers. (There seems to be a problem with the slide numbers, btw: the total is always 1. A possible cause is sketched at the end of this comment.)

slide 3: I don't agree with your definition of unsupervised learning. Learning a distribution over data sounds exactly like supervised learning to me. Shouldn't it be "learn a joint distribution over observed and unobserved data"? Also, the examples are not entirely clear to me. You can learn sentences, images, etc. fully supervised, and conversely you can do unsupervised parsing.

slide 4: I would put another pause before the second itemize

slide 5: good one!

slide 9: too much content. Split this slide into two.

slide 10: I'm struggling with the last bullet point. In what respect is stochastic optimisation more general purpose than backprop?

slide 15: Really nice way of kicking off the tutorial!
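
On the slide-number total being stuck at 1: if the footline uses beamer's `\inserttotalframenumber`, that count is read back from the .aux file, so a fresh compile shows 1 until a second pass. A minimal sketch, assuming a standard beamer footline (not necessarily how these slides are actually set up):

```latex
% Minimal beamer deck with "frame X / total" in the footline (sketch only).
\documentclass{beamer}
\setbeamertemplate{footline}{%
  \hfill\insertframenumber\,/\,\inserttotalframenumber\hspace{3mm}\vskip3pt%
}
\begin{document}
\begin{frame}{One} First frame. \end{frame}
\begin{frame}{Two} Second frame. \end{frame}
\end{document}
```

`\inserttotalframenumber` is only written to the .aux at the end of a run, so the correct total appears from the second compilation onwards (or after deleting a stale .aux).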

wilkeraziz commented 6 years ago

Slide 3: my thinking was that "learn a joint distribution" induces a marginal over the observed data, but I see your point. I see what you mean about the examples too. Do you have any ideas on how I could clear this up (preferably in a single slide)?

Slide 4: gotcha

Slide 9: okay, I will address that

Slide 10: stochastic optimisation exists without backprop. Suggestions? Basically, I meant to convey that DL sits on stochastic optimisation powered by backprop, and that, other than the fact that the loss happens to be a likelihood, there's nothing too probabilistic about it.
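
To make the distinction concrete, this is roughly the picture I have in mind (a sketch, not proposed slide text): SGD only needs some unbiased estimate of the gradient, and backprop is just one way of computing it when the loss is differentiable end to end.

```latex
% SGD update with a generic stochastic gradient estimate \hat{g}_t:
\theta^{(t+1)} = \theta^{(t)} - \eta_t\, \hat{g}_t,
\qquad
\mathbb{E}\!\left[\hat{g}_t\right] = \nabla_\theta \mathcal{L}\bigl(\theta^{(t)}\bigr).
```

Backprop on a minibatch yields one such estimate; the stochastic optimisation machinery itself does not care where the estimate comes from (score-function estimators being another example).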

Sorry about the PR ;)

philschulz commented 6 years ago

I'm gonna hop onto the plane any minute so this will be my last reply until we meet.

Slide 3: just say joint distribution. The marginal is a byproduct that we don't need to mention explicitly, methinks. (Caveat: the marginal is of course what we later use to motivate VI. However, it's too early to talk about it at this point.) As for the examples, I would actually make density estimation for sentences, images, etc. supervised, and then have sentences + latent trees, images + latent labels, etc. on the unsupervised side.
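
In symbols, just to pin down what I mean by "byproduct" (a sketch, writing x for observed data and z for the latent structure, e.g. trees or labels):

```latex
% The joint over observed x and latent z induces the marginal over x.
p_\theta(x, z) = p_\theta(z)\, p_\theta(x \mid z),
\qquad
p_\theta(x) = \sum_{z} p_\theta(x, z).
```

The marginal on the right is exactly what motivates VI later on, but on slide 3 the joint is all we need to state.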

Slide 10: maybe say "stochastic opt + backprop enable modern deep learning". Basically any explanation that does not seem to claim greater generality will make me happy.