Closed philschulz closed 6 years ago
Slide 3: my thinking was that "learn a joint distribution" induces a marginal over the observed data, but I see your point. I see your points about the examples too. Do you have any ideas for how I could clear this up (preferably in a single slide)?
Slide 4: gotcha
Slide 9: okay, I will address that
Slide 10: stochastic optimisation exists without backprop. Suggestions? Basically, I meant to convey that DL sits on stochastic optimisation powered by backprop, and that, other than the loss happening to be a likelihood, there's nothing too probabilistic about it.
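To make that point concrete, here is an illustrative sketch (not from the slides; the model and data are made up): deep learning as minibatch stochastic optimisation where gradients come from backprop and the loss just happens to be a negative log-likelihood. For a one-layer logistic model the analytic gradient below is exactly what backprop would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary labels generated by a known linear rule (made up for the demo).
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)

def nll(w):
    """Negative log-likelihood of logistic regression -- the 'probabilistic' part."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

w = np.zeros(2)
loss_before = nll(w)

for step in range(500):
    i = rng.integers(0, len(X), size=16)         # stochastic: random minibatch
    p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
    grad = X[i].T @ (p - y[i]) / len(i)          # backprop gradient of the NLL for this 1-layer net
    w -= 0.5 * grad                              # plain SGD update

loss_after = nll(w)
```

Nothing in the optimisation loop knows it is minimising a likelihood; swap in any other differentiable loss and the machinery is unchanged, which is the sense in which DL is "stochastic optimisation powered by backprop".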
Sorry about the PR ;)
I'm gonna hop onto the plane any minute so this will be my last reply until we meet.
Slide 3: just say joint distribution. The marginal is a byproduct that we don't need to mention explicitly, methinks. (Caveat: the marginal is of course what we later use to motivate VI. However, it's too early to talk about it at this point.) As for examples, I would actually make density estimation for sentences, images etc. supervised and then have sentences+latent trees, images+latent labels etc. on the unsupervised side.
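For reference, the caveat above can be written out (this is standard material, not taken from the slides): the joint over observed x and latent z induces the marginal, and it is that typically intractable marginal that later motivates VI via the ELBO.

```latex
% Joint over observed x and latent z induces the marginal likelihood:
p(x) = \sum_z p(x, z) \quad\text{(or } \textstyle\int p(x, z)\,\mathrm{d}z \text{ for continuous } z\text{)}
% This marginal is typically intractable; VI lower-bounds its log with the ELBO:
\log p(x) \ge \mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]
```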
Slide 10: maybe say "stochastic opt + backprop enable modern deep learning". Basically any explanation that does not seem to claim greater generality will make me happy.
Thanks for doing this. It's looking really good! Easing people into the topic is definitely a good idea. Please create a PR next time. I will now refer to slide numbers. (There seems to be a problem with the slide numbers, btw: the total is always 1.)
slide 3: I don't agree with your definition of unsupervised learning. Learning a distribution over data sounds exactly like supervised learning to me. Shouldn't it be "learn a joint distribution over observed and unobserved data"? Also, the examples are not entirely clear to me: you can learn sentences, images, etc. fully supervised, and conversely you can do unsupervised parsing.
slide 4: I would put another pause before the second itemize
slide 5: good one!
slide 9: too much. Split this slide up into two.
slide 10: I'm struggling with the last bullet point. In what respect is stochastic optimisation more general-purpose than backprop?
slide 15: Really nice way of kicking off the tutorial!