taku-y closed this issue 6 years ago.
Can't wait to check it out!
Me neither :)
Hi Taku, I see nothing wrong with the design of the interface nor the tests.
I also see nothing wrong with the actual code etc - I think it's a good approach. I'll need to think a bit more about testing this though :)
@taku-y I still haven't totally grokked your NB, so excuse the naive question: which parts of this would be in pymc3, and which parts need to be written by the user?
Hi Peter, Thank you for checking.
I think the beta-binomial example is enough for testing NFs on global RVs, but I have no idea how to test NFs on local RVs (as used in autoencoding VB). I need to think more on that.
@twiecki Sorry for the insufficient description in the notebook. The user needs to write the code in "ADVI with IAF". I think it would be better to add modules implementing IAF and MADE to PyMC3.
How general are the choices made in "ADVI with IAF"? Could I reuse the first few lines for e.g. the stochastic volatility model (or a different one if that one is too hard)? I suppose another example model would be useful.
I understand it as very general as a method. This is simply a way to help ADVI get over some of the deficiencies it has - it's in the Google Deep Mind paper https://arxiv.org/pdf/1505.05770v6.pdf am I correct @taku-y?
@twiecki IAF is said to be arbitrarily flexible in the original paper. IAF is an affine transformation, but its coefficients can be arbitrary functions of the input RVs.
Specifically, if z is a 2-dimensional RV, IAF has the form

f1(z1) = (z1 + a1) / b1
f2(z2) = (z2 + a2(z1)) / b2(z1)

or

f1(z1) = (z1 + a1(z2)) / b1(z2)
f2(z2) = (z2 + a2) / b2,
Because of the one-way dependency between the two variables in this transformation, the Jacobian is lower or upper triangular, so its log determinant is just the sum of the logs of the scaling parameters: -log(b1) - log(b2). This property is useful for models with high-dimensional RVs.
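The triangular-Jacobian argument above can be sketched numerically. In this toy NumPy snippet the particular choices of a2(z1) and b2(z1) are arbitrary illustrations of mine (not from the paper or PyMC3); the point is only that the log determinant reduces to the sum of the log scale terms:

```python
import numpy as np

def iaf_2d(z, a, b_log):
    """Toy 2-D inverse autoregressive flow: f1 depends only on z1,
    f2 depends on (z1, z2), so the Jacobian is lower triangular."""
    z1, z2 = z
    # Shift/scale for z1 are constants; those for z2 are functions of z1.
    a2 = a[1] * np.tanh(z1)         # a2(z1): some arbitrary function of z1
    b2_log = b_log[1] + 0.5 * z1    # log b2(z1): another arbitrary function
    f1 = (z1 + a[0]) / np.exp(b_log[0])
    f2 = (z2 + a2) / np.exp(b2_log)
    # For a triangular Jacobian, log|det J| is just the sum of the
    # log diagonal entries, i.e. -(log b1 + log b2).
    log_det = -(b_log[0] + b2_log)
    return np.array([f1, f2]), log_det

f, log_det = iaf_2d(np.array([0.5, -1.0]),
                    a=np.array([0.1, 0.2]),
                    b_log=np.array([0.0, 0.3]))
```

Note that b2 depending on z1 only affects an off-diagonal Jacobian entry, which drops out of the determinant.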
Could you point me to the notebook of the stochastic volatility model? I will try to apply IAF to it, because I want to learn financial data analysis. I think it is straightforward.
@springcoil Exactly. That paper first proposed normalizing flows. Inverse autoregressive flow (IAF) is a type of NF whose Jacobian determinant is easily computed.
@twiecki Thanks!
@taku-y That makes sense. I must admit I never read the original paper. It makes sense mathematically that the IAF would work. Great work :)
I tried to run the stochastic volatility notebook, but during training with IAF the ELBO became Inf. One reason might be the complexity of the neural net (masked autoencoder, MADE) used in IAF; for n = 400 timepoints, MADE has (n^2 - n) / 2 parameters. To take the temporal structure of the data into account, we need a simpler transformation. This problem is more difficult than I thought.
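The (n^2 - n)/2 figure follows from the strictly lower-triangular connectivity an autoregressive transform allows; a quick check (illustrative only, not the actual MADE implementation):

```python
import numpy as np

n = 400  # number of timepoints in the stochastic volatility example
# An autoregressive linear map may only connect output i to inputs j < i,
# i.e. a strictly lower-triangular weight mask.
mask = np.tril(np.ones((n, n)), k=-1)
n_params = int(mask.sum())
assert n_params == (n**2 - n) // 2  # 79800 free weights for n = 400
```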
So the use-case is models with lots of data (can use mini-batches) and complex posteriors (can't do ADVI)?
I think so. In the current case, we might need to carefully construct the transform function.
A small learning rate (0.001) suppressed the NaNs and the NB finished. But I don't think IAF was effective in this case.
scaling=model.dict_to_array(sds) should be scaling=model.dict_to_array(sds)**2.
Shouldn't you compare the ADVI posterior to the NUTS posterior?
But yeah, this model might be a bit too involved as the posterior correlations here are insanely high.
Maybe the ANN example is a better use-case?
I agree, an ANN is a good demonstration. And I want to try a convolutional VAE.
Hi Taku, I haven't forgotten about this, I've got some notes written on paper that I'd like to write up, as part of the explanation of the normalising flows. I'll try to do it tomorrow... however I might have not much time.
https://gist.github.com/springcoil/ba9acd345153393af694326433f6c636 is an early version of the notes, this is a work in progress but gives some more insight into the normalising flows. Could you have a look @taku-y and @AustinRochford.
Hi Peadar, Thanks for your note! It would be nice to have an explanation of the method. I will look at this when I get off work.
@springcoil this is great. As someone who is not that familiar with masked autoencoders, I think that a short discussion with relevant links would be very helpful.
I'll add that in @AustinRochford I think it needs a bit more work but it's a start.
I added a description of AEVB. It's not directly relevant to normalizing flows, but I think it would be nice to clarify what PyMC3 computes. https://gist.github.com/taku-y/43550e688f4020ac7da15a181d262f2f
Cool. I will add that in
I wrote up another example - https://gist.github.com/springcoil/b4e3728406ff8ee5f9a0ccd21b97b2d9 but not much change. I'll need to think a bit more about this.
I was in with the Deep Mind guys (Danilo Rezende) and he asked this as well: Shouldn't you compare the ADVI posterior to the NUTS posterior? I think we should compare them; does anyone have any easy-to-access code for this?
I think the beta-bernoulli example shows the difference, and that a normalizing flow can compensate for it. But it would be nice to compare them on other probabilistic models.
@taku-y are you ready to submit a PR with the classes in it with your advi_nf. We can then add in something based on the notebooks to the docs. Or is this too premature.
I will send a PR, though there are some things still to do, e.g., writing tests and an explanation of the API. In particular, I'm wondering what kinds of normalizing flows should be added.
I've found this http://shakirm.com/papers/VITutorial.pdf which is a tutorial on Variational Inference. Some notes from that might be good for other parts of the docs - like the stuff on variational inference (ADVI) etc.
On normalizing flows @taku-y how about we add in an ANN based or an alternative to MADE for another normalizing flow?
Thanks for the information. This slide deck is helpful for writing the documentation. BTW, I uploaded the doc in the gist notebook. Please check it if you have time.
Normalizing flows require transformations whose Jacobian determinant is cheap to compute, and the class of such transformations is very limited. A general ANN doesn't meet this condition. I can't come up with anything other than MADE and the flows proposed in the original paper.
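One example of a flow with a cheap Jacobian determinant is the planar flow from the Rezende & Mohamed paper; a minimal NumPy sketch (function and variable names are mine, not a PyMC3 API):

```python
import numpy as np

def planar_flow(z, w, u, b):
    """Planar flow f(z) = z + u * tanh(w.z + b) (Rezende & Mohamed, 2015).
    By the matrix determinant lemma, det(I + u psi^T) = 1 + u.psi with
    psi = (1 - tanh^2(w.z + b)) * w, so the log-det costs O(d), not O(d^3)."""
    h = np.tanh(w @ z + b)
    f = z + u * h
    psi = (1.0 - h**2) * w
    log_det = np.log(np.abs(1.0 + u @ psi))
    return f, log_det

z = np.array([0.5, -0.3])
w = np.array([1.0, 0.5])
u = np.array([0.2, 0.1])
b = 0.0
f, log_det = planar_flow(z, w, u, b)
```

This illustrates the point above: the determinant is cheap only because the transformation has special structure (a rank-1 update of the identity), which is exactly why arbitrary ANNs don't qualify.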
I want to postpone implementing this feature until after #1287, because the implementation of KL-reweighting requires restructuring the code.
Sounds good.
Sounds good to me. However I'd recommend this goes to the 3.1 branch
I've been reading through this for my notes on IAF - https://arxiv.org/pdf/1606.04934v1.pdf -- have you considered RNNs or LSTMs @taku-y ?
Hi @taku-y I updated the notebook with a bit more about IAF. https://gist.github.com/springcoil/4fda94fcde0934b04fc34967e0c952de I hope this helps.
@springcoil That's a great description.
Thanks
Sorry for the late response. "for non-geniuses" is a good summary of NFs, and it would be nice to place it in an early part of the document. Thanks a lot.
Hi! Just wondering if there's any follow-up in adding IAF as one of the FlowFn in the current NFVI framework?
Hi, not planning yet
We should discuss IAF in a new thread. Closing this one.
I implemented normalizing flows in ADVI and tested it on a beta-binomial model (notebook). Though there are other attractive approaches to improving the approximation of ADVI, NFs were the most straightforward to implement.
I'm looking forward to any comments, especially on the design of the interface, how to write tests, and good examples to demonstrate the effectiveness of NFs.
To run the example notebook, you need to clone the advi_nf branch in my repo.