pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

ENH Normalizing flows #1438

Closed taku-y closed 6 years ago

taku-y commented 8 years ago

I implemented normalizing flows in ADVI and tested them on a beta-binomial model (notebook). Though there are other attractive approaches for improving the approximation of ADVI, NFs were the most straightforward to implement.

I'm looking forward to any comments, especially on the design of the interface, how to write tests, and good examples to demonstrate the effectiveness of NFs.

To run the example notebook, you need to clone branch advi_nf in my repo.

twiecki commented 8 years ago

Can't wait to check it out!

springcoil commented 8 years ago

Me neither :)


springcoil commented 8 years ago

Hi Taku, I see nothing wrong with the design of the interface or the tests.

I also see nothing wrong with the actual code etc - I think it's a good approach. I'll need to think a bit more about testing this though :)

twiecki commented 8 years ago

@taku-y I still haven't totally grokked your NB so excuse the naive question: which parts of this would be in pymc3 and which parts need to be written by the user?

taku-y commented 8 years ago

Hi Peter, Thank you for checking.

I think the beta-binomial example is enough for testing NFs on global RVs, but I have no idea for testing NFs on local RVs (used in autoencoding VB). I need to think more on that.

taku-y commented 8 years ago

@twiecki Sorry for the insufficient description in the notebook. The user needs to write the code in "ADVI with IAF". I think it would be better to add modules implementing IAF and MADE to PyMC3.

twiecki commented 8 years ago

How general are the choices made in "ADVI with IAF"? Could I reuse the first few lines for e.g. the stochastic volatility model (or a different one if that one is too hard)? I suppose another example model would be useful.

springcoil commented 8 years ago

I understand it to be very general as a method. This is simply a way to help ADVI overcome some of its deficiencies - it's in the Google DeepMind paper https://arxiv.org/pdf/1505.05770v6.pdf am I correct @taku-y?

taku-y commented 8 years ago

@twiecki IAF is said to be arbitrarily flexible in the original paper. IAF is an affine transformation, but its coefficients can be arbitrary functions of the input RVs.

Specifically, if z is a 2-dimensional RV, IAF has the form

f1(z1) = (z1 + a1) / b1
f2(z2) = (z2 + a2(z1)) / b2(z1)

or

f1(z1) = (z1 + a1(z2)) / b1(z2)
f2(z2) = (z2 + a2) / b2.

Because of this autoregressive dependency structure, the Jacobian is lower or upper triangular, so the log determinant of the Jacobian is just the sum of the logs of the scaling parameters: - log(b1) - log(b2). This property is useful for models with high-dimensional RVs.
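
For concreteness, the lower-triangular case can be sketched in plain numpy (`iaf_2d`, `a2_fn`, and `b2_fn` are hypothetical names for illustration, not PyMC3 API):

```python
import numpy as np

def iaf_2d(z, a1, b1, a2_fn, b2_fn):
    """Sketch of a 2-D inverse autoregressive flow.

    f1 depends only on z1; f2 may depend on z1 through arbitrary
    functions a2_fn and b2_fn, so the Jacobian is lower triangular
    and its log determinant is just -log(b1) - log(b2(z1)).
    """
    z1, z2 = z
    f1 = (z1 + a1) / b1
    f2 = (z2 + a2_fn(z1)) / b2_fn(z1)
    log_det = -np.log(b1) - np.log(b2_fn(z1))
    return np.array([f1, f2]), log_det
```

The point is that `log_det` costs O(d) to evaluate even though `a2_fn`/`b2_fn` can be arbitrary (e.g. neural-net) functions.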

Could you point me to the notebook of the stochastic volatility model? I will try to apply IAF to it, because I want to learn financial data analysis. I think it is straightforward.

twiecki commented 8 years ago

https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/stochastic_volatility.ipynb

taku-y commented 8 years ago

@springcoil Exactly. That paper was the first to propose normalizing flows. Inverse autoregressive flow (IAF) is a type of NF whose Jacobian determinant is easily computed.

taku-y commented 8 years ago

@twiecki Thanks!

springcoil commented 8 years ago

@taku-y That makes sense. I must admit I never read the original paper. It makes sense mathematically that the IAF would work. Great work :)

taku-y commented 8 years ago

I tried to run the stochastic volatility notebook, but during training with IAF the ELBO became Inf. One of the reasons might be the complexity of the neural net (masked autoencoder, MADE) used in IAF; for n=400 timepoints, MADE has (n^2 - n) / 2 parameters. To take the temporal structure of the data into account, we need a simpler transformation. This problem is more difficult than I thought.
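
To illustrate that parameter count: a single masked autoregressive layer over n inputs has a strictly lower-triangular weight mask, giving (n^2 - n) / 2 free weights (a numpy sketch; `autoregressive_mask` is a hypothetical helper, not the MADE code in the branch):

```python
import numpy as np

def autoregressive_mask(n):
    """Strictly lower-triangular mask for a single-layer
    autoregressive (MADE-style) linear map: output i may
    depend only on inputs with index < i."""
    return np.tril(np.ones((n, n)), k=-1)

n = 400
mask = autoregressive_mask(n)
n_params = int(mask.sum())  # (n**2 - n) / 2 = 79800 free weights
```

With n=400 that is already ~80k parameters for one layer, which makes the optimization fragile for a model of this size.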

twiecki commented 8 years ago

So the use-case is models with lots of data (can use mini-batches) and complex posteriors (can't do ADVI)?

taku-y commented 8 years ago

I think so. In the current case, we might need to carefully construct the transform function.

taku-y commented 8 years ago

A small learning rate (0.001) suppressed the NaNs and the NB finished. But I don't think IAF was effective in this case.

twiecki commented 8 years ago

`scaling=model.dict_to_array(sds)` should be `scaling=model.dict_to_array(sds)**2`.

Shouldn't you compare the ADVI posterior to the NUTS posterior?

twiecki commented 8 years ago

But yeah, this model might be a bit too involved as the posterior correlations here are insanely high.

twiecki commented 8 years ago

Maybe the ANN example is a better use-case?

taku-y commented 8 years ago

I agree, ANN is good for a demonstration. And I want to try a convolutional VAE.

springcoil commented 8 years ago

Hi Taku, I haven't forgotten about this, I've got some notes written on paper that I'd like to write up as part of the explanation of the normalising flows. I'll try to do it tomorrow... however I might not have much time.

springcoil commented 8 years ago

https://gist.github.com/springcoil/ba9acd345153393af694326433f6c636 is an early version of the notes; it's a work in progress but gives some more insight into the normalising flows. Could you have a look, @taku-y and @AustinRochford?

taku-y commented 8 years ago

Hi Peadar, Thanks for your notes! It would be a nice explanation of the method. I will look at this when I get off work.

AustinRochford commented 8 years ago

@springcoil this is great. As someone who is not that familiar with masked autoencoders, I think that a short discussion with relevant links would be very helpful.

springcoil commented 8 years ago

I'll add that in @AustinRochford I think it needs a bit more work but it's a start.

taku-y commented 8 years ago

I added a description of AEVB. It's not directly relevant to normalizing flows, but I think it would be nice to clarify what PyMC3 computes. https://gist.github.com/taku-y/43550e688f4020ac7da15a181d262f2f

springcoil commented 8 years ago

Cool. I will add that in


springcoil commented 8 years ago

I wrote up another example - https://gist.github.com/springcoil/b4e3728406ff8ee5f9a0ccd21b97b2d9 - but without much change. I'll need to think a bit more about this.

springcoil commented 8 years ago

I was in with the DeepMind guys (Danilo Rezende) and he asked this as well: shouldn't you compare the ADVI posterior to the NUTS posterior? I think we should compare them; does anyone have easily accessible code for this?

taku-y commented 8 years ago

I think the beta-bernoulli example shows the difference and that a normalizing flow can compensate for it. But it would be nice to compare them in other probabilistic models.
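
One cheap way to quantify that difference without running any sampler: the beta-binomial posterior is conjugate, so its exact moments can be compared to the symmetric Gaussian that a plain mean-field fit would report (a numpy sketch; `beta_binomial_posterior` is a hypothetical helper, not the notebook's code):

```python
import numpy as np

def beta_binomial_posterior(a0, b0, n, k):
    """Exact conjugate posterior Beta(a0 + k, b0 + n - k) of a
    beta-binomial model; returns mean, sd and skewness in closed form."""
    a, b = a0 + k, b0 + n - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    skew = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
    return mean, np.sqrt(var), skew

# n=10 trials, k=3 successes, flat Beta(1, 1) prior -> posterior Beta(4, 8),
# which is visibly skewed; a Gaussian approximation has skewness 0, so the
# nonzero skewness is exactly the gap a normalizing flow should close.
mean, sd, skew = beta_binomial_posterior(1.0, 1.0, 10, 3)
```

The same idea extends to non-conjugate models by treating long NUTS runs as the reference instead of the closed-form posterior.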

springcoil commented 8 years ago

@taku-y are you ready to submit a PR with the classes from your advi_nf branch? We can then add something based on the notebooks to the docs. Or is this too premature?

taku-y commented 8 years ago

I will send a PR, though there are some things to do, e.g., writing tests and an explanation of the API. In particular, I'm wondering what kinds of normalizing flows should be added.

springcoil commented 8 years ago

I've found this http://shakirm.com/papers/VITutorial.pdf which is a tutorial on Variational Inference. Some notes from that might be good for other parts of the docs - like the stuff on variational inference (ADVI) etc.

On normalizing flows, @taku-y, how about we add an ANN-based alternative to MADE as another normalizing flow?

taku-y commented 8 years ago

Thanks for the information. This slide deck will help me write the documentation. BTW, I uploaded the doc to the gist notebook. Please check it if you have time.

Normalizing flows require transformations whose Jacobian determinant is cheap to compute, and the class of such transformations is very limited. A general ANN doesn't meet this condition. I can't come up with anything other than MADE and the flows proposed in the original paper.
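
For reference, the simplest flow from the original paper, the planar flow, shows what that condition looks like: the matrix determinant lemma makes its log-det-Jacobian O(d) instead of the O(d^3) a general transformation would cost (a numpy sketch; `planar_flow` is a hypothetical name, not PyMC3 API):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """Planar flow f(z) = z + u * tanh(w.z + b) from the
    Rezende & Mohamed (2015) paper. Its Jacobian is the rank-1
    update I + u psi(z)^T, so by the matrix determinant lemma
    log|det J| = log|1 + u.psi(z)| with psi(z) = tanh'(w.z + b) * w."""
    lin = np.dot(w, z) + b
    f = z + u * np.tanh(lin)
    psi = (1.0 - np.tanh(lin) ** 2) * w
    log_det = np.log(np.abs(1.0 + np.dot(u, psi)))
    return f, log_det
```

A general ANN layer has no comparable low-rank or triangular structure, which is why it fails the cheap-determinant requirement.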

taku-y commented 8 years ago

I want to postpone implementing this feature until after #1287, because the implementation of KL-reweighting requires restructuring the code.

twiecki commented 8 years ago

Sounds good.

springcoil commented 8 years ago

Sounds good to me. However I'd recommend this goes to the 3.1 branch

springcoil commented 7 years ago

I've been reading through this for my notes on IAF - https://arxiv.org/pdf/1606.04934v1.pdf -- have you considered RNNs or LSTMs @taku-y ?

springcoil commented 7 years ago

Hi @taku-y I updated the notebook with a bit more about IAF. https://gist.github.com/springcoil/4fda94fcde0934b04fc34967e0c952de I hope this helps.

twiecki commented 7 years ago

@springcoil That's a great description.

springcoil commented 7 years ago

Thanks

taku-y commented 7 years ago

Sorry for the late response. The "for non-geniuses" section is a good summary of NF and it would be nice to place it in an early part of the document. Thanks a lot.

jereliu commented 6 years ago

Hi! Just wondering if there's any follow-up in adding IAF as one of the FlowFn in the current NFVI framework?

ferrine commented 6 years ago

Hi, not planning yet


junpenglao commented 6 years ago

We should discuss IAF in a new thread. Closing this one.