nkolot / ProHMR

Repository for "Probabilistic Modeling for Human Mesh Recovery"

Normalizing Flows Module - Order of Operations #10

Closed · areiner222 closed this 2 years ago

areiner222 commented 2 years ago

Hi @nkolot ,

I really enjoyed this project - a unique spin on probabilistic modeling for human 3D pose reconstruction, and the resulting strong, context-conditioned prior seems to work great with your new version of SMPLify!

I'm also a big fan of your conditional normalizing flows approach, and I am trying to clearly understand the order of operations going from noise to sample.

In the supp material, the forward-mode normalizing flow bijector is depicted acting (for one block) in the order z -> [ act norm -> linear layer -> conditional shift coupling ] -> pose theta.

[Screenshot: flow block diagram from the supplementary material]

However, when looking through your nflows fork, I am coming out somewhere different. Allow me to walk through what I'm seeing:

  1. Arrange each glow block as [norm, linear, coupling]
  2. When sampling / computing log probs, you call sample_and_log_prob, which calls the inverse of the glow bijector on noise generated from a standard normal
  3. The inverse mode of the CompositeTransform inverts the component bijectors and reverses their order
  4. When running forward, the SMPL flow uses the sample_and_log_prob method on noise

My understanding from the code is that, going from "noise" to "theta sample", the transform is applied in the reverse order from what is depicted in the supp material (i.e., the order of operations for the "sampling phase" vs. the "evaluation phase" is flipped).
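
To make the question concrete, here is a minimal sketch using plain upstream nflows components (not your fork, and without conditioning); the layer choices and feature dimension are just placeholders. It shows that log_prob runs the composite transform forward in the listed order, while sample runs the inverse in reversed order:

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.normalization import ActNorm
from nflows.transforms.lu import LULinear
from nflows.transforms.coupling import AdditiveCouplingTransform
from nflows.nn.nets import ResidualNet

features = 6  # placeholder dimensionality, standing in for the pose representation
mask = torch.arange(features) % 2  # alternate identity / transform features

# One "glow block" in the listed order [actnorm, linear, coupling]
block = CompositeTransform([
    ActNorm(features),
    LULinear(features),
    AdditiveCouplingTransform(
        mask,
        lambda in_f, out_f: ResidualNet(in_f, out_f, hidden_features=32),
    ),
])

flow = Flow(block, StandardNormal([features]))

x = torch.randn(8, features)
log_prob = flow.log_prob(x)   # forward pass: actnorm -> linear -> coupling (theta -> z)
samples = flow.sample(8)      # inverse pass: coupling -> linear -> actnorm (z -> theta)
```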

Do you know if I am missing something? I'd really appreciate your help!

Alex

nkolot commented 2 years ago

Yes, you are correct. nflows actually uses the inverse of the function for sampling. This, however, should not really change the results. I am not sure why they went with this design choice. Maybe for a particular class of transformations computing the inverse could be slower, so they wanted a fast way of going from the output to the latent in order to maximize the log-probability during training.
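
For what it's worth, the convention is just a choice. A hedged sketch with upstream nflows (not necessarily how the fork is wired): wrapping the composite transform in InverseTransform swaps its forward and inverse, so sampling would then run the listed order. The flow is mathematically the same either way, as long as log_prob and sample stay consistent.

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform, InverseTransform
from nflows.transforms.lu import LULinear

features = 6  # placeholder dimensionality
block = CompositeTransform([LULinear(features), LULinear(features)])

# InverseTransform swaps forward and inverse of the wrapped transform.
flow = Flow(InverseTransform(block), StandardNormal([features]))

x = torch.randn(4, features)
flow.log_prob(x)   # now runs block.inverse: x -> z
flow.sample(4)     # now runs block.forward: z -> x
```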

areiner222 commented 2 years ago

Understood. Thanks for your help!

I've been working with a comparable TensorFlow implementation of the conditional Glow normalizing flow and have had some trouble with NaN losses (I haven't done a deep dive yet to identify the issue) when I do not use the order of operations that nflows uses. Curious whether you have tried inverting the order and training with it?

nkolot commented 2 years ago

The key thing to make training stable is to run a dummy forward pass so that the ActNorm layers are initialized properly, as I do here. In nflows, the first time you go through them they are initialized based on the activations of the first batch. So you might want to do something similar in your implementation.
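
Roughly, the warm-up looks something like the sketch below (assuming an nflows-style `flow` object and a hypothetical `train_loader`; this is not the exact ProHMR code):

```python
import torch

flow.train()                            # ActNorm only initializes in training mode
init_batch = next(iter(train_loader))
with torch.no_grad():
    # Dummy forward pass triggers the data-dependent ActNorm initialization.
    # (For a conditional flow, also pass the context / conditioning features.)
    flow.log_prob(init_batch)

# ...then run the normal training loop; the ActNorm parameters remain trainable.
```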

areiner222 commented 2 years ago

Ah, that's really helpful.

So that I'm understanding: at init you compute the log_prob of a ground-truth batch, which pushes the normalizing flow bijector from theta to z. Because of the direction convention discussed above (log_prob runs the composite transform in its listed forward order), the first operation is the forward mode of ActNorm. The input to ActNorm's forward mode is therefore the rot6d representation of the SMPL pose parameters, and on the first run it updates its log_scale/shift parameters so that the post-ActNorm activations have zero mean and unit variance (as your code comment says).

Am I getting that right? And the scale/shift parameters are still trainable after init, correct?

nkolot commented 2 years ago

Yes, what you said is correct. The parameters are trainable after the initialization. The initialization trick that ActNorm uses is probably needed to improve the convergence properties by “whitening” the activations in the intermediate layers.
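
For reference, a simplified, hypothetical sketch of the idea (not the actual nflows/ProHMR ActNorm code): on the first training batch the layer sets its shift/log-scale from the batch statistics so the output is zero-mean / unit-variance, and both remain ordinary trainable parameters afterwards.

```python
import torch
import torch.nn as nn

class SimpleActNorm(nn.Module):
    """Toy ActNorm: data-dependent init on the first training batch,
    then ordinary trainable shift/log_scale parameters."""

    def __init__(self, features):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(features))
        self.log_scale = nn.Parameter(torch.zeros(features))
        self.register_buffer("initialized", torch.tensor(False))

    def forward(self, x):
        # x: (batch, features), e.g. the rot6d pose representation
        if self.training and not self.initialized:
            with torch.no_grad():
                std = x.std(dim=0) + 1e-6
                self.log_scale.data = -torch.log(std)       # so exp(log_scale) = 1/std
                self.shift.data = -x.mean(dim=0) / std      # so output is zero-mean
            self.initialized.fill_(True)
        z = x * torch.exp(self.log_scale) + self.shift
        logabsdet = self.log_scale.sum().expand(x.shape[0])  # per-sample log|det J|
        return z, logabsdet
```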