Closed areiner222 closed 2 years ago
Yes, you are correct. nflows actually uses the inverse of the transform for sampling. This, however, should not really change the results. I'm not sure why they went with this design choice. Perhaps for a particular class of transformations computing the inverse is slower, so they wanted a fast way of going from the output to the latent space, in order to maximize the log-probability during training.
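To make the convention concrete, here is a minimal sketch (plain NumPy, not the actual nflows API) of why the "forward" direction of a bijector can be the data-to-latent map: training evaluates log-probabilities through `forward()` on every step, so that is the direction you want to be fast, while `inverse()` is only needed at sampling time.

```python
import numpy as np

# Hypothetical single affine bijector, illustrating the convention
# discussed above. Names (Affine, forward, inverse) are illustrative,
# not taken from nflows.
class Affine:
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift

    def forward(self, x):
        # data -> latent, with log|det J| for the change of variables;
        # called every training step inside log_prob
        z = (x - self.shift) / self.scale
        logabsdet = -np.log(np.abs(self.scale)).sum()
        return z, logabsdet

    def inverse(self, z):
        # latent -> data, only needed when sampling
        return z * self.scale + self.shift

bij = Affine(scale=np.array([2.0]), shift=np.array([1.0]))
z, _ = bij.forward(np.array([3.0]))
x = bij.inverse(z)
# x recovers the original input: [3.0]
```

Under this convention, composing the inverses in reverse order gives the noise-to-sample path for free, which matches what nflows does for sampling.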
Understood. thanks for your help!
I've been working with a comparable TensorFlow implementation of the conditional Glow normalizing flow and have had some trouble with NaN losses (I have not yet done a deep dive to identify the issue) when I do not use the order of operations used in nflows. Curious whether you have tried inverting the order and attempted training?
The key thing to make training stable is to run a dummy forward pass so that the ActNorm layers are initialized properly, as I do here. In nflows, the first time you go through them they are initialized based on the activations of the first batch.
So you might want to do something similar in your implementation.
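The data-dependent initialization described above can be sketched as follows (plain NumPy, assumed behaviour following the Glow-style ActNorm; the class and attribute names here are illustrative, not the nflows ones): on the first batch the layer sets its parameters so its output has zero mean and unit variance per channel, and afterwards they are ordinary trainable parameters.

```python
import numpy as np

class ActNorm:
    def __init__(self, num_channels):
        self.shift = np.zeros(num_channels)
        self.log_scale = np.zeros(num_channels)
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # data-dependent init on the first batch:
            # choose shift/log_scale so the output is standardized
            self.shift = -x.mean(axis=0)
            self.log_scale = -np.log(x.std(axis=0) + 1e-6)
            self.initialized = True
        return (x + self.shift) * np.exp(self.log_scale)

rng = np.random.default_rng(0)
layer = ActNorm(4)
out = layer.forward(rng.normal(5.0, 3.0, size=(256, 4)))
# out now has per-channel mean ~0 and std ~1
```

Running the dummy forward pass once before training plays the same role: it triggers this branch so the parameters start from a sensible place instead of zeros.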
Ah, that's really helpful.
So, if I'm understanding correctly: at init you compute the log_prob of a ground-truth batch, which forces the normalizing-flow bijector to convert from theta to z. Because of the aforementioned inversion in how nflows composes the component bijectors, the first operation is the forward mode of ActNorm. The input to ActNorm's forward mode is therefore the rot6d representation of the SMPL pose parameters, and, on the first run, it updates the log_scale/shift parameters so that the post-activation has zero mean and unit variance (as your code comment says).
Am I getting that correctly? You still allow the scale/shift parameters to remain trainable after init, correct?
Yes, what you said is correct. The parameters are trainable after the initialization. The initialization trick that ActNorm uses is probably needed to improve the convergence properties by “whitening” the activations in the intermediate layers.
Hi @nkolot ,
I really enjoyed this project: a unique spin on probabilistic modeling for 3D human pose reconstruction, and the resulting strong, context-based prior seems to work great for your new version of SMPLify!
I'm also a big fan of your conditional normalizing flows approach, and I'm trying to clearly understand the order of operations from noise -> sample.
In the supp material, I see that the forward-mode normalizing-flow bijector is depicted as acting (for one block) in the order z -> [ActNorm -> linear layer -> conditional shift coupling] -> pose theta.
However, when looking through your nflows fork, I arrive somewhere different. Here is what I'm seeing:
My understanding from the code is that it acts in the inverse direction, from "noise" to "theta sample", relative to what is depicted in the supp materials (i.e., the order of operations for the "sampling phase" vs. the "evaluation phase" is flipped).
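The flip I mean can be sketched like this (an assumed, simplified structure, not the actual nflows code): a composite transform applies its sub-transforms in order for `forward()` and in reversed order for `inverse()`, so if log_prob uses `forward()` and sampling uses `inverse()`, the sampling-phase order is the exact reverse of the evaluation-phase order.

```python
class Composite:
    """Hypothetical composite bijector that records application order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def forward(self, trace):
        # evaluation phase: theta -> z, sub-transforms in listed order
        for t in self.transforms:
            trace.append(t)

    def inverse(self, trace):
        # sampling phase: z -> theta, sub-transforms in reversed order
        for t in reversed(self.transforms):
            trace.append(t)

flow = Composite(["actnorm", "linear", "coupling"])
eval_order, sample_order = [], []
flow.forward(eval_order)
flow.inverse(sample_order)
# eval_order   == ['actnorm', 'linear', 'coupling']
# sample_order == ['coupling', 'linear', 'actnorm']
```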
Do you know if I am missing something? I'd really appreciate your help!
Alex