umautobots / bidirection-trajectory-predicter

The code for Bi-directional Trajectory Prediction (BiTraP).

Question - KL Annealing #4

Closed ksachdeva closed 2 years ago

ksachdeva commented 3 years ago

Hi @MoonBlvd

Typically, in VAE implementations, one slowly ramps up the "beta" weight on the KL loss. It has been observed that including the full KL term in the early stages of training is not a good idea, so you start from beta=0 and reach beta=1 after some number of training steps. beta=1 means you are including the full KL loss.
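A minimal sketch of the standard scheme described above, assuming a simple linear warmup (the function name and step count here are illustrative, not from the BiTraP code):

```python
def kl_weight(step, warmup_steps=1000, max_beta=1.0):
    """Linearly ramp the KL weight ("beta") from 0 to max_beta
    over warmup_steps training steps, then hold it constant."""
    return min(max_beta, max_beta * step / warmup_steps)

# beta starts at 0, reaches max_beta at warmup_steps, then stays flat.
print(kl_weight(0))     # 0.0
print(kl_weight(500))   # 0.5
print(kl_weight(2000))  # 1.0
```

The total loss would then be `recon_loss + kl_weight(step) * kl_loss`, evaluated once per batch step.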

In your implementation, I see different behavior. You use a scheduler for the KL weight that increases it from 0 after every batch step, up to a maximum value of 100. Shouldn't it increase up to 1 instead of 100?

I would appreciate it if you could explain the reasoning behind this, and/or point to a paper that discusses this kind of annealing scheme.

Regards & thanks Kapil

MoonBlvd commented 3 years ago

Hi @ksachdeva, for the KL annealing I followed Trajectron++'s code to make the comparison with them fair. I think beta is just a weighting parameter, and there is no restriction saying it cannot go beyond 1. The reason to make it as large as 100 is to enforce closer agreement between the prior and recognition networks: the KL loss is very small and may not have enough influence on the total loss otherwise.
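For reference, the Trajectron++-style schedule mentioned above is a sigmoid-shaped anneal rather than a linear ramp. A hedged sketch of that idea, with a maximum weight of 100 (parameter names and step constants here are illustrative, not the repo's actual values):

```python
import math

def sigmoid_kl_anneal(step, start=0.0, finish=100.0,
                      center_step=400.0, steps_lo_to_hi=100.0):
    """Smoothly interpolate the KL weight from `start` to `finish`
    along a sigmoid centered at `center_step`. The weight is ~start
    early in training and saturates near `finish` (here 100)."""
    return start + (finish - start) / (
        1.0 + math.exp(-(step - center_step) / steps_lo_to_hi))

# The weight passes through the midpoint (50) at center_step and
# saturates toward 100 as training continues.
print(sigmoid_kl_anneal(0))       # small, near 0
print(sigmoid_kl_anneal(400))     # 50.0
print(sigmoid_kl_anneal(10000))   # ~100.0
```

Because the KL term itself is tiny, a large final weight like 100 gives it comparable influence to the reconstruction term in the total loss.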