lisiyao21 / Bailando

Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Doubts about the Bailando model #42

Open WJ-Fifth opened 1 year ago

WJ-Fifth commented 1 year ago

You have completed a very good model!

I also got very good results when working with your model, but a few points are still unclear to me. Did you experience gradient explosion when implementing the Actor-Critic Learning module? My model still converged in the first epoch, and it showed some improvement compared to GPT. In subsequent iterations, however, L_AC increased significantly and training no longer converged. The visualization results also became very strange.

Looking forward to your reply!

lisiyao21 commented 1 year ago

Deeply sorry for such a late reply. Indeed, the actor-critic in this version is an 'on-policy' one. That is, it updates the policy based on trajectories sampled with the current policy weights. Since there is no longer any supervision from the ground-truth data, if finetuning goes on too long, the policy will gradually 'forget' what it learned from the data and focus purely on the RL reward. The reward, however, does not include many terms for maintaining motion quality (it only focuses on beat alignment). That is why the results get worse when finetuning for more iterations.
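To make the 'forgetting' effect concrete, here is a toy sketch. It is not the Bailando code. It assumes a hypothetical 4-token policy, a hypothetical reward that only pays for one "beat-aligned" token, and a scalar running-average critic. All names (`finetune_on_policy`, `pretrained_logits`, `beat_reward`) are made up for illustration. It tracks how far the policy drifts from its pretrained distribution when only the reward drives the updates.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def finetune_on_policy(logits, reward, steps, lr=0.5, seed=0):
    """Toy on-policy actor-critic finetuning with no supervised term.

    Returns the final token probabilities and, for every step, the KL
    divergence between the current policy and the pretrained one.
    """
    rng = random.Random(seed)
    logits = list(logits)
    pretrained = softmax(logits)  # stands in for what was learned from data
    value = 0.0                   # scalar critic: running estimate of reward
    drift = []
    for _ in range(steps):
        probs = softmax(logits)
        # On-policy: the action is sampled from the *current* policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        advantage = reward[action] - value
        value += 0.1 * advantage  # critic update toward the observed reward
        # Policy-gradient step: d log pi(action) / d logit_k = 1[k == action] - pi_k
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
        drift.append(kl_divergence(softmax(logits), pretrained))
    return softmax(logits), drift


# Hypothetical setup: the pretrained policy prefers token 0,
# but the reward only pays for token 2 (the "beat-aligned" one).
pretrained_logits = [2.0, 1.0, 0.0, -1.0]
beat_reward = [0.0, 0.0, 1.0, 0.0]

final_probs, drift = finetune_on_policy(pretrained_logits, beat_reward, steps=2000)
print("final probabilities:", [round(p, 3) for p in final_probs])
print("KL from pretrained policy (first, last):", drift[0], drift[-1])
```

In this toy setting nothing pulls the policy back toward the pretrained distribution, so the KL term keeps growing as long as finetuning continues.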

Hope this clarifies things.

Best