WJ-Fifth opened 1 year ago
Deeply sorry for such a late reply. Indeed, the actor-critic in this version is an 'on-policy' one: it updates the policy based on trajectories sampled from the current policy weights. Since there is no longer any supervision from the ground-truth data, if finetuning runs too long, the policy gradually 'forgets' what it learned from the data and focuses purely on the RL reward. The reward, however, does not include terms to maintain motion quality (it only measures beat alignment). That is why the results degrade when finetuning for more iterations.
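To make the failure mode concrete, here is a toy sketch (not the repo's actual training code; the targets, the 1-D Gaussian policy, and the `sup_weight` knob are all made up for illustration). The reward only measures "beat alignment" (distance to a hypothetical beat target), so a pure on-policy update drifts the policy away from the data behaviour; an optional imitation term pulls it back:

```python
import numpy as np

def finetune(steps, sup_weight, lr=0.1, seed=0):
    """Toy on-policy finetuning of a 1-D Gaussian policy (mean mu).
    reward = beat alignment only; sup_weight weights an imitation
    term toward the data action. Purely illustrative."""
    rng = np.random.default_rng(seed)
    beat_target, data_action = 2.0, 0.0   # hypothetical targets
    mu, sigma = 0.0, 0.5                  # policy starts at the data behaviour
    for _ in range(steps):
        a = mu + sigma * rng.standard_normal()     # sample from current policy
        reward = -(a - beat_target) ** 2           # RL reward: beat alignment only
        grad_logp = (a - mu) / sigma ** 2          # d log N(a|mu,sigma) / d mu
        rl_grad = reward * grad_logp               # REINFORCE-style policy gradient
        sup_grad = -2.0 * (mu - data_action)       # gradient of -(mu - data_action)^2
        mu += lr * (rl_grad + sup_weight * sup_grad)
    return mu

# Without supervision, mu drifts all the way to the beat target;
# with the imitation term it settles between the two objectives.
mu_rl_only = finetune(steps=500, sup_weight=0.0)
mu_regularized = finetune(steps=500, sup_weight=1.0)
```

This mirrors the issue at hand: once the supervised signal is gone, nothing in the objective anchors the policy to the original motion data, so longer finetuning only amplifies the drift.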
Hope it clarifies.
Best
You have built a very good model!
I also achieved very good results when working with your model, but a few points are still unclear to me. Did you experience gradient explosion when implementing the Actor-Critic learning module? My model converged in the first epoch and did show some improvement over GPT. However, during subsequent iterations, L_AC increased significantly and would not converge further, and the visualization results also became very strange.
Looking forward to your reply!