Open mucunzhuzhu opened 2 months ago
Thanks for your interest in Lodge. In Table 2 of the Lodge main paper, we follow the Bailando setting to test our method on the AIST++ dataset. Therefore, the results for Bailando and FACT are taken from the original Bailando paper. For EDGE and Lodge, we train the networks at 30 FPS, generate dances at 30 FPS, and then upsample them to 60 FPS. The upsampling is a simple motion interpolation technique, which you can ask ChatGPT to write. To compare fairly with the other methods on the AIST++ dataset, we strictly followed Bailando's experimental setup and cut the generated dances into 20 s clips for testing (details are in the Bailando paper).
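For readers who want a starting point, here is a minimal sketch of what such an upsample-then-cut step could look like. It is not the Lodge authors' code. The function names `upsample_2x` and `crop_seconds` are hypothetical, and it assumes the motion is a NumPy array with time on the first axis (for example joint positions of shape `(T, J, 3)`) and that plain linear interpolation between neighbouring frames is acceptable.

```python
import numpy as np


def upsample_2x(motion: np.ndarray) -> np.ndarray:
    """Double the frame rate (e.g. 30 -> 60 FPS) by inserting midpoint frames.

    Assumption: `motion` has shape (T, ...) with time on axis 0, and linear
    blending of neighbouring frames is good enough for the representation.
    """
    motion = np.asarray(motion, dtype=np.float64)
    # Midpoint between every pair of consecutive frames: shape (T - 1, ...).
    midpoints = 0.5 * (motion[:-1] + motion[1:])
    # T original frames plus T - 1 inserted frames.
    out = np.empty((2 * len(motion) - 1,) + motion.shape[1:], dtype=motion.dtype)
    out[0::2] = motion      # original frames sit at even indices
    out[1::2] = midpoints   # interpolated frames sit at odd indices
    return out


def crop_seconds(motion: np.ndarray, fps: int = 60, seconds: float = 20.0) -> np.ndarray:
    """Keep only the first `seconds` of a motion sampled at `fps`."""
    n_frames = int(round(fps * seconds))
    return motion[:n_frames]


# Toy example: a 1-D "motion" of four frames.
toy = np.arange(4, dtype=np.float64).reshape(4, 1)
toy_60 = upsample_2x(toy)

# 601 frames at 30 FPS cover 20 s; after upsampling we cut to 20 s at 60 FPS.
dance_30 = np.zeros((601, 24, 3))
dance_60 = crop_seconds(upsample_2x(dance_30), fps=60, seconds=20.0)
```

Note that if the motion is stored as joint rotations (axis-angle or quaternions) rather than positions, linear blending is only an approximation, and spherical interpolation of the rotations would be the safer choice.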
Thanks for the explanation. However, I found that all the music in the AIST++ test set is shorter than 20 s. May I ask how you generated the 20 s dances?
I generate dance sequences longer than 20 seconds because the music length in the test set of AIST++ is more than 20 seconds. Are you following the Bailando experimental setting carefully?
@li-ronghui I found that, because EDGE doesn't need a seed motion, there are only ten music pieces in the test set of the AIST++ dataset. So did you generate 4 motion sequences for each music piece when evaluating EDGE? I can't reproduce the metrics reported in your paper this way. Could you describe in more detail how you evaluated EDGE?
Hi @li-ronghui @mucunzhuzhu , I'm also curious about this question.
I checked all the videos in the test data (downloaded via aistplusplus_api) and found that they are all shorter than 15 seconds. Then I noticed that Bailando downloads the music pieces from "AIST" rather than "AIST++", as described in their data preparation, and the downloaded music pieces are all longer than 20 seconds (29 seconds to 1 minute). That may be what is causing the confusion.
I found that the metrics for the baselines differ between papers. I am curious about the experimental setup for Lodge, EDGE, Bailando and FACT in Table 2. As far as I know, the frame rate in the Bailando and FACT papers is 60 FPS, but in EDGE and Lodge it is 30 FPS. In your paper, you note that you interpolated the output dances to 60 FPS, but the details are hard for me to follow. Also, what dance length did you use for testing? May I ask how you compared these methods fairly?