About the experimental setup in Table 2

li-ronghui / LODGE

The code for the CVPR 2024 paper "Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives"


About the experimental setup in Table 2 #17

Open mucunzhuzhu opened 2 months ago

mucunzhuzhu commented 2 months ago

I found that the reported metrics for the baselines differ between papers. I am curious about the experimental setup for Lodge, EDGE, Bailando and FACT in Table 2. As far as I know, the Bailando and FACT papers use 60 FPS, while EDGE and Lodge use 30 FPS. Your paper notes that you interpolated the output dances to 60 FPS, but the details are hard for me to follow. Also, what dance length did you use for testing? May I ask how you compared these methods fairly?

li-ronghui commented 2 months ago

Thanks for your interest in Lodge. In Table 2 of the Lodge main paper, we follow the Bailando setting to test our method on the AIST++ dataset. Therefore, the results for Bailando and FACT are taken from the original Bailando paper. For EDGE and Lodge, we train the network at 30 FPS, generate dances at 30 FPS, and then upsample them to 60 FPS. The upsampling is a simple motion interpolation technique; you can ask ChatGPT to write it. To compare fairly with the other methods on the AIST++ dataset, we strictly followed Bailando's experimental setup and cut the generated dances to 20 s for testing (details are in the Bailando paper).
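
For readers who want a concrete picture of the upsampling and cropping steps, here is a minimal sketch. It is not the authors' code: it assumes plain linear interpolation on flattened per-frame values such as joint positions, and the names `upsample_motion` and `crop_to_seconds` are hypothetical. Frames are plain Python lists to keep the sketch dependency-free; in practice they would likely be a NumPy array, and rotation representations may need spherical interpolation instead of the linear blend used here.

```python
from typing import List

# One flattened pose per frame, e.g. concatenated joint positions.
# This layout is an assumption made for illustration only.
Frame = List[float]


def upsample_motion(frames: List[Frame], src_fps: int = 30, dst_fps: int = 60) -> List[Frame]:
    """Linearly interpolate a motion sequence from src_fps to dst_fps."""
    if len(frames) < 2:
        return [list(f) for f in frames]
    # Keep the same duration: the first and last source frames are preserved.
    num_dst = (len(frames) - 1) * dst_fps // src_fps + 1
    out: List[Frame] = []
    for i in range(num_dst):
        pos = i * src_fps / dst_fps           # fractional index into the source frames
        lo = min(int(pos), len(frames) - 2)   # left neighbour, clamped at the end
        w = pos - lo                          # blend weight towards the right neighbour
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def crop_to_seconds(frames: List[Frame], seconds: float = 20.0, fps: int = 60) -> List[Frame]:
    """Keep only the first `seconds` of a sequence sampled at `fps`."""
    return frames[: int(round(seconds * fps))]
```

Under these assumptions, a dance generated at 30 FPS would be passed through `upsample_motion` and then `crop_to_seconds(..., 20.0, 60)`, giving at most 1200 frames per test clip.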

mucunzhuzhu commented 2 months ago

Thanks for the explanation. But I found that all the music in the AIST++ test set is shorter than 20 s. May I ask how you generated the 20 s dances?

li-ronghui commented 2 months ago

I generate dance sequences longer than 20 seconds because the music length in the test set of AIST++ is more than 20 seconds. Are you following the Bailando experimental setting carefully?

jjd1123 commented 2 months ago

@li-ronghui I found that there are only ten music pieces in the test set of the AIST++ dataset, since EDGE doesn't need a seed motion. So did you generate 4 motion sequences for each music piece when evaluating EDGE? I can't reproduce the metrics reported in your paper this way. Could you describe further how you evaluated EDGE?

MingCongSu commented 2 months ago

> I generate dance sequences longer than 20 seconds because the music length in the test set of AIST++ is more than 20 seconds. Are you following the Bailando experimental setting carefully?

Hi @li-ronghui @mucunzhuzhu , I'm also curious about this question.

I checked all the videos in the test data (downloaded via aistplusplus_api) and found that they are all shorter than 15 seconds. Then I noticed that Bailando downloads its music pieces from "AIST" rather than "AIST++", as described in their data preparation, and the downloaded music pieces are all longer than 20 seconds (29 seconds to 1 minute). Maybe that is the source of the confusion.
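
A quick way to repeat this kind of length check on your own copy of the music is sketched below. It assumes the tracks are available as uncompressed WAV files (other formats would need converting first), and the helper names `wav_duration_seconds` and `long_enough` are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(path: str, min_seconds: float = 20.0) -> bool:
    """True if the track can supply a clip of at least `min_seconds`."""
    return wav_duration_seconds(path) >= min_seconds
```

Running `long_enough` over every test track shows at a glance whether a 20-second evaluation clip is possible with the files you downloaded.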