lisiyao21 / Bailando

Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Why extracting audio features twice with different sampling rate? #14

Closed XiSHEN0220 closed 2 years ago

XiSHEN0220 commented 2 years ago

Hi, Siyao~ Thanks for releasing and cleaning the code!!

May I ask why in the pre-processing part, the audio (music) features are extracted twice and with different sampling rates?

Precisely, in _prepro_aistpp.py, the audio features are extracted with the sampling rate 15360*2

While in _prepro_aistpp_music.py, the audio features are extracted with the sampling rate 15360*2/8

lisiyao21 commented 2 years ago

Sorry for replying late.

The initial idea was to match the frame rates of both the dance sequence (15360*2 --> 60fps) and the downsampled features (15360*2/8 --> 7.5fps). In motion GPT training we need to feed 7.5fps music features to align with the motion codes (downsampled 8 times from the original dance), while in reinforcement learning we need to take the original dance (60fps) into account. So we extracted the features twice.

Hope this helps.
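
As a rough illustration of where the two frame rates come from (not taken from the repository code): frame-based audio features produce one frame per hop, so the feature frame rate is the sampling rate divided by the hop length. The sketch below assumes a hop length of 512 samples, which is librosa's default; the names `HOP_LENGTH` and `feature_fps` are hypothetical.

```python
# Assumption: the feature extractor uses a hop length of 512 samples
# (librosa's default). This value is not stated in the thread.
HOP_LENGTH = 512


def feature_fps(sample_rate, hop_length=HOP_LENGTH):
    """Frames per second of hop-based features: one frame every hop_length samples."""
    return sample_rate / hop_length


dance_sr = 15360 * 2        # rate quoted for _prepro_aistpp.py
music_sr = 15360 * 2 / 8    # rate quoted for _prepro_aistpp_music.py

print(feature_fps(dance_sr))  # 60.0
print(feature_fps(music_sr))  # 7.5
```

Under that assumption, 15360*2 = 30720 = 60 * 512, which would explain the choice of this particular sampling rate, and dividing it by 8 gives the 7.5fps that matches the 8x-downsampled motion codes.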

XiSHEN0220 commented 2 years ago

Thanks ~

KevinGoodman commented 1 year ago

@XiSHEN0220 @lisiyao21 Could you please tell me why 15360 * 2 is used as the sampling rate for 60 FPS? I mean, how is this specific rate calculated?

Sun-Happy-YKX commented 11 months ago

15360*2

I set the sample rate to 15360*2 and used the function "extract_acoustic_feature" that you provided, but the frame rate of the extracted features is not 60 fps, as shown below. Can you give me some suggestions? [image]
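
One way to narrow down a mismatch like this is to compute the effective frame rate from the number of feature frames and the length of the loaded audio. The sketch below is only a diagnostic under stated assumptions: a hop length of 512 and centered framing (frame count of 1 + num_samples // hop_length, as librosa does by default). The helper names are hypothetical and are not part of the repository.

```python
def expected_num_frames(num_samples, hop_length=512):
    """Frame count for centered framing (assumed): 1 + num_samples // hop_length."""
    return 1 + num_samples // hop_length


def effective_fps(num_frames, num_samples, sample_rate):
    """Frames per second implied by a feature array and the audio it came from."""
    duration_sec = num_samples / sample_rate
    return num_frames / duration_sec


# Ten seconds of audio actually resampled to 15360*2 Hz: close to 60 fps.
sr_target = 15360 * 2
n_target = 10 * sr_target
fps_target = effective_fps(expected_num_frames(n_target), n_target, sr_target)

# Ten seconds of audio left at 22050 Hz (the default of librosa.load):
# the same hop length then gives roughly 43 fps instead of 60.
sr_default = 22050
n_default = 10 * sr_default
fps_default = effective_fps(expected_num_frames(n_default), n_default, sr_default)

print(round(fps_target, 1), round(fps_default, 1))
```

If the measured rate is close to the second case, the audio was probably not resampled to 15360*2 when it was loaded; if it is some other value, the hop length used by the extractor is worth checking.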