Closed XiSHEN0220 closed 2 years ago
Hi, Siyao~ Thanks for releasing and cleaning the code!!
May I ask why in the pre-processing part, the audio (music) features are extracted twice and with different sampling rates?
Specifically, in _prepro_aistpp.py, the audio features are extracted with a sampling rate of 15360*2,
while in _prepro_aistpp_music.py, the audio features are extracted with a sampling rate of 15360*2/8.
Sorry for the late reply.
The initial idea was to match the frame rates of both the dance sequence (15360*2 --> 60fps) and the downsampled features (15360*2/8 --> 7.5fps). In motion GPT training we need to feed 7.5fps music features so that they align with the motion codes (downsampled 8 times from the original dance), while in reinforcement learning we need to work with the original dance (60fps). So we extracted the features twice.
Hope this helps.
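For readers following along, the arithmetic behind the two rates can be sketched as below. This assumes a hop length of 512 samples between feature frames (librosa's default); the thread itself does not state the hop length, so treat `HOP_LENGTH` and the helper `feature_fps` as illustrative assumptions rather than the repository's actual code.

```python
# Assumption: features are computed with a hop of 512 samples per frame
# (librosa's default). The hop length is not stated in this thread.
HOP_LENGTH = 512


def feature_fps(sample_rate, hop_length=HOP_LENGTH):
    """Feature frames produced per second of audio (hypothetical helper)."""
    return sample_rate / hop_length


full_rate = 15360 * 2      # rate used in _prepro_aistpp.py
down_rate = 15360 * 2 / 8  # rate used in _prepro_aistpp_music.py

print(feature_fps(full_rate))  # 60.0 -> matches the 60fps dance sequence
print(feature_fps(down_rate))  # 7.5  -> matches the 8x-downsampled motion codes
```

Under that assumption, 15360*2 = 30720 = 60 * 512, which is one way the specific number could arise.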
Thanks ~
@XiSHEN0220 @lisiyao21 Could you please tell me why 15360*2 is used as the sampling rate for 60 FPS? I mean, how is this specific rate calculated?
15360*2
I have set the sample rate to 15360*2 and used the function you provided, "extract_acoustic_feature", but the frame rate of the extracted features is not 60. Can you give me some suggestions?
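One thing that may be worth checking here (a guess, since the extracted output is not shown in this thread) is whether the audio was actually resampled to 15360*2 when it was loaded. librosa.load resamples to 22050 Hz by default unless sr is passed explicitly, and a mismatch there would change the effective frame rate. The sketch below compares the expected frame counts; the hop length of 512, the n_fft of 2048, and the helper name `expected_frames` are assumptions for illustration, not taken from the repository.

```python
def expected_frames(n_samples, hop_length=512, n_fft=2048, center=True):
    """Frame count of an STFT-style feature (hypothetical helper).

    With centered framing the signal is padded, giving 1 + n_samples // hop_length
    frames; without it, only full windows of n_fft samples are counted.
    """
    if center:
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length


duration_s = 10  # example clip length in seconds

for sr in (15360 * 2, 22050):  # intended rate vs. librosa.load's default
    n_samples = sr * duration_s
    frames = expected_frames(n_samples)
    print(sr, frames, round(sr / 512, 2))  # rate, frame count, frames per second
```

If the number of frames you get for a clip of known length is closer to the second case than the first, the audio was probably not loaded at the rate you intended.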