guyyariv / TempoTokens

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
https://pages.cs.huji.ac.il/adiyoss-lab/TempoTokens/
MIT License

Question about the preprocessing of audio clip length #7

Open · jiajiaxiaoskx opened this issue 6 months ago

jiajiaxiaoskx commented 6 months ago

Hello, excellent work! In the training phase (line 99 of dataset.py), you set the audio clip length to n_frames / 24 seconds, so with n_frames = 24 the clip is 1 second long. However, during validation and inference, n_frames is still 24, but the audio clip length is 2 seconds. What is the purpose of using different audio clip lengths for training and inference?

Thanks!
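To make the mismatch concrete, here is a minimal sketch of the arithmetic described above. It assumes a 24 fps frame rate (implied by the n_frames / 24 formula) and a hypothetical 16 kHz audio sample rate; the function names are illustrative only and are not taken from dataset.py or any other file in the repo.

```python
# Hypothetical sketch: reproduces only the clip-length arithmetic from the issue,
# not the actual TempoTokens preprocessing code.
FPS = 24              # assumed frame rate implied by the n_frames / 24 formula
SAMPLE_RATE = 16_000  # hypothetical audio sample rate, not taken from the repo


def train_clip_seconds(n_frames: int, fps: int = FPS) -> float:
    """Training-time audio clip length as described in the issue: n_frames / 24 s."""
    return n_frames / fps


def clip_num_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of raw audio samples covered by a clip of the given duration."""
    return int(round(seconds * sample_rate))


n_frames = 24
train_s = train_clip_seconds(n_frames)  # 1.0 s at training time
infer_s = 2.0                           # inference-time length reported in the issue

print(train_s, infer_s)  # 1.0 2.0
print(clip_num_samples(train_s), clip_num_samples(infer_s))
# Seconds of audio that correspond to each generated video frame
print(train_s / n_frames, infer_s / n_frames)
```

With n_frames = 24, each video frame is paired with 1/24 s of audio during training but 1/12 s at inference, so the audio-to-frame alignment differs by a factor of two between the two settings.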