Hello, excellent work! In the training phase (line 99 of dataset.py), you set the audio clip length to n_frames/24 seconds. If n_frames is 24, the audio clip is 1 second long. However, during validation or inference, n_frames is still 24, but the audio clip is 2 seconds long. Why are different audio clip lengths used for training and inference?
Thanks!