Joy-word closed this 2 weeks ago
Hi,
It is a known problem with songs / very high voices / children's voices / cartoon voices.
As for music per se, we did not have music in the training data. As for children's audio recordings, they are much rarer than recordings of adults.
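One thing that may be worth trying at inference time is a lower decision threshold combined with bridging short gaps. This is not a confirmed fix: the model was not trained on music, so the probabilities on sung high notes may simply be too low to recover, and a lower threshold will also let more non-speech through. The sketch below is plain Python and does not call silero-vad. The function `probs_to_segments` and its `max_gap` parameter are hypothetical names used only for illustration, and the probability values are made up. It assumes you already have one speech probability per audio frame. In silero-vad itself, the closest knob is, to my knowledge, the `threshold` argument of `get_speech_timestamps`.

```python
from typing import List, Tuple


def probs_to_segments(
    probs: List[float],
    threshold: float = 0.5,
    max_gap: int = 0,
) -> List[Tuple[int, int]]:
    """Convert per-frame speech probabilities into (start, end) frame ranges.

    `end` is exclusive. Runs of at most `max_gap` frames below the threshold
    that sit between two speech frames are bridged instead of splitting
    the segment. Hypothetical helper, not part of silero-vad.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the open segment, if any
    gap = 0       # consecutive sub-threshold frames since the last speech frame

    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                # The last speech frame was at i - gap, so the segment ends there.
                segments.append((start, i - gap + 1))
                start = None
                gap = 0

    if start is not None:
        # Close the trailing segment, dropping any sub-threshold tail frames.
        segments.append((start, len(probs) - gap))
    return segments


# Made-up probabilities: frames 2-3 mimic a high-pitched passage scored low.
example_probs = [0.9, 0.8, 0.35, 0.3, 0.85, 0.9, 0.1, 0.1, 0.1, 0.4, 0.45, 0.05]

default_segments = probs_to_segments(example_probs, threshold=0.5)
relaxed_segments = probs_to_segments(example_probs, threshold=0.25, max_gap=2)
print(default_segments)
print(relaxed_segments)
```

With the default threshold the low-scoring frames split the first segment, while the relaxed settings keep it in one piece and also pick up a later weak region. Whether that later region is real voice or a false positive is exactly the trade-off to check on your own recordings.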
OK, thanks for your response.
❓ Questions and Help
I found that when using silero-vad for voice activity detection in vocal songs, it misses most of the high-pitched parts. I'm wondering if this is related to the project's training data? Is there a way to avoid this during the inference stage? Looking forward to your response.