ina-foss / inaSpeechSegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
MIT License
736 stars 126 forks

VAD to detect simultaneous music and voice #48

Closed (realies closed 1 year ago)

realies commented 4 years ago

This is more of a feature request - is it possible to detect simultaneous music and voice?

DavidDoukhan commented 3 years ago

Right now, this would require designing new voice activity detection systems within inaSpeechSegmenter. Are you aware of corpora that would allow designing and evaluating such systems?

realies commented 3 years ago

Not really. I presumed the existing functionality and datasets could be adapted to distinguish between music and music with narration over it, based on some confidence ratio. Your comment makes it sound as if achieving this would require a completely different VAD system?

realies commented 2 years ago

@DavidDoukhan, could existing corpora be used to mix music and voice at various ratios, extending the training dataset to support a new VAD mode?
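
As a rough sketch of the mixing idea above: speech and music clips could be overlaid at a chosen speech-to-music ratio to produce "speech over music" training examples. This is not part of inaSpeechSegmenter. The function name `mix_at_snr`, the synthetic stand-in signals, and the list of ratios are all illustrative assumptions, and real clips would first need to be loaded and resampled to a common rate.

```python
import numpy as np

def mix_at_snr(speech, music, snr_db):
    """Overlay music on speech so that the speech-to-music power ratio is snr_db (dB).

    Hypothetical helper for illustration; returns the mixed signal and the
    gain that was applied to the music.
    """
    speech = np.asarray(speech, dtype=np.float64)
    music = np.asarray(music, dtype=np.float64)
    if speech.size == 0 or music.size == 0:
        raise ValueError("both signals must be non-empty")

    # Loop or trim the music so both signals have the same length.
    if music.size < speech.size:
        reps = int(np.ceil(speech.size / music.size))
        music = np.tile(music, reps)
    music = music[:speech.size]

    speech_power = np.mean(speech ** 2)
    music_power = np.mean(music ** 2)
    if speech_power == 0 or music_power == 0:
        raise ValueError("both signals must be non-silent")

    # Gain that brings the music to the requested level relative to the speech.
    gain = np.sqrt(speech_power / (music_power * 10 ** (snr_db / 10)))
    mixed = speech + gain * music

    # Rescale only if the sum would clip outside [-1, 1]; this keeps the ratio intact.
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed = mixed / peak
    return mixed, gain


# Synthetic stand-ins for real clips (assumption: mono, 16 kHz, float samples).
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech_clip = 0.1 * rng.standard_normal(sample_rate)
music_clip = 0.5 * np.sin(2 * np.pi * 440 * t)

# One mixed example per ratio; each would be labelled "speech over music".
augmented = {snr: mix_at_snr(speech_clip, music_clip, snr)[0] for snr in (-5, 0, 5, 10)}
```

Whether examples generated this way are realistic enough to train and evaluate a new detection mode is exactly the open question about corpora raised earlier in the thread.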

realies commented 1 year ago

@DavidDoukhan, is this really completed?

DavidDoukhan commented 1 year ago

This won't be done.