CNN-based audio segmentation toolkit. It can detect speech, music, noise and speaker gender. It has been designed for large-scale gender equality studies based on speech time per gender.
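Because the toolkit targets studies of speech time per gender, the aggregation step after segmentation can be sketched as follows. This is a minimal illustration, not the toolkit's own code. It assumes segmentation results arrive as `(label, start, end)` tuples with times in seconds and gender labels named `"male"` and `"female"`. The helper names `speech_time_per_label` and `female_speech_share` are hypothetical, and the segment list is made-up data.

```python
from collections import defaultdict


def speech_time_per_label(segments):
    """Sum segment durations (seconds) per label.

    Assumption: `segments` is an iterable of (label, start, end) tuples.
    """
    totals = defaultdict(float)
    for label, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(label, start, end)}")
        totals[label] += end - start
    return dict(totals)


def female_speech_share(segments):
    """Fraction of gendered speech time labelled "female" (assumed label names)."""
    totals = speech_time_per_label(segments)
    female = totals.get("female", 0.0)
    male = totals.get("male", 0.0)
    speech = female + male
    return female / speech if speech > 0 else 0.0


# Hypothetical segmentation, for illustration only.
example = [
    ("music", 0.0, 4.5),
    ("female", 4.5, 12.0),
    ("male", 12.0, 20.0),
    ("noEnergy", 20.0, 21.0),
    ("female", 21.0, 23.5),
]

print(speech_time_per_label(example))
print(round(female_speech_share(example), 3))
```

Non-speech labels such as music are counted in the per-label totals but excluded from the gender ratio, so the share reflects speaking time only.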
Noise samples were obtained from the MUSAN database.
Speech excerpts were obtained from several sources, including internal private databases.
Music excerpts were obtained from several public sources.
I see that the "smn" model classifies audio into Speech, Music and Noise. I would like to know, if possible, which dataset was used to train the model.
What is considered noise?