ina-foss / inaSpeechSegmenter

CNN-based audio segmentation toolkit. It detects speech, music, noise and speaker gender, and was designed for large-scale gender equality studies based on speech time per gender.
MIT License

Model training #5

Closed · pankaj2701 closed this issue 6 years ago

pankaj2701 commented 6 years ago

How do we train a model to be used with inaSpeechSegmenter?

DavidDoukhan commented 6 years ago

Hi, inaSpeechSegmenter is provided with pre-trained models. You can have a look at the following papers in order to train your own models:

https://www.researchgate.net/profile/David_Doukhan/publication/324752429_AN_OPEN-SOURCE_SPEAKER_GENDER_DETECTION_FRAMEWORK_FOR_MONITORING_GENDER_EQUALITY/links/5ae0713d458515c60f64ef61/AN-OPEN-SOURCE-SPEAKER-GENDER-DETECTION-FRAMEWORK-FOR-MONITORING-GENDER-EQUALITY.pdf

https://hal.archives-ouvertes.fr/hal-01514228/document

However, using alternative models may require setting different parameters for the Viterbi post-processing.
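
For reference, here is a minimal usage sketch of the shipped pre-trained models (see the project README for the authoritative interface; `media.wav` is a placeholder file name):

```python
# Minimal sketch: segment an audio file with the default pre-trained models.
from inaSpeechSegmenter import Segmenter

seg = Segmenter()                 # loads the default pre-trained CNN models
segmentation = seg('media.wav')   # list of (label, start_sec, end_sec) tuples
for label, start, end in segmentation:
    print(label, start, end)      # labels such as 'male', 'female', 'music', 'noEnergy'
```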

Regards,

Cloud299 commented 6 years ago

Great model, it worked very well for my work on speech-music segmentation. If I have two arbitrary classes, say dog and cat sounds, is there a quick way to use the inaSpeechSegmenter API to train a model on my own data?
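
For illustration only, a minimal sketch of what training a small two-class classifier on log-mel patches could look like with Keras. This is not part of the inaSpeechSegmenter API; the class names, patch size, hop length and file names below are assumptions, and the resulting model would still need the segmenter's feature pipeline and Viterbi post-processing to be used for segmentation:

```python
# Illustrative sketch: a toy two-class (dog vs. cat) CNN on log-mel patches.
import numpy as np
import librosa
from tensorflow import keras

N_MELS, N_FRAMES = 64, 68  # assumed patch size (~0.68 s at 100 frames/s)

def logmel_patches(wav_path, sr=16000):
    """Cut a waveform into fixed-size log-mel spectrogram patches."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS, hop_length=160)
    logmel = librosa.power_to_db(mel)
    n = logmel.shape[1] // N_FRAMES
    return np.stack([logmel[:, i * N_FRAMES:(i + 1) * N_FRAMES] for i in range(n)])

def build_model():
    """Small CNN mapping a (N_MELS, N_FRAMES, 1) patch to 2 classes."""
    return keras.Sequential([
        keras.layers.Input(shape=(N_MELS, N_FRAMES, 1)),
        keras.layers.Conv2D(16, 3, activation='relu'),
        keras.layers.MaxPooling2D(2),
        keras.layers.Conv2D(32, 3, activation='relu'),
        keras.layers.MaxPooling2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(2, activation='softmax'),
    ])

# Hypothetical file lists: replace with your own labelled audio.
dog_files, cat_files = ['dog1.wav'], ['cat1.wav']
patches, labels = [], []
for label, files in enumerate([dog_files, cat_files]):
    for f in files:
        p = logmel_patches(f)
        patches.append(p)
        labels.append(np.full(len(p), label))
X = np.concatenate(patches)[..., None]
y = np.concatenate(labels)

model = build_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=32)
```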