Hi Hervé!
I'm using pyannote to build a multi-label classifier for speech utterances. The idea is to output a vector in which:
1) The first dimension is for the key child (the one wearing the mic)
2) The second is for the other children
3) The third is for female speech
4) The fourth is for male speech
5) The fifth is for overlapping speech (optional)
I'm wondering about the best way to validate my model.
I came up with 2 solutions:
1) Average (weighted or not) detection error rate across all the classes
2) Identification error rate
Given that the task I want to solve is neither a diarization task nor a speech activity detection task, I'd like your thoughts on which metric would be best suited.
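To make the averaged detection error rate (option 1) concrete, here is a minimal frame-level sketch of what I have in mind, in plain Python. It is not pyannote's API: the function names and the label names (`key_child`, `female`) are made up for illustration, and it assumes each class has already been converted to a sequence of 0/1 frame decisions. For each class, the detection error rate is taken as (false alarms + misses) / number of reference-active frames. Skipping classes with no reference activity in the unweighted average is my own assumption.

```python
from typing import Dict, List, Tuple

Frames = List[int]  # one 0/1 decision per frame, for a single class


def detection_counts(reference: Frames, hypothesis: Frames) -> Tuple[int, int]:
    """Return (false alarms + misses, reference-active frames) for one class."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    # A frame is an error when reference and hypothesis disagree.
    errors = sum(1 for r, h in zip(reference, hypothesis) if bool(r) != bool(h))
    total = sum(1 for r in reference if r)
    return errors, total


def average_detection_error_rate(
    reference: Dict[str, Frames],
    hypothesis: Dict[str, Frames],
    weighted: bool = False,
) -> float:
    """Average the per-class detection error rate over all classes."""
    counts = [detection_counts(reference[label], hypothesis[label]) for label in reference]
    if weighted:
        # Pool errors and reference frames, so frequent classes weigh more.
        total = sum(t for _, t in counts)
        if total == 0:
            raise ValueError("no reference activity in any class")
        return sum(e for e, _ in counts) / total
    # Assumption: classes that never occur in the reference are skipped.
    rates = [e / t for e, t in counts if t > 0]
    if not rates:
        raise ValueError("no reference activity in any class")
    return sum(rates) / len(rates)


# Toy example with two hypothetical classes and four frames each.
reference = {"key_child": [1, 1, 0, 0], "female": [1, 1, 1, 1]}
hypothesis = {"key_child": [1, 0, 0, 1], "female": [1, 1, 1, 1]}

print(average_detection_error_rate(reference, hypothesis))
print(average_detection_error_rate(reference, hypothesis, weighted=True))
```

In this toy example the rare class (`key_child`) is entirely wrong and the frequent one is perfect, so the unweighted and weighted averages differ. That difference is part of what I'm unsure about when choosing between them.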
Thanks!