This would be a nice addition to the library, indeed. Here is how I would implement it, if I were to do it myself.
For `pyannote-speech-detection` and `pyannote-change-detection`, we should:

- rely on a `params.yml` file (similar to the one created by `pyannote-pipeline`) containing the best epoch and the corresponding best detection threshold;
- in "test" mode, load the `params.yml` file, extract raw scores (in the same way "apply" mode does -- but without storing them on disk), apply the best threshold, compute and report metrics, and (optionally) dump the resulting segmentation to disk (as a text file, in the same way `pyannote-pipeline` "apply" mode does);
- optionally feed the dumped segmentation to the `pyannote-metrics` command line tool to obtain a more thorough performance report.

This would look like this:

```
pyannote-speech-detection test [options] [--dump=<output_dir>] <params.yml> <database.task.protocol>
```
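To make this concrete, here is a minimal Python sketch of what the core of such a "test" mode could look like. This is not the actual pyannote.audio implementation: the `epoch` and `threshold` keys in `params.yml` are assumptions, and `compute_speech_regions` is a made-up placeholder for the model-loading, scoring, and thresholding step.

```python
# Sketch only -- not the actual pyannote.audio implementation.
import yaml

from pyannote.database import get_protocol
from pyannote.metrics.detection import DetectionErrorRate


def test(params_yml, protocol_name, compute_speech_regions):
    """Evaluate a speech activity detection model on the test set.

    `compute_speech_regions(test_file, epoch, threshold)` is a hypothetical
    callback that runs the model at the given epoch, applies the detection
    threshold, and returns a pyannote.core.Annotation of speech regions.
    """
    with open(params_yml) as f:
        params = yaml.safe_load(f)

    # assumed keys -- the exact layout of params.yml may differ
    epoch = params["epoch"]
    threshold = params["threshold"]

    metric = DetectionErrorRate()
    protocol = get_protocol(protocol_name)

    for test_file in protocol.test():
        hypothesis = compute_speech_regions(test_file, epoch, threshold)
        reference = test_file["annotation"]
        uem = test_file["annotated"]
        # accumulate detection error rate for this file
        metric(reference, hypothesis, uem=uem)

    # aggregated detection error rate over the whole test set
    return abs(metric)
```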
For `pyannote-speaker-embedding`, we could do the same for speaker verification protocols, but I am not sure it would make much sense for speaker diarization protocols since we assume perfect segmentation in "validate" mode. Even for the former, it would mean that we agree that the best way of aggregating embeddings is taking their average (which is probably not true).
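For reference, this is the kind of naive "average then compare" trial scoring I have in mind (plain numpy/scipy, purely illustrative -- function names, shapes, and the choice of cosine distance are all assumptions):

```python
# Illustrative only: naive averaging of window-level embeddings for a
# speaker verification trial. Names and shapes are assumptions.
import numpy as np
from scipy.spatial.distance import cosine


def trial_distance(enrollment_embeddings, test_embeddings):
    """Both inputs are (n_windows, dimension) arrays of embeddings."""
    enrollment = np.mean(enrollment_embeddings, axis=0)
    test = np.mean(test_embeddings, axis=0)
    # smaller cosine distance = more likely the same speaker
    return cosine(enrollment, test)
```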
I suggest we (you?) start with `pyannote-speech-detection` so that we can see how it goes.
What do you think?
I can tackle the implementation of your approach, which looks reasonable. I'm starting with `pyannote-change-detection` as suggested. I'll post the link to the pull request here once done.
Here comes my first implementation: https://github.com/pyannote/pyannote-audio/pull/135. Please read the comments that I've put there.
I have been working on the next release of `pyannote.audio` recently and this is the last issue I want to address before going through with it.
I believe we no longer need this `test` mode now that `pyannote.pipeline` has been released, with its own `apply` mode that also computes and displays the evaluation metric.
Furthermore, adding the proposed `test` mode would probably give the false impression that the returned value is the best one can achieve with the model. This is actually not true, because it relies on a sub-optimal pipeline where only some of the hyper-parameters are tuned.
For instance, for speech activity detection, I'd rather have people use `validate` mode to select the best epoch on the development set and then tune a fully-fledged pipeline with `pyannote-pipeline`.
Therefore, I am going to close this issue and the corresponding pull request. I am very sorry that you worked on this and it ended up not being used. Thank you again for your contribution to the project, and I hope you'll contribute again in the future!
Hi, that's OK, I understand that the next release has 'eaten' this feature. I've gained knowledge in the process, which is the most important thing :)
The current applications allow training, validating, and applying a model, but a testing mode is missing. This mode would apply a given model to the test set of the given protocol and then output the same metrics used for validation.