This would be a nice addition to the library, indeed. Here is how I would implement it, if I were to do it myself.
For `pyannote-speech-detection` and `pyannote-change-detection`, we should:

- rely on a `params.yml` file (similar to the one created by `pyannote-pipeline`) containing the best epoch and the corresponding best detection threshold;
- in "test" mode, load the `params.yml` file, extract raw scores (in the same way "apply" mode does -- but without storing them on disk), apply the best threshold, compute and report metrics, and (optionally) dump the resulting segmentation to disk (as a text file, in the same way `pyannote-pipeline` "apply" mode does);
- optionally feed the dumped segmentation to the `pyannote-metrics` command line tool to obtain a more thorough performance report.

This would look like this:

```
pyannote-speech-detection test [options] [--dump=<output_dir>] <params.yml> <database.task.protocol>
```
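To make this concrete, here is a minimal Python sketch of what the core of such a "test" mode could look like. This is not the actual pyannote.audio implementation: the `epoch` and `threshold` keys in `params.yml` are assumptions, and `compute_speech_regions` is a made-up placeholder for the model-loading, scoring, and thresholding step.

```python
# Sketch only -- not the actual pyannote.audio implementation.
import yaml

from pyannote.database import get_protocol
from pyannote.metrics.detection import DetectionErrorRate


def test(params_yml, protocol_name, compute_speech_regions):
    """Evaluate a speech activity detection model on the test set.

    `compute_speech_regions(test_file, epoch, threshold)` is a hypothetical
    callback that runs the model at the given epoch, applies the detection
    threshold, and returns a pyannote.core.Annotation of speech regions.
    """
    with open(params_yml) as f:
        params = yaml.safe_load(f)

    # assumed keys -- the exact layout of params.yml may differ
    epoch = params["epoch"]
    threshold = params["threshold"]

    metric = DetectionErrorRate()
    protocol = get_protocol(protocol_name)

    for test_file in protocol.test():
        hypothesis = compute_speech_regions(test_file, epoch, threshold)
        reference = test_file["annotation"]
        uem = test_file["annotated"]
        # accumulate detection error rate for this file
        metric(reference, hypothesis, uem=uem)

    # aggregated detection error rate over the whole test set
    return abs(metric)
```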
For `pyannote-speaker-embedding`, we could do the same for speaker verification protocols, but I am not sure it would make much sense for speaker diarization protocols since we assume perfect segmentation in "validate" mode. Even for the former, it would mean that we agree that the best way of aggregating embeddings is taking their average (which is probably not true).
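For reference, this is the kind of naive "average then compare" trial scoring I have in mind (plain numpy/scipy, purely illustrative -- function names, shapes, and the choice of cosine distance are all assumptions):

```python
# Illustrative only: naive averaging of window-level embeddings for a
# speaker verification trial. Names and shapes are assumptions.
import numpy as np
from scipy.spatial.distance import cosine


def trial_distance(enrollment_embeddings, test_embeddings):
    """Both inputs are (n_windows, dimension) arrays of embeddings."""
    enrollment = np.mean(enrollment_embeddings, axis=0)
    test = np.mean(test_embeddings, axis=0)
    # smaller cosine distance = more likely the same speaker
    return cosine(enrollment, test)
```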
I suggest we (you?) start with `pyannote-speech-detection` so that we can see how it goes.
What do you think?
I can tackle the implementation of your approach, which looks reasonable. I'm starting with `pyannote-change-detection` as suggested. I'll post the link to the pull request here once done.
Here comes my first implementation: https://github.com/pyannote/pyannote-audio/pull/135. Please read the comments that I've put there.
I have been working on the next release of `pyannote.audio` recently and this is the last issue I want to address before going through with it.
I believe we no longer need this `test` mode now that `pyannote.pipeline` has been released, with its own `apply` mode that also computes and displays the evaluation metric.
Furthermore, adding the proposed `test` mode would probably give the false impression that the returned value is the best one can achieve with the model. This is actually not true, because it relies on a sub-optimal pipeline where only some of the hyper-parameters are tuned.
For instance, for speech activity detection, I'd rather have people use `validate` mode to select the best epoch on the development set and then tune a fully-fledged pipeline with `pyannote-pipeline`.
Therefore, I am going to close this issue and the corresponding pull request. I am very sorry that you worked on this and it ended up not being used. Thank you again for your contribution to the project, and I hope you'll contribute again in the future!
Hi, that's OK, I understand that the next release has 'eaten' this feature. I've gained knowledge in the process, which is the most important thing :)
The current applications allow training, validating, and applying a model, but a testing mode is missing. This mode would apply a given model to the test set of the given protocol and then output the same metrics used for validation.