pyannote / pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
http://pyannote.github.io/pyannote-metrics
MIT License

Precision & Recall of VAD always gives 100%. #35

Closed by divyeshrajpura4114 4 years ago

divyeshrajpura4114 commented 4 years ago

I used the AVASpeech dataset and tried to apply energy-based voice activity detection. However, pyannote.metrics gives me 100% precision and 100% recall, which is unexpected. Compare lines 4 and 12 of the output below, where the reference and the hypothesis disagree for nearly 2 seconds. I tried many files and the result is always 100%. Can anyone please help me figure out whether I am doing something wrong?

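Code:

from pyannote.metrics.detection import DetectionPrecision, DetectionRecall

# reference and hypothesis are assumed to be pyannote.core Annotation objects built beforehand
detectionRecall = DetectionRecall()
print("Recall:", detectionRecall.compute_components(reference, hypothesis))
detectionPrecision = DetectionPrecision()
print("Precision:", detectionPrecision.compute_components(reference, hypothesis))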

Below is the output of my program. Here, 0 marks non-speech segments and 1 marks speech segments. Each audio file is 30 s long.

1. 053oq2xB3oU_21 1732
2. Ground Truth:
3. [ 00:00:00.000 -->  00:00:21.640] _ 1
4. [ 00:00:21.640 -->  00:00:24.310] _ 0
5. [ 00:00:24.310 -->  00:00:26.880] _ 1
6. [ 00:00:26.880 -->  00:00:27.640] _ 0
7. [ 00:00:27.640 -->  00:00:29.110] _ 1
8. [ 00:00:29.110 -->  00:00:30.000] _ 0
9. Hypothesis:
10. [ 00:00:00.000 -->  00:00:22.016] _ 1
11. [ 00:00:22.016 -->  00:00:22.176] _ 0
12. [ 00:00:22.176 -->  00:00:24.800] _ 1
13. [ 00:00:24.800 -->  00:00:25.200] _ 0
14. [ 00:00:25.200 -->  00:00:26.912] _ 1
15. [ 00:00:26.912 -->  00:00:27.728] _ 0
16. [ 00:00:27.728 -->  00:00:28.496] _ 1
17. [ 00:00:28.496 -->  00:00:28.688] _ 0
18. [ 00:00:28.688 -->  00:00:29.072] _ 1
19. [ 00:00:29.072 -->  00:00:29.792] _ 0
20. [ 00:00:29.792 -->  00:00:30.000] _ 1
21. Precision: {'relevant': 30.0, 'relevant retrieved': 30.0}
22. Recall: {'retrieved': 30.0, 'relevant retrieved': 30.0}

divyeshrajpura4114 commented 4 years ago

It is working perfectly now. I read the description given in the definition of DetectionPrecision and found that I need to pass only the speech segments as input, rather than both the speech and non-speech segments.
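
For readers who hit the same symptom, the arithmetic behind it can be sketched in plain Python. This is an illustration only, not the pyannote.metrics implementation: the helper names (`duration`, `overlap`, `precision_recall`, `speech_only`) are hypothetical, and the segment lists are copied from the output above. It assumes, in line with the fix described here, that the detection metrics treat every segment they are given as speech and ignore its label.

```python
# Illustrative arithmetic only; NOT the pyannote.metrics implementation.
# Segments are (start, end, label) in seconds; label 1 = speech, 0 = non-speech.
reference = [
    (0.000, 21.640, 1), (21.640, 24.310, 0), (24.310, 26.880, 1),
    (26.880, 27.640, 0), (27.640, 29.110, 1), (29.110, 30.000, 0),
]
hypothesis = [
    (0.000, 22.016, 1), (22.016, 22.176, 0), (22.176, 24.800, 1),
    (24.800, 25.200, 0), (25.200, 26.912, 1), (26.912, 27.728, 0),
    (27.728, 28.496, 1), (28.496, 28.688, 0), (28.688, 29.072, 1),
    (29.072, 29.792, 0), (29.792, 30.000, 1),
]


def duration(segments):
    """Total duration of a list of non-overlapping segments."""
    return sum(end - start for start, end, _ in segments)


def overlap(a, b):
    """Duration shared by two lists of non-overlapping segments (labels ignored)."""
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1, _ in a
        for s2, e2, _ in b
    )


def precision_recall(ref, hyp):
    """Duration-based precision and recall, treating every segment as speech."""
    common = overlap(ref, hyp)
    return common / duration(hyp), common / duration(ref)


def speech_only(segments):
    """Keep only the segments labelled 1 (speech)."""
    return [seg for seg in segments if seg[2] == 1]


# Every segment passed in: both lists cover the whole 30 s file.
p_all, r_all = precision_recall(reference, hypothesis)

# Speech segments only, as in the fix described above.
p_speech, r_speech = precision_recall(speech_only(reference), speech_only(hypothesis))

print(f"all segments:  precision={p_all:.3f} recall={r_all:.3f}")
print(f"speech only:   precision={p_speech:.3f} recall={r_speech:.3f}")
```

With all segments included, both inputs span the full 30 s, so the overlap equals both totals and the scores are 100%. With only the speech segments, this file gives a precision of roughly 0.90 and a recall of roughly 0.97. In pyannote itself, the equivalent step would be to add only the speech segments to the reference and hypothesis annotations before calling the metric.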

hbredin commented 4 years ago

Glad your problem is solved.