Closed neozhangthe1 closed 6 years ago
Can you please provide me with a simple self-contained script that I could run to reproduce the error?
Below is a simple example. Both hypothesis and hypothesis1 achieve 100% purity. If we set the whole audio file as a single Segment, we get 100% for both purity and coverage.
from pyannote.core import Annotation, Timeline, Segment
from pyannote.metrics.segmentation import SegmentationPurity, SegmentationCoverage
hypothesis = Timeline(segments=[Segment(0, 10)])
hypothesis1 = Timeline(segments=[Segment(0, 3), Segment(3, 4), Segment(4, 5)])
reference = Annotation()
reference[Segment(1, 2)] = "a"
reference[Segment(3, 5)] = "b"
reference[Segment(7, 8)] = "a"
purity = SegmentationPurity()(reference, hypothesis)
coverage = SegmentationCoverage()(reference, hypothesis)
purity1 = SegmentationPurity()(reference, hypothesis1)
coverage1 = SegmentationCoverage()(reference, hypothesis1)
print(purity, coverage)
print(purity1, coverage1)
This is the expected behavior, but I agree that the documentation is not clear.
SegmentationPurity and SegmentationCoverage assume that the supports of reference and hypothesis are the same. If not, they silently extrude the hypothesis so that its support matches that of the reference.
This is indeed bad design; it should probably raise an error instead. Is this something you would like to help contribute? I'd love to merge a pull request on the develop branch :)
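To make the effect of that silent cropping concrete, here is a minimal pure-Python sketch. It is not pyannote's implementation: segments are plain (start, end) tuples, and the helper names crop and segmentation_purity are hypothetical. It only illustrates why a single hypothesis segment covering the whole file ends up identical to the reference once it is restricted to the reference support.

```python
def crop(segments, support):
    """Keep only the parts of `segments` that fall inside `support`."""
    cropped = []
    for start, end in segments:
        for s_start, s_end in support:
            lo, hi = max(start, s_start), min(end, s_end)
            if lo < hi:  # skip empty intersections
                cropped.append((lo, hi))
    return sorted(cropped)


def segmentation_purity(reference, hypothesis):
    """Duration-weighted purity: for each hypothesis segment, count the
    duration of its best-matching reference segment."""
    total = 0.0
    pure = 0.0
    for h_start, h_end in hypothesis:
        overlaps = [
            max(0.0, min(h_end, r_end) - max(h_start, r_start))
            for r_start, r_end in reference
        ]
        total += sum(overlaps)
        pure += max(overlaps, default=0.0)
    return pure / total if total else 1.0


reference = [(1, 2), (3, 5), (7, 8)]
hypothesis = [(0, 10)]

# Restricting the hypothesis to the reference support splits it at every gap.
cropped = crop(hypothesis, reference)
print(cropped)                                  # [(1, 2), (3, 5), (7, 8)]
print(segmentation_purity(reference, cropped))  # 1.0
```

After cropping, every hypothesis segment lies inside exactly one reference segment, so purity cannot drop below 1.0 no matter where the original boundaries were.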
What you are looking for is DiarizationPurity and DiarizationCoverage.
(Note that they also start by focusing on the intersection of the reference and hypothesis supports.)
>>> from pyannote.core import Annotation, Segment
>>> from pyannote.metrics.diarization import DiarizationPurity, DiarizationCoverage
>>> purity = DiarizationPurity()
>>> coverage = DiarizationCoverage()
>>> reference = Annotation()
>>> reference[Segment(1, 2)] = "a"
>>> reference[Segment(3, 5)] = "b"
>>> reference[Segment(7, 8)] = "a"
>>> hypothesis = Annotation()
>>> hypothesis[Segment(0, 10)] = "A"
>>> purity(reference, hypothesis)
# 0.5
>>> coverage(reference, hypothesis)
# 1.0
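The 0.5 and 1.0 values above can be checked by hand. The following is a rough pure-Python sketch of label-aware purity and coverage, assuming non-overlapping (start, end, label) tuples; the function names are hypothetical and this is not the library's code. Only time where reference and hypothesis overlap contributes, which mirrors the note about the intersection of the supports.

```python
from collections import defaultdict


def cooccurrence(reference, hypothesis):
    """Map (reference label, hypothesis label) to total overlap duration."""
    table = defaultdict(float)
    for r_start, r_end, r_label in reference:
        for h_start, h_end, h_label in hypothesis:
            overlap = min(r_end, h_end) - max(r_start, h_start)
            if overlap > 0:
                table[r_label, h_label] += overlap
    return table


def diarization_purity(reference, hypothesis):
    """Fraction of overlapping time where each hypothesis label agrees
    with its dominant reference label."""
    table = cooccurrence(reference, hypothesis)
    best = defaultdict(float)
    for (_, h_label), duration in table.items():
        best[h_label] = max(best[h_label], duration)
    total = sum(table.values())
    return sum(best.values()) / total if total else 1.0


def diarization_coverage(reference, hypothesis):
    """Coverage is purity with the roles of the two inputs swapped."""
    return diarization_purity(hypothesis, reference)


reference = [(1, 2, "a"), (3, 5, "b"), (7, 8, "a")]
hypothesis = [(0, 10, "A")]

print(diarization_purity(reference, hypothesis))    # 0.5
print(diarization_coverage(reference, hypothesis))  # 1.0
```

Label "A" overlaps speaker "a" for 2 seconds and speaker "b" for 2 seconds, so only half of the 4 overlapping seconds belongs to its dominant speaker.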
Thanks for the quick response, but I am still a little confused. I'm currently working on a speaker change detection task. If I get a prediction with no change detected, the resulting segmentation purity and segmentation coverage will both be 100%. Is this expected?
SegmentationPurity and SegmentationCoverage can be applied to full partitions of the file, not just speech regions:
>>> reference = Annotation()
>>> reference[Segment(0, 1)] = 'non_speech'
>>> reference[Segment(1, 2)] = 'a'
>>> reference[Segment(2, 4)] = 'b'
>>> reference[Segment(4, 5)] = 'non_speech'
>>> reference[Segment(5, 10)] = 'a'
>>> hypothesis = Annotation()
>>> hypothesis[Segment(0, 10)] = 'A'
>>> SegmentationPurity()(reference, hypothesis)
# 0.5
>>> SegmentationCoverage()(reference, hypothesis)
# 1.0
For speaker change detection, if you only want to evaluate speech regions, DiarizationPurity and DiarizationCoverage are the way to go: just make sure each segment in the hypothesis has its own label. Does it make sense?
I'm experimenting with speaker change detection using the implementation in pyannote-audio: https://github.com/pyannote/pyannote-audio/blob/master/pyannote/audio/applications/change_detection.py#L216
alphas = np.linspace(0, 1, 20)
purity = [SegmentationPurity(parallel=False) for alpha in alphas]
coverage = [SegmentationCoverage(parallel=False) for alpha in alphas]
# -- SAVE RESULTS --
for i, alpha in enumerate(alphas):
    # initialize peak detection algorithm
    peak = Peak(alpha=alpha, min_duration=min_duration)
    for uri, reference in groundtruth.items():
        # apply peak detection
        hypothesis = peak.apply(predictions[uri])
        # compute purity and coverage
        purity[i](reference, hypothesis)
        coverage[i](reference, hypothesis)
The hypothesis generated by peak.apply(predictions[uri]) is a Timeline object. The segmentation purity is fixed at 1.0 regardless of the value of alpha.
This is where I got confused.
Oh, I think I got the point: I need to fill the gaps with a non_speech label. Great project, and thanks for your help!
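One possible way to do that gap filling, sketched in plain Python rather than with pyannote objects: the helper fill_gaps and its parameters are hypothetical, the file is assumed to span [start, end], and the reference segments are assumed not to overlap.

```python
def fill_gaps(reference, start, end, gap_label="non_speech"):
    """Return a full partition of [start, end]: the original labeled
    segments plus `gap_label` segments wherever the reference is silent."""
    filled = []
    cursor = start
    for seg_start, seg_end, label in sorted(reference):
        if seg_start > cursor:
            # uncovered region before this segment
            filled.append((cursor, seg_start, gap_label))
        filled.append((seg_start, seg_end, label))
        cursor = max(cursor, seg_end)
    if cursor < end:
        # trailing uncovered region
        filled.append((cursor, end, gap_label))
    return filled


reference = [(1, 2, "a"), (3, 5, "b"), (7, 8, "a")]
for segment in fill_gaps(reference, 0, 10):
    print(segment)
```

With the gaps labeled, the reference support covers the whole file, so a hypothesis spanning the file is no longer cut down to the speech regions before scoring.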
Description
Segmentation Purity always outputs 1.0
Example
hypothesis = <Timeline(uri=None, segments=[<Segment(-0.0125, 4.52)>, <Segment(4.52, 8.66)>, <Segment(8.66, 15.63)>, <Segment(15.63, 17.22)>, <Segment(17.22, 19.2)>, <Segment(19.2, 25.51)>, <Segment(25.51, 36.39)>, <Segment(36.39, 38.12)>, <Segment(38.12, 39.59)>, <Segment(39.59, 46.23)>, <Segment(46.23, 54.05)>, <Segment(54.05, 69.84)>, <Segment(69.84, 71.03)>, <Segment(71.03, 91.51)>, <Segment(91.51, 93.81)>, <Segment(93.81, 101.95)>, <Segment(101.95, 103.75)>, <Segment(103.75, 105.96)>, <Segment(105.96, 115.82)>, <Segment(115.82, 128.33)>, <Segment(128.33, 166.473)>])>
the timeline of reference = <Timeline(uri=None, segments=[<Segment(3.11, 3.97)>, <Segment(4.61, 8.02)>, <Segment(8.71, 15.57)>, <Segment(17.25, 17.95)>, <Segment(19.21, 20.11)>, <Segment(20.12, 20.71)>, <Segment(20.72, 25.46)>, <Segment(26.86, 27.46)>, <Segment(27.47, 29.86)>, <Segment(29.87, 31.66)>, <Segment(31.67, 32.56)>, <Segment(32.57, 33.46)>, <Segment(33.47, 34.06)>, <Segment(36.35, 37.53)>, <Segment(38.17, 39.53)>, <Segment(42.72, 44.81)>, <Segment(44.82, 46.15)>, <Segment(46.85, 47.75)>, <Segment(47.76, 49.85)>, <Segment(49.86, 52.82)>, <Segment(54.08, 54.79)>, <Segment(55.44, 56.04)>, <Segment(56.05, 57.84)>, <Segment(57.85, 59.27)>, <Segment(59.92, 64.42)>, <Segment(64.43, 67.12)>, <Segment(67.13, 69.75)>, <Segment(71.07, 71.97)>, <Segment(71.98, 72.57)>, <Segment(72.58, 73.47)>, <Segment(73.48, 76.77)>, <Segment(76.78, 84.66)>, <Segment(85.31, 86.51)>, <Segment(86.52, 88.61)>, <Segment(88.62, 90.71)>, <Segment(90.72, 91.43)>, <Segment(93.93, 100.53)>, <Segment(100.54, 101.13)>, <Segment(101.14, 101.87)>, <Segment(103.78, 105.89)>, <Segment(106.53, 107.43)>, <Segment(107.44, 107.85)>, <Segment(108.49, 109.99)>, <Segment(110, 111.19)>, <Segment(111.8, 113.59)>, <Segment(113.6, 114.7)>, <Segment(115.9, 128.27)>, <Segment(128.91, 133.41)>, <Segment(133.42, 135.15)>, <Segment(135.79, 139.09)>, <Segment(139.21, 141.79)>, <Segment(141.8, 145.99)>, <Segment(146, 146.89)>, <Segment(146.9, 147.79)>, <Segment(148.4, 149.89)>, <Segment(149.9, 151.39)>, <Segment(151.4, 152.29)>, <Segment(152.3, 156.19)>, <Segment(156.2, 156.79)>, <Segment(156.8, 159.49)>, <Segment(159.5, 160.29)>, <Segment(160.94, 162.74)>, <Segment(162.75, 164.49)>])>
after self._partition(self, timeline, coverage) the hypothesis becomes <Timeline(uri=None, segments=[<Segment(3.11, 3.97)>, <Segment(4.61, 8.02)>, <Segment(8.71, 15.57)>, <Segment(17.25, 17.95)>, <Segment(19.21, 20.11)>, <Segment(20.12, 20.71)>, <Segment(20.72, 25.46)>, <Segment(26.86, 27.46)>, <Segment(27.47, 29.86)>, <Segment(29.87, 31.66)>, <Segment(31.67, 32.56)>, <Segment(32.57, 33.46)>, <Segment(33.47, 34.06)>, <Segment(36.35, 37.53)>, <Segment(38.17, 39.53)>, <Segment(42.72, 44.81)>, <Segment(44.82, 46.15)>, <Segment(46.85, 47.75)>, <Segment(47.76, 49.85)>, <Segment(49.86, 52.82)>, <Segment(54.08, 54.79)>, <Segment(55.44, 56.04)>, <Segment(56.05, 57.84)>, <Segment(57.85, 59.27)>, <Segment(59.92, 64.42)>, <Segment(64.43, 67.12)>, <Segment(67.13, 69.75)>, <Segment(71.07, 71.97)>, <Segment(71.98, 72.57)>, <Segment(72.58, 73.47)>, <Segment(73.48, 76.77)>, <Segment(76.78, 84.66)>, <Segment(85.31, 86.51)>, <Segment(86.52, 88.61)>, <Segment(88.62, 90.71)>, <Segment(90.72, 91.43)>, <Segment(93.93, 100.53)>, <Segment(100.54, 101.13)>, <Segment(101.14, 101.87)>, <Segment(103.78, 105.89)>, <Segment(106.53, 107.43)>, <Segment(107.44, 107.85)>, <Segment(108.49, 109.99)>, <Segment(110, 111.19)>, <Segment(111.8, 113.59)>, <Segment(113.6, 114.7)>, <Segment(115.9, 128.27)>, <Segment(128.91, 133.41)>, <Segment(133.42, 135.15)>, <Segment(135.79, 139.09)>, <Segment(139.21, 141.79)>, <Segment(141.8, 145.99)>, <Segment(146, 146.89)>, <Segment(146.9, 147.79)>, <Segment(148.4, 149.89)>, <Segment(149.9, 151.39)>, <Segment(151.4, 152.29)>, <Segment(152.3, 156.19)>, <Segment(156.2, 156.79)>, <Segment(156.8, 159.49)>, <Segment(159.5, 160.29)>, <Segment(160.94, 162.74)>, <Segment(162.75, 164.49)>])>
I don't think the purity of this example is 1.0, since <Segment(128.33, 166.473)> contains two different speakers.