pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Trying the diarization pipeline on random .wav files #114

Closed saisumit closed 6 years ago

saisumit commented 6 years ago

Hey, as suggested, I went through the detailed tutorials and trained all the models required for the pipeline. The pipeline works on the AMI dataset, but when I try to reproduce the results on other .wav files (16 kHz, mono, 256 bps), it is not able to diarize the audio. Here is a brief summary of what I actually did:

1) Took a random meeting audio file, sampled at 16 kHz, mono, 256 bps
2) Renamed it to ES2003a and replaced the actual ES2003a with it (thought of it as a workaround for creating another database)
3) Ran all the pipelines (sad, scd, emb, diarization)

Output:

1) Speech activity detection works perfectly and is able to classify regions of speech.
2) Speaker diarization doesn't work; everything is classified as 0.

Can you please tell me whether the pipeline gives wrong diarization outputs because I replaced the actual file, and what a better way would be to test the pipeline on arbitrary audio?

bml1g12 commented 6 years ago

I'm not the developer, but if you use this method, I would think you will need to regenerate the precomputed MFCCs for each step.

That being said, I suspect there is a more elegant way of applying the pipeline to a new .wav without creating a new database, by using the underlying Python API instead; I've done this for embeddings but haven't used the pipeline yet.
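To sketch what I mean (this is a hypothetical, untested sketch, assuming pyannote's convention of representing a file as a plain dict; the uri and path below are made up):

```python
# Hypothetical sketch: pyannote passes files around as plain dicts, so in
# principle a new .wav can be described directly instead of being registered
# in a database. The 'uri' and 'audio' values here are made-up examples.
new_file = {'uri': 'my_meeting', 'audio': '/path/to/my_meeting.wav'}

# Feature extraction / scoring objects (e.g. a Precomputed instance) are
# callable on such dicts -- but the precomputed features for the new file
# would still have to be generated first.
```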

saisumit commented 6 years ago

Actually, I renamed the .wav to ES2003a, which is the first file for which MFCCs, raw SAD values and SCD values are generated, so you can simply break the loop after that file and still get the required files. It would be great if you could tell me how to do it with the complete pipeline. That being said, am I actually doing anything wrong? The diarization results are simply useless compared to the LIUM / AALTO diarization libraries, which give much more robust results.

hedonistrh commented 6 years ago

Hi, I have a problem with speaker change detection: it always says there is no change for the whole file. See #111.

I suggest you first check your training for speaker change detection. Maybe you have the same kind of problem as me.

saisumit commented 6 years ago

As I said, it works perfectly on the AMI dataset, so I don't think the problem is there. These are the two files I tried this on: https://drive.google.com/open?id=15Stt_JjWT7rzypHP5v9FfFTU1NmMHcew https://drive.google.com/open?id=1IP8v5_VMiQQnk426R6p-fZajyhER04Hj

hedonistrh commented 6 years ago

Hi, that is quite interesting. How can you check whether it works perfectly or not? Can you share the pyannote.metrics results for the test files, using a script like this?


# Setup: these objects come from the tutorial; the scores directory is a placeholder
from pyannote.database import get_protocol, get_annotated
from pyannote.audio.features import Precomputed
from pyannote.audio.signal import Peak
from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure

protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
precomputed = Precomputed('/path/to/scd/scores')
peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)
metric = DiarizationPurityCoverageFMeasure()

# Loop on test files
for test_file in protocol.test():
    # load reference annotation and annotated regions
    reference = test_file['annotation']
    uem = get_annotated(test_file)

    # load precomputed change scores as pyannote.core.SlidingWindowFeature
    scd_scores = precomputed(test_file)

    # binarize scores to obtain change points as pyannote.core.Timeline
    hypothesis = peak.apply(scd_scores, dimension=1)

    # evaluate speaker change detection (running aggregate)
    metric(reference, hypothesis.to_annotation(), uem=uem)
    purity, coverage, fmeasure = metric.compute_metrics()
    print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')

# final aggregated metrics over all test files
purity, coverage, fmeasure = metric.compute_metrics()
print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')

I ask this because you said:

Speaker diarization doesn't work, everything is classified as 0

If you get 100% coverage from this script, that could be the cause of your problem.

saisumit commented 6 years ago

Hey, I tried out your suggestion; it seems I am getting 1% coverage. Any idea what can be done?

from pyannote.database import get_protocol, get_annotated
from pyannote.audio.features import Precomputed
from pyannote.audio.signal import Peak
from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure

protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
precomputed = Precomputed('/media/DataDriveA/Datasets/sumit/scd')
peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)
metric = DiarizationPurityCoverageFMeasure()

for test_file in protocol.test():
    reference = test_file['annotation']
    uem = get_annotated(test_file)
    scd_scores = precomputed(test_file)
    hypothesis = peak.apply(scd_scores, dimension=1)
    metric(reference, hypothesis.to_annotation(), uem=uem)

# per-file values printed by the interpreter:
# 0.019149014550107722
# 0.014072230507041606
# 0.03635586201714968
# 0.018868535259038234
# 0.016567144528221323
# 0.018798456401816547
# 0.03245998534873909
# 0.01930586206686937
# 0.018597586570416588
# 0.013406219189916744
# 0.04511515526311587
# 0.02177338540954314
# 0.025974370018109923
# 0.022023084858175945
# 0.02715933469271844
# 0.0197217875965152
# 0.018951645548169263
# 0.014496894798537002
# 0.023111019879448098
# 0.014170020247782397
# 0.013125944790148964
# 0.013026454562392011

purity, coverage, fmeasure = metric.compute_metrics()
print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')
# Purity = 78.4% / Coverage = 1.0%

hedonistrh commented 6 years ago

Thanks for the reply.

Could you look at your hypothesis.to_annotation() result? It would give us some idea.

It is quite interesting: we are trying roughly the same thing, yet we get different results. I think the only difference is that I use the weights from the 5th epoch:

!pyannote-change-detection apply tutorials/change-detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0005.pt AMI.SpeakerDiarization.MixHeadset raw_scores

yinruiqing commented 6 years ago

In fact, if you choose different thresholds in Peak, you'll get different results. Usually, the threshold is chosen by validation. If you don't want to do validation, you can try different thresholds yourself and plot the coverage and purity curve.
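As a standalone illustration of why the threshold matters (this is a naive peak picker written just for this example, not pyannote.audio.signal.Peak): raising the threshold discards weak local maxima, so fewer change points are detected, which gives longer segments and trades purity for coverage.

```python
def count_changes(scores, alpha):
    """Count local maxima above threshold alpha.

    Naive peak picking for illustration only; pyannote's Peak also
    applies a minimum-duration constraint between change points.
    """
    return sum(
        1 for i in range(1, len(scores) - 1)
        if scores[i] > alpha
        and scores[i] >= scores[i - 1]
        and scores[i] >= scores[i + 1]
    )

# toy frame-level change scores
scores = [0.2, 0.6, 0.2, 0.4, 0.2, 0.9, 0.1]
for alpha in (0.1, 0.5, 0.7):
    print(alpha, count_changes(scores, alpha))
# -> 3, 2, 1 change points as the threshold rises
```

Fewer detected changes mean longer hypothesized segments, hence higher coverage but lower purity, which is why the curve over thresholds is worth plotting.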

hedonistrh commented 6 years ago

Thanks for the interest.

When I tried this, the output showed that there is always a change:

peak = Peak(alpha=0.1, min_duration=1.0, log_scale=True)

If I try this instead, it gives 100% coverage:

peak = Peak(alpha=0.13, min_duration=1.0, log_scale=True)

saisumit commented 6 years ago

Hey @hbredin @yinruiqing @hedonistrh. As I understand it, these hyperparameters are ultimately chosen by the final pipeline module that we run. That said, am I correct that "the SCD training module is not working"? Even if I use the model trained for 500 epochs, it gives strange coverage results with the default parameters provided by the author:

peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)

I am assuming that both purity and coverage should lie in the [70, 90] range, going by the results described in this paper, though it is for a slightly different dataset: https://pdfs.semanticscholar.org/edff/b62b32ffcc2b5cc846e26375cb300fac9ecc.pdf

hedonistrh commented 6 years ago

Yes, that paper is for the ETAPE dataset; however, I think the results we are getting are not in line with the paper.

Also, you can visualize your outputs; that can give you some idea.

bml1g12 commented 6 years ago

This is the validation, taken every 100 iterations: image. And here are the results using the tutorial settings after the 1000th epoch, with min_duration=1.0, on the first test file in AMI:

image

A very large minimum duration is required to get reasonable coverage. The results do seem quite different from the ETAPE results in the paper.

hedonistrh commented 6 years ago

@bml1g12 Thanks for sharing these results. Yes, according to the paper both metrics should be around 90%. We use a different dataset, but I still think the results are not good.

hbredin commented 6 years ago

Taking a (short) break from my (long) summer break to comment on this issue.

You are actually comparing apples and oranges.

The original paper reports SegmentationPurity and SegmentationCoverage, while the above script reports DiarizationPurity and DiarizationCoverage.

You should use SegmentationPurity and SegmentationCoverage to reproduce results in the paper.

More info about the different metrics can be found in pyannote.metrics paper.
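To make the distinction concrete, here is a toy computation of the segmentation-style metrics (an illustration of the definitions only, written for this comment; for real evaluation use SegmentationPurity and SegmentationCoverage from pyannote.metrics.segmentation). Segmentation coverage asks, for each reference segment, how much of it is covered by the single hypothesis segment that overlaps it most; purity is the same computation with the roles swapped.

```python
def seg_coverage(reference, hypothesis):
    """Toy segmentation coverage over (start, end) tuples.

    For each reference segment, take its largest overlap with any single
    hypothesis segment, then sum and normalize by total reference duration.
    Illustration of the definition only, not pyannote's implementation.
    """
    total = sum(end - start for start, end in reference)
    covered = 0.0
    for rs, re in reference:
        best = max((min(re, he) - max(rs, hs) for hs, he in hypothesis),
                   default=0)
        covered += max(best, 0)
    return covered / total

# two reference speaker turns, but the hypothesis misses the change point
reference = [(0, 10), (10, 20)]
hypothesis = [(0, 20)]

cov = seg_coverage(reference, hypothesis)   # -> 1.0 (full coverage)
pur = seg_coverage(hypothesis, reference)   # -> 0.5 (each half impure)
```

This also shows why a hypothesis with no detected changes at all can score 100% coverage while purity collapses.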

bml1g12 commented 6 years ago

Indeed, using SegmentationPurity and SegmentationCoverage I obtain: image

hedonistrh commented 6 years ago

@hbredin Thanks for the comment.

@bml1g12 Could you share your hypothesis.to_annotation() result? Also, thanks for sharing these results.

bml1g12 commented 6 years ago

Using alpha=0.2, min_duration=1.0 on the first file, EN2002b.Mix-Headset, zoomed in on the first 200 seconds for clarity with notebook.crop = Segment(0, 200):

image

hedonistrh commented 6 years ago

Thanks for the reply.

I will try to train it again, because, as I wrote, I always get 100% coverage.

bml1g12 commented 6 years ago

Maybe you are using a bad epoch. How does the validation coverage for the epoch you are using compare, i.e. in the TensorBoard file?

bml1g12 commented 6 years ago

Ah, I see you're using the 5th epoch whereas I'm using the 1000th; I suspect that is the issue.

hedonistrh commented 6 years ago

I agree with you. My computing resources are limited, so I have only used a few epochs. When I trained for 100 epochs, the loss stopped decreasing after the 10th epoch.

Edit: I have tried the weights from the 50th epoch with alpha set to 0.25. Now the results make sense.

screenshot_2018-08-01 jupyterlab 1

Thanks for the help. :pray:

hedonistrh commented 6 years ago

@bml1g12 Hello, could you share the weights file? I cannot train up to the 1000th epoch. :)

saisumit commented 6 years ago

here you go https://drive.google.com/open?id=10kLHAOBcsvOUlnC_glYFjpuHV25QpmD7

hedonistrh commented 6 years ago

@saisumit Thanks! :pray: I tried it; however, I got this error: CUDA driver version is insufficient for CUDA runtime version

I have no access to an Nvidia GPU and I am using Ubuntu 16.04. The error probably occurs because of this.

bml1g12 commented 6 years ago

I'm afraid mine was also trained on a CUDA GPU, so I can't help you either.

hbredin commented 6 years ago

Closing as it seems that this issue has diverged from the original one.