I'm not the developer, but if you use this method I would think you will need to regenerate the precomputed MFCCs for each step.
That being said, I suspect there is a more elegant way of using the pipeline on a new .wav without making a new database, which would instead make use of the underlying Python API; I've done this for embeddings but haven't used the pipeline yet.
Actually, I renamed the .wav to ES2003a, which is the first file for which MFCCs, raw SAD values and SCD values are generated, so you can simply break the loop after that file and still get the required outputs. It would be great if you could tell me how to do it with the complete pipeline. That being said, am I actually doing anything wrong? The diarization results are simply useless in comparison to the LIUM / AALTO diarization libraries, which give much more robust results.
Hi, I have a problem with speaker change detection. It always says there is no change for the whole file. #111
I suggest that you first check your training for speaker change detection. Maybe you have the same kind of problem as me.
As I said, it works perfectly on the AMI dataset, so I don't think the problem is there. These are the two files I tried this on: https://drive.google.com/open?id=15Stt_JjWT7rzypHP5v9FfFTU1NmMHcew https://drive.google.com/open?id=1IP8v5_VMiQQnk426R6p-fZajyhER04Hj
Hi, this is quite interesting. How can you check whether it works perfectly or not? Can you share the pyannote.metrics results for the test files, using a script like this?
# Loop on test files
# (assumes protocol, precomputed, peak and metric are already set up,
#  as shown later in this thread)
from pyannote.database import get_annotated

for test_file in protocol.test():
    # load reference annotation and the annotated (evaluated) regions
    reference = test_file['annotation']
    uem = get_annotated(test_file)
    # load precomputed change scores as pyannote.core.SlidingWindowFeature
    scd_scores = precomputed(test_file)
    # detect peaks to obtain hypothesized change points as a pyannote.core.Timeline
    hypothesis = peak.apply(scd_scores, dimension=1)
    # accumulate purity / coverage over the whole test set
    metric(reference, hypothesis.to_annotation(), uem=uem)

purity, coverage, fmeasure = metric.compute_metrics()
print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')
I ask this because you said this:
Speaker diarization doesn't work, everything is classified as 0
If you get 100% coverage from the script, that could be the cause of your problem.
Hey, I tried out your suggestion; it seems like I am getting 1% coverage. Any idea what can be done?
>>> from pyannote.database import get_protocol
>>> protocol = get_protocol('AMI.SpeakerDiarization.MixHeadset')
>>> from pyannote.audio.features import Precomputed
>>> precomputed = Precomputed('/media/DataDriveA/Datasets/sumit/scd')
>>> from pyannote.audio.signal import Peak
>>> peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True)
>>> from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure
>>> metric = DiarizationPurityCoverageFMeasure()
>>> from pyannote.database import get_annotated
>>> for test_file in protocol.test():
...     reference = test_file['annotation']
...     uem = get_annotated(test_file)
...     scd_scores = precomputed(test_file)
...     hypothesis = peak.apply(scd_scores, dimension=1)
...     metric(reference, hypothesis.to_annotation(), uem=uem)
...
0.019149014550107722
0.014072230507041606
0.03635586201714968
0.018868535259038234
0.016567144528221323
0.018798456401816547
0.03245998534873909
0.01930586206686937
0.018597586570416588
0.013406219189916744
0.04511515526311587
0.02177338540954314
0.025974370018109923
0.022023084858175945
0.02715933469271844
0.0197217875965152
0.018951645548169263
0.014496894798537002
0.023111019879448098
0.014170020247782397
0.013125944790148964
0.013026454562392011
>>> purity, coverage, fmeasure = metric.compute_metrics()
>>> print(f'Purity = {100*purity:.1f}% / Coverage = {100*coverage:.1f}%')
Purity = 78.4% / Coverage = 1.0%
Thanks for the reply.
Could you look at your hypothesis.to_annotation() result? It will give us some idea.
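For example, something like this (a quick sketch; it reuses the hypothesis variable from the evaluation loop above) would print the hypothesized segments and their labels:

# inspect the hypothesized change points of the last processed file
annotation = hypothesis.to_annotation()
for segment, track, label in annotation.itertracks(yield_label=True):
    print(segment, label)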
This is quite interesting. We are trying more or less the same thing, yet we get different results. I think the only difference comes from here: I use the weights from the 5th epoch.
!pyannote-change-detection apply tutorials/change-detection/train/AMI.SpeakerDiarization.MixHeadset.train/weights/0005.pt AMI.SpeakerDiarization.MixHeadset raw_scores
In fact, if you choose different thresholds in Peak, you'll get different results. Usually, the threshold is chosen by validation. If you don't want to do validation, you can try different thresholds yourself and plot the coverage and purity curve.
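For example, a rough sketch of such a sweep (it simply re-runs the evaluation loop from earlier in this thread for a few candidate alpha values; protocol and precomputed are assumed to be set up as above):

from pyannote.database import get_annotated
from pyannote.audio.signal import Peak
from pyannote.metrics.diarization import DiarizationPurityCoverageFMeasure

# try several Peak thresholds and report purity / coverage for each
for alpha in [0.1, 0.2, 0.3, 0.4, 0.5]:
    peak = Peak(alpha=alpha, min_duration=1.0, log_scale=True)
    metric = DiarizationPurityCoverageFMeasure()
    for test_file in protocol.test():
        reference = test_file['annotation']
        uem = get_annotated(test_file)
        scd_scores = precomputed(test_file)
        hypothesis = peak.apply(scd_scores, dimension=1)
        metric(reference, hypothesis.to_annotation(), uem=uem)
    purity, coverage, fmeasure = metric.compute_metrics()
    print(f'alpha={alpha:.2f}: purity={100*purity:.1f}% / coverage={100*coverage:.1f}%')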
Thanks for the interest.
When I try this, the output shows that there are always changes.
peak = Peak(alpha=0.1, min_duration=1.0, log_scale=True)
If I try this, it gives 100% coverage.
peak = Peak(alpha=0.13, min_duration=1.0, log_scale=True)
Hey @hbredin @yinruiqing @hedonistrh, as I understand it, these hyperparameters are chosen at the end by the final pipeline module that we run. That being said, am I correct in saying "the SCD training module is not working"? Even if I use the trained model at the 500th epoch, it gives strange coverage results with the default parameters provided by the author: peak = Peak(alpha=0.5, min_duration=1.0, log_scale=True). I am assuming that both purity and coverage should lie in the [70, 90] range, going by the results described in this paper, though it is for a slightly different dataset: https://pdfs.semanticscholar.org/edff/b62b32ffcc2b5cc846e26375cb300fac9ecc.pdf
Yes, it is for the ETAPE dataset; however, I think the results we get are not in line with the paper.
Also, you can visualize your outputs. That can give some insight.
This is the validation, taken every 100 iterations, and the following are the results using the tutorial settings after the 1000th epoch, with min_duration=1.0, on the first test file in AMI:
A very large minimum duration is required to get reasonable coverage. The results do seem quite different from the ETAPE results in the paper.
@bml1g12 Thanks for sharing these results. Yes, according to the paper both metrics should be around 90%. We use a different dataset, but I still think the results are not good.
Taking a (short) break from my (long) summer break to comment on this issue.
You are actually comparing apples and oranges.
The original paper reports SegmentationPurity and SegmentationCoverage while the above script reports DiarizationPurity and DiarizationCoverage.
You should use SegmentationPurity and SegmentationCoverage to reproduce the results in the paper.
More information about the different metrics can be found in the pyannote.metrics paper.
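For example, the evaluation loop above can be adapted along these lines (a sketch, assuming protocol, precomputed and peak are defined as earlier in this thread; the uem argument is omitted for simplicity):

from pyannote.metrics.segmentation import SegmentationPurity, SegmentationCoverage

purity = SegmentationPurity()
coverage = SegmentationCoverage()

for test_file in protocol.test():
    reference = test_file['annotation']
    scd_scores = precomputed(test_file)
    hypothesis = peak.apply(scd_scores, dimension=1)
    # accumulate segmentation purity and coverage over the test set
    purity(reference, hypothesis.to_annotation())
    coverage(reference, hypothesis.to_annotation())

# abs(metric) returns the value aggregated over all processed files
print(f'Purity = {100 * abs(purity):.1f}% / Coverage = {100 * abs(coverage):.1f}%')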
Indeed, using SegmentationPurity and SegmentationCoverage I obtain:
@hbredin Thanks for the comment.
@bml1g12 Could you share your hypothesis.to_annotation() result? Also, thanks for sharing these results.
Using alpha=0.2, min_duration=1.0 on the first file EN2002b.Mix-Headset, zoomed in on the first 200 seconds for clarity with notebook.crop = Segment(0, 200):
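For reference, a rough sketch of how such a plot can be produced in a notebook using pyannote.core's built-in rendering (reference and hypothesis are the variables from the evaluation loop above):

import matplotlib.pyplot as plt
from pyannote.core import Segment, notebook

notebook.crop = Segment(0, 200)  # only render the first 200 seconds

fig, ax = plt.subplots(nrows=2, figsize=(12, 4))
notebook.plot_annotation(reference, ax=ax[0])  # reference speaker turns
notebook.plot_timeline(hypothesis, ax=ax[1])   # hypothesized change points
plt.show()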
Thanks for the reply.
I will try to train it again because, as I wrote, I always get 100% coverage.
Maybe you are using a bad epoch; how does the validation coverage for the epoch you are using compare, i.e. in the tensorboard file?
Ah, I see you're using the 5th epoch, whereas I'm using the 1000th; I suspect that is the issue.
I agree with you. My computing resources are limited, so I have only trained for a few epochs. When I tried 100 epochs, the loss was non-decreasing after the 10th epoch.
Edit: I have tried the weights from the 50th epoch and set alpha to 0.25. Now the results make sense.
Thanks for the help. :pray:
@bml1g12 Hello, could you share the weights file? I cannot train up to the 1000th epoch. :)
@saisumit Thanks! :pray: I tried it; however, I got this error:
CUDA driver version is insufficient for CUDA runtime version
I have no access to an Nvidia GPU and I am using Ubuntu 16.04. The error probably occurs because of this.
I'm afraid mine was also run on a CUDA GPU, so I can't help you either.
Closing as it seems that this issue has diverged from the original one.
Hey, as suggested, I went through the detailed tutorials and trained all the models required for the pipeline. The pipeline works on the AMI dataset, but when I try to reproduce the results on other .wav files (16 kHz, mono, 256 kbps) it is not able to diarize the audio. Here is a brief summary of what I actually did:
1) took a random meeting audio file, sampled at 16 kHz, mono, 256 kbps
2) renamed it to ES2003a and replaced the actual ES2003a with it (I thought of this as a workaround for creating another database)
3) ran all the pipelines (sad, scd, emb, diarization)
Output:
1) Speech activity detection works perfectly and is able to classify regions of speech.
2) Speaker diarization doesn't work, everything is classified as 0.
Can you please tell me whether the pipeline is giving wrong diarization outputs because I replaced the actual file, and what a better way would be to test the pipeline on arbitrary audio?