Open benniekiss opened 5 months ago
Fixed an off-by-one error in the new method.
I also made a google colab notebook showcasing the improvements: https://colab.research.google.com/drive/1Me3GgQUPXxjuEn06DNVco_GIxlUoYPTE?usp=sharing
In summary, the new method gives a slight speedup on fully synthetic data, a roughly 2x speedup on discrete (0s and 1s) synthetic data, and an almost 100x speedup on real data in the `SpeakerDiarization` pipeline.

The notebook also lets you extend the real data sample to however many hours you want under the TEST WITH REAL DATA section by setting `AUDIO_LENGTH` to the desired number of hours.
| Data Type | Original Method | V2 Method |
|---|---|---|
| Synthetic = `np.random.randn(100000, 50)` | 00:00:08.781 | 00:00:07.972 |
| Synthetic Discrete = `np.random.randint(0, 2, size=(100000, 50))` | 00:00:19.085 | 00:00:10.724 |
| Real Data - huggingface datasets (01:02:27.300 long audio) | 00:00:00.755 | 00:00:00.008 |
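For reference, here is a minimal sketch of how the synthetic rows above could be timed. The frame resolution and onset/offset values are assumptions for illustration only; the linked notebook remains the authoritative benchmark setup.

```python
import time

import numpy as np
from pyannote.core import SlidingWindow, SlidingWindowFeature
from pyannote.audio.utils.signal import Binarize

# 100000 frames x 50 activation tracks, matching the "Synthetic" row above.
scores = np.random.randn(100000, 50)

# Frame resolution is an assumption for illustration only.
frames = SlidingWindow(start=0.0, duration=0.017, step=0.017)
activations = SlidingWindowFeature(scores, frames)

binarize = Binarize(onset=0.5, offset=0.5)

start = time.perf_counter()
annotation = binarize(activations)  # time this call for each implementation
print(f"Binarize took {time.perf_counter() - start:.3f}s")
```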
EDIT: I realized that I did not test this with various offsets and onsets when initializing the `Binarize` class, and after doing so, the implementations are not equal. I will keep working on this to see if there is a way to make any improvements.
While processing long audios in the `SpeakerDiarization` pipeline, I noticed that the `to_annotation()` method was taking a while, and I tracked it down to `pyannote.audio.utils.signal.Binarize.__call__()`, where it was looping over a numpy array which could end up being quite large.

In my tests, the original implementation took about 60 seconds for a 9 hour audio. With this new implementation, it takes about 0.5 seconds.
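To illustrate the general idea (this is a sketch, not the exact patch), the per-frame Python loop can be replaced with numpy edge detection when a single threshold is used, i.e. when onset equals offset:

```python
import numpy as np

def binarize_track(scores: np.ndarray, threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return (start_frame, end_frame) index pairs where scores exceed threshold.

    Vectorized sketch of replacing a per-frame Python loop; it only matches the
    loop-based behaviour when onset == offset (no hysteresis), which is an
    assumption of this example.
    """
    active = scores > threshold
    # Pad with False so regions touching either edge still produce a closed pair.
    padded = np.concatenate(([False], active, [False]))
    # +1 at rising edges (inactive -> active), -1 at falling edges (active -> inactive).
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# Example: frames 2-4 and 7 are above the threshold.
print(binarize_track(np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.9, 0.2])))
# -> [(2, 5), (7, 8)]  (end index is exclusive)
```

With distinct onset and offset thresholds (hysteresis), whether a frame is active depends on the frames before it, which is what makes a fully vectorized version harder to keep exactly equivalent, as noted in the EDIT above.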
I've only tested this with the `SpeakerDiarization` pipeline, but the new implementation returns the same results as the original.