Hello, I am asking whether it is possible to load embeddings from a custom transformer-encoder model instead of using the one provided by SpeechBrain. Each embedding covers 45 s of audio.
Create the embeddings and format them:
# get the embeddings, one per 45 s chunk of audio
chunk_embeddings = [chunk["embedding"] for chunk in chunks]
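To format them for pyannote, I wrap the chunk embeddings into a SlidingWindowFeature, assuming 45 s non-overlapping chunks starting at t = 0 (this part is just my own sketch, not taken from your pipeline):

import numpy as np
from pyannote.core import SlidingWindow, SlidingWindowFeature

# stack the per-chunk embeddings into a (num_chunks, dimension) array
data = np.vstack(chunk_embeddings)

# one 45 s window per embedding, no overlap, starting at t = 0
window = SlidingWindow(start=0.0, duration=45.0, step=45.0)
sliding_window_features = SlidingWindowFeature(data, window)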
Perform clustering and convert clusters to speaker diarization labels:
from pyannote.audio.pipelines.clustering import HiddenMarkovModelClustering
# Based on your configuration file
covariance_type = "diag"
threshold = 0.35
clustering = HiddenMarkovModelClustering(covariance_type=covariance_type, threshold=threshold)
diarization = clustering(sliding_window_features)
for segment, _, label in diarization.itertracks(yield_label=True):
    pass  # do post-processing here
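For the post-processing, what I had in mind is roughly the following sketch (it assumes the clustering gives me one integer cluster index per 45 s chunk, collected here in a hypothetical list called chunk_labels), turning those indices into a pyannote.core Annotation:

from pyannote.core import Annotation, Segment

# sketch: one cluster index per 45 s chunk -> speaker-labelled segments
diarization = Annotation()
for i, cluster in enumerate(chunk_labels):
    segment = Segment(i * 45.0, (i + 1) * 45.0)
    diarization[segment] = f"SPEAKER_{cluster:02d}"

# itertracks(yield_label=True) yields (segment, track, label) triples
for segment, _, label in diarization.itertracks(yield_label=True):
    print(segment, label)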
However, I have looked through your speaker diarization pipeline because I could not get the clustering to work, and there is a lot more code under the hood. I would appreciate it if you could guide me in the right direction on how to use my own model's embeddings in the speaker diarization pipeline.