Closed by callmephilip 7 months ago
Hi there. In my experience it is performing quite well. Can you give more details and examples?
here's an example https://replicate.com/p/j7yjql3bm6ofhkdh7rwyfd43ky
i reran this with an older version and it's looking much better - https://replicate.com/p/tqylf3tbokbpa7qhu4jo4p2p34. based on input from https://github.com/FanaHOVA/smol-podcaster/blob/main/smol_podcaster.py#L61
Could you try again with the latest version (b9fd8313c0d492bf1ce501b3d188f945389327730773ec1deb6ef233df6ea119)?
not working properly still https://replicate.com/p/lmtszi3bxadfoewofvteloh3ya
i am gonna stick to 7e5dafea13d80265ea436e51a310ae5103b9f16e2039f54de4eede3060a61617 for now, i think
Hey @thomasmol. Coming back to this, as I am trying to understand why I am seeing such a stark contrast in quality of diarization. I have noticed that 7e5dafea13d80265ea436e51a310ae5103b9f16e2039f54de4eede3060a61617 is using the speechbrain/spkrec-ecapa-voxceleb model for getting speaker embeddings which are then manually clustered (?) to do speaker attribution.
what i am wondering is why you moved to pyannote/speaker-diarization-3.1 later? did you get better results with this new setup?
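For readers unfamiliar with the "embed, then cluster" approach mentioned above, here is a minimal sketch of the clustering half. It is not the code from the older version: it assumes a simple greedy cosine-similarity threshold, and the function names, the 0.75 threshold, and the toy 3-dimensional vectors are all made up for illustration (real speaker embeddings have far more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.75):
    """Greedy clustering: give each segment embedding a speaker index.

    A segment joins the most similar existing speaker if the similarity
    reaches `threshold` (a hypothetical value), otherwise it starts a new one.
    """
    centroids = []  # running sum of embeddings per speaker
    labels = []
    for emb in embeddings:
        best_idx, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            # Cosine similarity ignores scale, so a running sum works as a centroid.
            centroids[best_idx] = [c + e for c, e in zip(centroids[best_idx], emb)]
            labels.append(best_idx)
    return labels


# Toy embeddings for four segments, alternating between two voices.
segments = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.9, 0.2, 0.0],
    [0.1, 0.9, 0.0],
]
labels = assign_speakers(segments)
print(labels)
```

With an approach like this, the threshold (or the number of clusters, in other clustering methods) has a large effect on how segments get attributed, which is one place where two pipelines can diverge on the same audio.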
Hi Philip, i switched to pyannote 3.1 because it's a much improved model with more accurate diarization in all benchmarks. It has been working better for me as well than the model i used earlier.
Could you try using the latest version of the replicate model but set group_segments to true and provide a prompt with names and other words with punctuation (e.g. "Thomas, Philip, diarization.")? This should improve the transcript quality and might help create better speaker segmentation.
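A rough sketch of what that suggestion could look like with the Replicate Python client. The group_segments and prompt inputs and the version hash come from this thread; the model slug thomasmol/whisper-diarization, the file_url input name, and the example audio URL are assumptions, so check the model page for the actual input schema before relying on them.

```python
# Version hash quoted earlier in this thread as the latest one.
LATEST_VERSION = "b9fd8313c0d492bf1ce501b3d188f945389327730773ec1deb6ef233df6ea119"
# Assumption: the model slug on Replicate; not stated in this thread.
MODEL = "thomasmol/whisper-diarization"


def build_input(file_url, speaker_names, extra_terms=()):
    """Build the input payload with group_segments on and a punctuated prompt."""
    prompt = ", ".join([*speaker_names, *extra_terms]) + "."
    return {
        "file_url": file_url,  # assumption: name of the audio input field
        "group_segments": True,
        "prompt": prompt,
    }


def run_diarization(payload):
    """Send the payload to Replicate (needs the `replicate` package and an API token)."""
    import replicate  # third-party; imported lazily so the rest runs without it

    return replicate.run(f"{MODEL}:{LATEST_VERSION}", input=payload)


payload = build_input(
    "https://example.com/episode.mp3",  # placeholder URL
    speaker_names=["Thomas", "Philip"],
    extra_terms=["diarization"],
)
print(payload["prompt"])
```

Calling run_diarization(payload) would then submit the job; it is left uncalled here because it requires network access and credentials.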
Hi @callmephilip, thank you for the insights! I've been seeing the same issue as you, and I get much better results when reverting to the version you mention. I have to say I do not know much about the benchmarks or how they are run, but this system seems to make a difference.
Hey Thomas. I am seeing some very inaccurate results of speaker assignment on some test audios (2 speakers per file, both male with fairly distinctive voices). What has your experience been overall?