huggingface / speechbox

Apache License 2.0

ASR segment speaker match using IoU to address the issue #28 #35

Open Pikauba opened 7 months ago

Pikauba commented 7 months ago

This pull request addresses the bug in: https://github.com/huggingface/speechbox/issues/28

As stated in the issue, there is a clear problem with the current assignment process in diarize.py.

Especially with those lines.

As I explained in https://github.com/huggingface/speechbox/issues/28#issuecomment-1841661451, we have to refactor the algorithm in the `__call__` method of the ASRDiarizationPipeline.

The idea is to use the intersection over union (IoU) to match the timestamps of the diarization segments against those of the ASR segments. Each ASR segment is assigned the speaker of the diarization segment with the highest IoU.

It is possible to set a threshold to ignore IoU matches lower than a specific value, and we can assign a specific "no match" label when no clear match is found between an ASR segment and any of the available diarization segments.
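To make the idea concrete, here is a minimal sketch of IoU-based matching with a threshold. It is not the exact code in this pull request: the function names, the `threshold` and `no_match_label` parameters, and the dictionary layouts (`timestamp` on ASR segments; `start`, `end`, `speaker` on diarization segments) are assumptions chosen for illustration.

```python
def interval_iou(a_start, a_end, b_start, b_end):
    """Intersection over union of two time intervals, in seconds."""
    intersection = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union if union > 0 else 0.0


def assign_speakers(asr_segments, diarization_segments,
                    threshold=0.0, no_match_label="NO_SPEAKER"):
    """Label each ASR segment with the speaker of its best-IoU diarization segment.

    Segment layouts are hypothetical:
      asr_segments:         [{"text": str, "timestamp": (start, end)}, ...]
      diarization_segments: [{"start": float, "end": float, "speaker": str}, ...]
    """
    results = []
    for asr in asr_segments:
        start, end = asr["timestamp"]
        best_label, best_iou = no_match_label, 0.0
        for dia in diarization_segments:
            iou = interval_iou(start, end, dia["start"], dia["end"])
            # Keep only matches that overlap at all, beat the current best,
            # and reach the threshold.
            if iou > best_iou and iou >= threshold:
                best_label, best_iou = dia["speaker"], iou
        results.append({**asr, "speaker": best_label})
    return results


asr = [
    {"text": "hello", "timestamp": (0.0, 2.0)},
    {"text": "hi", "timestamp": (2.0, 4.0)},
    {"text": "um", "timestamp": (10.0, 11.0)},
]
dia = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00"},
    {"start": 2.5, "end": 4.5, "speaker": "SPEAKER_01"},
]
labels = [seg["speaker"] for seg in assign_speakers(asr, dia)]
```

In this example the third ASR segment overlaps no diarization segment, so it receives the "no match" label; raising `threshold` would also reject weak matches such as a segment that only partially overlaps its best candidate.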

I removed the same speaker squashing part but we can probably do some refactoring in order to re-implement it in this pull request.
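One possible way to bring the squashing back, sketched under assumptions: the function name and the segment layout (`text`, `timestamp`, `speaker`) are hypothetical, and this is not the logic that was removed from diarize.py. It simply merges consecutive segments that share a speaker label.

```python
def squash_same_speaker(segments):
    """Merge consecutive segments that carry the same speaker label.

    Hypothetical layout: [{"text": str, "timestamp": (start, end), "speaker": str}, ...]
    The input list and its dictionaries are left unmodified.
    """
    merged = []
    for seg in segments:
        if merged and merged[-1]["speaker"] == seg["speaker"]:
            prev = merged[-1]
            # Extend the previous segment: join the text, keep its start, take the new end.
            prev["text"] = " ".join([prev["text"].strip(), seg["text"].strip()])
            prev["timestamp"] = (prev["timestamp"][0], seg["timestamp"][1])
        else:
            merged.append(dict(seg))  # shallow copy so the caller's data is untouched
    return merged


segments = [
    {"text": "hello", "timestamp": (0.0, 1.0), "speaker": "SPEAKER_00"},
    {"text": "world", "timestamp": (1.0, 2.0), "speaker": "SPEAKER_00"},
    {"text": "hi", "timestamp": (2.0, 3.0), "speaker": "SPEAKER_01"},
]
squashed = squash_same_speaker(segments)
```

Segments labeled as "no match" would be merged with each other like any other label here; whether that is desirable is an open design choice.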

I would like feedback on this pull request. I am open to improving it or changing anything I may have forgotten to take into account.

2010b9 commented 2 months ago

Thanks for doing this! I've tried your code, but I'm having the same issue mentioned in https://github.com/huggingface/speechbox/issues/28#issuecomment-2120682281. I don't know why it happens, but I haven't looked thoroughly at the code yet.