ASR segment speaker matching using IoU to address issue #28

This pull request addresses the bug reported in: https://github.com/huggingface/speechbox/issues/28

As stated in the issue, there is a clear problem with the current speaker assignment process in diarize.py.

As I explained in https://github.com/huggingface/speechbox/issues/28#issuecomment-1841661451, the algorithm in the __call__ method of ASRDiarizationPipeline needs to be refactored.

The idea is to use intersection over union (IoU) to match the timestamps of the ASR segments against those of the diarization segments. Each ASR segment is assigned the speaker of the diarization segment with the highest IoU.
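As an illustration of the matching step, here is a minimal sketch. It assumes simplified, hypothetical data structures rather than the pipeline's actual outputs: ASR segments as `(start, end)` tuples in seconds and diarization segments as `(start, end, speaker)` tuples. The helper names `iou` and `assign_speakers` are hypothetical and are not part of the speechbox API.

```python
from typing import List, Optional, Tuple

# Hypothetical representations (not speechbox's actual data structures):
#   ASR segment:         (start, end) in seconds
#   diarization segment: (start, end, speaker_label)
Interval = Tuple[float, float]
DiarSegment = Tuple[float, float, str]


def iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def assign_speakers(
    asr_segments: List[Interval], diar_segments: List[DiarSegment]
) -> List[Optional[str]]:
    """For each ASR segment, pick the speaker of the diarization segment with the highest IoU.

    Returns None for an ASR segment that overlaps no diarization segment at all.
    """
    speakers: List[Optional[str]] = []
    for asr in asr_segments:
        best_speaker: Optional[str] = None
        best_iou = 0.0
        for start, end, speaker in diar_segments:
            score = iou(asr, (start, end))
            if score > best_iou:  # strict ">" keeps the first segment on ties
                best_speaker, best_iou = speaker, score
        speakers.append(best_speaker)
    return speakers


diar = [(0.0, 5.0, "SPEAKER_00"), (5.0, 10.0, "SPEAKER_01")]
asr = [(0.5, 4.0), (4.5, 9.0)]
print(assign_speakers(asr, diar))  # ['SPEAKER_00', 'SPEAKER_01']
```

The union is computed as the sum of both lengths minus the intersection, so the score stays correct even for disjoint intervals.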

A threshold can be set to ignore IoU matches below a given value, and a dedicated "no match" label is assigned when no clear match is found between an ASR segment and any of the available diarization segments.
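The threshold and the "no match" label could look roughly like this. It is a sketch under the same simplified assumptions: `(start, end)` ASR tuples and `(start, end, speaker)` diarization tuples. The parameter names `threshold` and `no_match_label` and the value `"NO_MATCH"` are hypothetical; this description does not specify them.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
DiarSegment = Tuple[float, float, str]

NO_MATCH = "NO_MATCH"  # hypothetical label value, chosen for illustration only


def iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def assign_speakers_with_threshold(
    asr_segments: List[Interval],
    diar_segments: List[DiarSegment],
    threshold: float = 0.0,
    no_match_label: str = NO_MATCH,
) -> List[str]:
    """Assign the best-IoU speaker, or no_match_label when the best IoU is below threshold."""
    speakers: List[str] = []
    for asr in asr_segments:
        best_speaker, best_iou = no_match_label, 0.0
        for start, end, speaker in diar_segments:
            score = iou(asr, (start, end))
            if score > best_iou:
                best_speaker, best_iou = speaker, score
        # Reject the best candidate if the match is not clear enough.
        if best_iou < threshold:
            best_speaker = no_match_label
        speakers.append(best_speaker)
    return speakers


diar = [(0.0, 5.0, "SPEAKER_00"), (5.0, 10.0, "SPEAKER_01")]
asr = [(0.5, 4.0), (12.0, 13.0), (4.0, 6.0)]
print(assign_speakers_with_threshold(asr, diar, threshold=0.5))
# ['SPEAKER_00', 'NO_MATCH', 'NO_MATCH']
```

Because IoU is normalized by the union, a short ASR segment lying entirely inside a long diarization turn gets a low score even though the speaker is unambiguous, so the threshold needs to be chosen with that in mind.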

I removed the step that squashes consecutive segments from the same speaker, but it could probably be re-implemented in this pull request after some refactoring.
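If the squashing step is re-implemented on top of the IoU assignment, one possible approach is to merge runs of consecutive segments that share a speaker label. This is a hypothetical sketch, not the removed speechbox code. It assumes labelled segments are `(speaker, start, end, text)` tuples already sorted by time and that texts are joined with a single space. The function name `squash_same_speaker` is illustrative.

```python
from typing import List, Tuple

# Hypothetical labelled segment: (speaker, start, end, text), sorted by start time
Labelled = Tuple[str, float, float, str]


def squash_same_speaker(segments: List[Labelled]) -> List[Labelled]:
    """Merge runs of consecutive segments that share the same speaker label.

    The merged segment keeps the start of the first segment in the run, the end
    of the last one, and the texts joined with a single space.
    """
    merged: List[Labelled] = []
    for speaker, start, end, text in segments:
        if merged and merged[-1][0] == speaker:
            prev_speaker, prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_speaker, prev_start, end, f"{prev_text.strip()} {text.strip()}")
        else:
            merged.append((speaker, start, end, text.strip()))
    return merged


segments = [
    ("SPEAKER_00", 0.0, 2.0, "Hello"),
    ("SPEAKER_00", 2.0, 4.0, " there."),
    ("SPEAKER_01", 4.0, 6.0, "Hi!"),
]
print(squash_same_speaker(segments))
# [('SPEAKER_00', 0.0, 4.0, 'Hello there.'), ('SPEAKER_01', 4.0, 6.0, 'Hi!')]
```

With this approach, consecutive segments carrying the "no match" label would also be merged. Whether that is desirable is a design decision to settle during review.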

I would welcome feedback on this pull request. I am open to making improvements and to changing anything I may have overlooked.

ASR segment speaker match using IoU to address the issue #28 #35