princeton-ddss / SpeechMLPipeline

SpeechMLPipeline is a complete pipeline for deploying machine learning models that generate labelled and timestamped transcripts from audio inputs.
MIT License
0 stars 1 fork

Process Whisper Raw Output to Improve the Accuracy of Speaker Diarization #20

Closed fjying closed 7 months ago

fjying commented 10 months ago

Merge duplicated text across continuous timestamps:

Before Merge:

image

After Merge:

image
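
Since the screenshots are not reproduced here, the merge step could look roughly like the sketch below. It is not the repository's actual implementation. It assumes each Whisper segment is a dict with `start`, `end` and `text` keys, and the function name `merge_duplicate_segments` and the `max_gap` tolerance are hypothetical.

```python
from typing import Dict, List


def merge_duplicate_segments(segments: List[Dict], max_gap: float = 0.5) -> List[Dict]:
    """Collapse runs of adjacent segments that repeat the same text.

    Assumption: each segment has "start", "end" (seconds) and "text" keys.
    `max_gap` is a hypothetical tolerance (seconds) for treating two
    segments as continuous in time.
    """
    merged: List[Dict] = []
    for seg in segments:
        text = seg["text"].strip()
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous["text"] == text
            and seg["start"] - previous["end"] <= max_gap
        ):
            # Same text right after the previous segment: extend its end time.
            previous["end"] = max(previous["end"], seg["end"])
        else:
            # New text, or a gap in time: start a new merged segment.
            merged.append({"start": seg["start"], "end": seg["end"], "text": text})
    return merged


if __name__ == "__main__":
    raw = [
        {"start": 0.0, "end": 2.0, "text": " Thank you."},
        {"start": 2.0, "end": 4.0, "text": " Thank you."},
        {"start": 4.0, "end": 6.0, "text": " Next item."},
    ]
    for row in merge_duplicate_segments(raw):
        print(row)
```

Only adjacent repeats are merged, so the same sentence spoken again later in the recording stays as a separate segment.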