m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

What does the "score" in the returned results represent? #398

Open chaoqingshuai opened 1 year ago

chaoqingshuai commented 1 year ago

{'segments': [
    {'start': 0.975, 'end': 1.575, 'text': ' Yes, sir.',
     'words': [
         {'word': 'Yes,', 'start': 0.975, 'end': 1.215, 'score': 0.537},
         {'word': 'sir.', 'start': 1.295, 'end': 1.575, 'score': 0.302}]},
]}

rosyvs commented 1 year ago

I had the same question and found this from m-bain themself: https://github.com/m-bain/whisperX/issues/20

"word-segments now have a "score"; however, this is from the wav2vec2 model, not Whisper" (commit d395c21)
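
In case it helps anyone landing here, below is a rough sketch of how the per-word "score" could be used to flag words for review. It only works on the result dict pasted at the top of this thread. The helper `low_score_words` and the 0.5 threshold are my own hypothetical choices, not part of whisperX, and it assumes a higher score means the alignment model was more confident, which the thread does not spell out.

```python
# Result structure copied from the example posted in this thread.
result = {
    "segments": [
        {
            "start": 0.975,
            "end": 1.575,
            "text": " Yes, sir.",
            "words": [
                {"word": "Yes,", "start": 0.975, "end": 1.215, "score": 0.537},
                {"word": "sir.", "start": 1.295, "end": 1.575, "score": 0.302},
            ],
        }
    ]
}


def low_score_words(result, threshold=0.5):
    """Hypothetical helper: collect (word, score) pairs scoring below threshold.

    Assumption: a higher score means a more confident alignment.
    """
    flagged = []
    for segment in result["segments"]:
        # .get() is defensive: skip segments without words and words without a score.
        for word in segment.get("words", []):
            score = word.get("score")
            if score is not None and score < threshold:
                flagged.append((word["word"], score))
    return flagged


print(low_score_words(result))  # [('sir.', 0.302)]
```

The threshold is arbitrary here; a sensible value probably depends on the alignment model and the audio, so treat it as something to tune rather than a recommended setting.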