Open HalukMaestra opened 1 year ago
Yes please, that would be nice, similar to how whisper_timestamped does it.
OK, I did some digging and have some insight to share:
faster_whisper (which whisperx uses) could give word alignment probabilities with whisperx.load_model(model, device, language=lang, asr_options={"word_timestamps": True,}).
However, since whisperx uses its own alignment, those probabilities are not relevant to its output.
So there are 2 paths to go:
1) get a 'forced alignment' prediction from wav2vec2.0 (or another alignment method), which is a reasonable way to get a probability score for the transcription
2) dig deeper into faster_whisper and see where and how the real whisper probability scores are determined
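For the second path, here is a minimal sketch of reading the per-word probabilities when faster_whisper is called directly with word_timestamps=True. It assumes each returned segment has a `words` list whose items carry `word`, `start`, `end` and `probability` attributes; the helper name `collect_word_probabilities` is made up for illustration and is not part of either library.

```python
from typing import Any, Dict, Iterable, List


def collect_word_probabilities(segments: Iterable[Any]) -> List[Dict[str, Any]]:
    """Gather one row per word from faster_whisper-style segments.

    Assumption: every segment exposes `.words`, and every word exposes
    `.word`, `.start`, `.end` and `.probability`.
    """
    rows = []
    for segment in segments:
        # `.words` can be None when word timestamps were not requested.
        for word in (getattr(segment, "words", None) or []):
            rows.append(
                {
                    "word": word.word.strip(),
                    "start": word.start,
                    "end": word.end,
                    "probability": word.probability,
                }
            )
    return rows


# Usage (not run here; needs faster_whisper installed and an audio file):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small")
#   segments, _info = model.transcribe("audio.wav", word_timestamps=True)
#   for row in collect_word_probabilities(segments):
#       print(row)
```

Note that these are the decoder's probabilities with whisper's own timestamps, not the timestamps whisperx produces after its alignment step.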
First of all, I would like to thank you for all the great work done on this application. It's a joy to use and way better than vanilla whisper. One question I have: is it possible to include word-level confidence scores inside result_aligned["word_segments"]? Obtaining ["segments"] and parsing it to get to the score is a tedious process for me, since I have no need for any data except what is in word_segments. Is this viable to do?
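Until something like that is built in, the parsing can be wrapped in a small helper. This is only a sketch: it assumes result_aligned["segments"] is a list of dicts, each with a "words" list whose entries may contain "word", "start", "end" and "score". The function name `flatten_word_scores` is hypothetical, and missing keys are returned as None rather than raising.

```python
from typing import Any, Dict, List


def flatten_word_scores(result_aligned: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the per-segment word entries into one list with scores.

    Assumption: the aligned result looks like
    {"segments": [{"words": [{"word": ..., "start": ..., "end": ..., "score": ...}]}]}.
    """
    flat = []
    for segment in result_aligned.get("segments", []):
        for word in segment.get("words", []):
            flat.append(
                {
                    "word": word.get("word"),
                    "start": word.get("start"),
                    "end": word.get("end"),
                    # None when the word carries no score.
                    "score": word.get("score"),
                }
            )
    return flat


example = {
    "segments": [
        {"words": [{"word": "hello", "start": 0.1, "end": 0.4, "score": 0.93}]},
        {"words": [{"word": "2023"}]},  # entry without timing or score
    ]
}
for row in flatten_word_scores(example):
    print(row)
```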