kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

Arbitrary `text_frames` returned in output beams? #75

Open slavkirov opened 2 years ago

slavkirov commented 2 years ago

From my understanding of CTC, it produces alignment-free outputs: a token in the transcribed output does not correspond to a unique segment of the input logits, because many different alignments collapse to the same text. Yet the beams produced by `_decode_logits()` (and hence `decode_beams()`) include `text_frames` for each token of output text. These frames seem to correspond to an arbitrary alignment: when beams are merged in https://github.com/kensho-technologies/pyctcdecode/blob/81514695348f25e71577cb191d2a77f6b1d7f884/pyctcdecode/decoder.py#L118 the `text_frames` from the last beam/alignment processed overwrite those from any previous alignment for a given prefix.
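
To make the concern concrete, here is a minimal sketch of the behaviour I mean. It is not the pyctcdecode code: `merge_last_wins`, `log_add`, and the simplified `(text, frames, log-probability)` beam tuple are hypothetical stand-ins I made up for illustration. Two alignments of the same prefix are merged, their scores are summed, and whichever alignment is processed last supplies the frames.

```python
import math
from typing import Dict, List, Tuple

Frames = List[Tuple[int, int]]      # (start_frame, end_frame) per output token
Beam = Tuple[str, Frames, float]    # simplified beam: (text, frames, log-probability)


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def merge_last_wins(beams: List[Beam]) -> List[Beam]:
    """Merge beams that share the same text (hypothetical, simplified merge)."""
    merged: Dict[str, Tuple[Frames, float]] = {}
    for text, frames, logp in beams:
        if text in merged:
            # Scores are summed, but the earlier alignment's frames are replaced.
            merged[text] = (frames, log_add(merged[text][1], logp))
        else:
            merged[text] = (frames, logp)
    return [(text, frames, logp) for text, (frames, logp) in merged.items()]


# Two alignments of the same prefix with different frame spans and scores.
alignment_a: Beam = ("hi", [(0, 3)], math.log(0.6))
alignment_b: Beam = ("hi", [(1, 4)], math.log(0.1))

forward = merge_last_wins([alignment_a, alignment_b])
backward = merge_last_wins([alignment_b, alignment_a])

# Same merged score either way, but the frames depend on processing order.
print(forward[0][1], backward[0][1])  # [(1, 4)] [(0, 3)]
```

In this toy case the merged score is identical in both orders, while the reported frames are not, and in the first order they come from the less likely alignment.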