parlance / ctcdecode

PyTorch CTC Decoder bindings
MIT License
827 stars 241 forks

How to use timesteps? #203

Open blankspark opened 2 years ago

blankspark commented 2 years ago

I have noticed that the output of ctcdecode includes timesteps, which the description says can be used as alignment. But what I get is a tensor of shape (Batchsize, N_beams, N_timesteps), not the documented shape below, and I don't know how to use it.

timesteps - Shape: BATCHSIZE x N_BEAMS

The timestep at which the nth output character has peak probability. Can be used as alignment between the audio and the transcript.

Thanks in advance.

abarcovschi commented 10 months ago

@blankspark have you ever figured out how to use them? I am looking to get word-level time alignments, but I don't know how to calculate this information from the timesteps returned by ctcdecode.
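
For reference, here is a minimal sketch of one way these outputs might be turned into character-level and word-level times. It is not taken from the ctcdecode source. It assumes that `timesteps[i][j]` is padded like `beam_results[i][j]`, so only the first `out_lens[i][j]` entries are meaningful, and that entry `k` is the frame index of the `k`-th decoded character. The helper names `char_alignment` and `word_alignment`, the label list, and the 20 ms frame duration are all hypothetical; the real frame duration depends on your feature hop length and any subsampling in the acoustic model. Plain lists stand in for the tensors so the snippet runs without ctcdecode installed.

```python
def char_alignment(beam_tokens, beam_timesteps, beam_len, labels, frame_ms):
    """Pair each decoded character of one beam with its peak time in ms.

    Only the first `beam_len` entries are read; the rest is treated as
    padding (an assumption about how ctcdecode fills its outputs).
    """
    n = int(beam_len)
    return [
        (labels[int(beam_tokens[k])], int(beam_timesteps[k]) * frame_ms)
        for k in range(n)
    ]


def word_alignment(chars, space=" "):
    """Group (char, time) pairs into (word, first_char_ms, last_char_ms)."""
    words = []
    current, start, end = [], None, None
    for ch, t in chars:
        if ch == space:
            # A space closes the word collected so far, if any.
            if current:
                words.append(("".join(current), start, end))
            current, start, end = [], None, None
        else:
            if not current:
                start = t
            current.append(ch)
            end = t
    if current:
        words.append(("".join(current), start, end))
    return words


# Hypothetical label set; index 0 plays the role of the CTC blank.
labels = ["_", " ", "a", "c", "t"]

# Stand-ins for beam_results[0][0], timesteps[0][0] and out_lens[0][0],
# i.e. the top beam of the first item in the batch.
top_tokens = [2, 1, 3, 2, 4, 0, 0, 0]
top_timesteps = [3, 7, 10, 12, 15, 0, 0, 0]
top_len = 5

frame_ms = 20  # assumed duration of one output frame, in milliseconds

chars = char_alignment(top_tokens, top_timesteps, top_len, labels, frame_ms)
words = word_alignment(chars)
print(chars)
print(words)
```

With real decoder output, the same helpers would be called on `beam_results[0][0]`, `timesteps[0][0]` and `out_lens[0][0]`. Because each timestep marks where a character peaks rather than where it begins or ends, the word boundaries obtained this way are approximate: a word's start is the peak of its first character and its end is the peak of its last.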