Closed. ankitmundada closed this issue 6 years ago.
Thanks for the report and thorough investigation! I'll get a PR put together to address the issue.
@ankitmundada could you check out PR #55 and see if it rectifies your issue? You will have to change a line in decoder.py to pass the sequence lengths (see https://github.com/SeanNaren/deepspeech.pytorch/pull/239).
@ryanleary I have tested it and it seems to work now! Thanks for the quick update!
Closed by #55.
When using ctcdecode with sequential data of variable output lengths, the shorter outputs are generally padded with zeros to match the size of the largest sample in the batch. So, logically, when `ctc_beam_search_decoder` loops through the timesteps of `probs_seq` (Link for code), it should stop at the timestep corresponding to that sample's actual output length instead of the full length of `probs_seq`, since `probs_seq` also carries the extra padding in batch mode. As it stands, this causes ctcdecode to append extra garbage characters to the end of its actual output. Examples of such outputs are:
I am using ctcdecode with the outputs from deepspeech.pytorch.

I can think of two possible solutions for this:
1. Pass `num_time_steps` to `ctc_beam_search_decoder` as an argument: i.e. instead of `size_t num_time_steps = probs_seq.size();` at line, it should be `size_t num_time_steps = size; // which is passed as an argument`
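On the Python side, the first fix amounts to slicing each sample down to its true length before handing it to the decoder. A minimal sketch (the helper and parameter names here are hypothetical, not ctcdecode's actual API):

```python
import numpy as np

def decode_batch(probs_batch, seq_lens, decode_fn):
    """Decode a zero-padded batch, truncating each sample to its
    true number of timesteps before decoding.

    probs_batch: (batch, max_T, num_labels) array, zero-padded
    seq_lens:    true timestep count for each sample
    decode_fn:   any per-sample decoder, e.g. a beam-search wrapper
    """
    return [decode_fn(probs_batch[i, :t]) for i, t in enumerate(seq_lens)]
```

This keeps the padding out of the decoder entirely, so no sentinel values or in-loop checks are needed.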
2. Add a check for some impossible probability value, such as -1, and break the loop whenever it is seen. I am currently using this hack in our system, and it seems to work! You can find it here. For this to work, the outputs of the DeepSpeech model are changed a bit: the extra timestep values are intentionally set to -1. The changes are here.

The transcripts for the same examples, after using the second, hacky method, are:
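The sentinel trick in the second solution can be sketched in Python as follows (the function name is hypothetical; the actual hack lives in the C++ decoding loop):

```python
import numpy as np

SENTINEL = -1.0  # impossible probability value marking padded timesteps

def trim_padding(probs_seq):
    """Keep timesteps up to the first padded one, where a padded
    timestep is a row whose entries are all equal to SENTINEL."""
    out = []
    for step in probs_seq:
        if np.all(step == SENTINEL):
            break  # everything after the first sentinel row is padding
        out.append(step)
    return np.array(out)
```

The in-loop check makes this work without passing lengths around, at the cost of reserving -1 as a magic value in the model output.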