kfmn opened this issue 3 weeks ago
> First of all, there is a misprint in https://github.com/k2-fsa/icefall/blob/f84270c93528f4b77b99ada9ac0c9f7fb231d6a4/egs/librispeech/ASR/zipformer/jit_pretrained_streaming.py#L218: it should be 0.25 seconds, not 0.2
No, it is correct. You can select an arbitrary positive value for it. Its sole purpose is to simulate how fast the data samples arrive.
This value has nothing to do with your model parameters.
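For example, the feeding loop looks roughly like this (an illustrative sketch, not the exact script code; the 0.2-second figure and variable names are placeholders):

```python
import torch

sample_rate = 16000
# Any positive duration works here; it only controls how many samples
# we pretend arrive at a time, not anything about the model.
chunk = int(0.2 * sample_rate)

wave = torch.rand(3 * sample_rate)  # stand-in for a real 3-second waveform

start = 0
while start < wave.numel():
    samples = wave[start : start + chunk]
    start += chunk
    # feed `samples` to the online feature extractor here; decoding only
    # runs once enough feature frames are ready, so this value does not
    # affect the model's chunk_length or left_context_frames
```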
> all computed features remain unprocessed. As a result, the decoding hypotheses are sometimes truncated
We add tail padding here. You can use a larger tail padding if you find that the last chunk is not decoded (see the sketch below).
By the way, please provide a concrete, runnable code example or script that reproduces your issue. If you have only read the code without running it, I suggest running it first and then checking whether what you expect matches the actual result.
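Concretely, the tail-padding idea looks roughly like this (a sketch; the 0.3-second value is an assumption, and the real script may use a different amount):

```python
import torch

sample_rate = 16000
wave = torch.rand(2 * sample_rate)  # stand-in for the real waveform

# Append silence after the last real samples so the final feature frames
# can fill a complete chunk and get decoded. Increase the duration if the
# last chunk is still missing from the hypothesis.
tail_padding = torch.zeros(int(0.3 * sample_rate), dtype=wave.dtype)
padded = torch.cat([wave, tail_padding])
# feed `padded` chunk by chunk as above
```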
I meant two things:
Hi,
I trained a streaming zipformer transducer on my data and converted the model to TorchScript (JIT) with export.py, using specific values of chunk_length and left_context_frames. Then I wanted to run streaming decoding with jit_pretrained_streaming.py, and it seems this script does not decode the final part of the audio.
First of all, there is a misprint in https://github.com/k2-fsa/icefall/blob/f84270c93528f4b77b99ada9ac0c9f7fb231d6a4/egs/librispeech/ASR/zipformer/jit_pretrained_streaming.py#L218: it should be 0.25 seconds, not 0.2.
Next, features are generated chunk-by-chunk and are decoded whenever the condition in https://github.com/k2-fsa/icefall/blob/f84270c93528f4b77b99ada9ac0c9f7fb231d6a4/egs/librispeech/ASR/zipformer/jit_pretrained_streaming.py#L234 is satisfied.
But if, after the last call to greedy_search, this condition is no longer satisfied, all computed features remain unprocessed. As a result, the decoding hypotheses are sometimes truncated.
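The following sketch shows the pattern I mean (it assumes kaldifeat's OnlineFbank, which the script uses; chunk_length = 32 is only an illustrative value, and the real loop also runs the encoder and greedy_search on each chunk):

```python
import torch
import kaldifeat

opts = kaldifeat.FbankOptions()
opts.frame_opts.samp_freq = 16000
opts.mel_opts.num_bins = 80
online_fbank = kaldifeat.OnlineFbank(opts)

chunk_length = 32  # frames per model chunk (illustrative value)
num_processed_frames = 0

wave = torch.rand(2 * 16000)  # stand-in for a real 2-second waveform
online_fbank.accept_waveform(sampling_rate=16000, waveform=wave)
online_fbank.input_finished()  # no more audio will arrive

while online_fbank.num_frames_ready - num_processed_frames >= chunk_length:
    # the real script gathers chunk_length frames with get_frame() and
    # runs the encoder + greedy_search; here we only advance the counter
    num_processed_frames += chunk_length

# Up to chunk_length - 1 ready frames can remain here and are never
# decoded, which is the truncation described above. A possible fix is to
# pad these leftover frames up to a full chunk (or append more tail
# padding to the waveform) and run the search once more.
leftover = online_fbank.num_frames_ready - num_processed_frames
print(f"{leftover} feature frames were computed but never decoded")
```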