sooftware / kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
https://sooftware.github.io/kospeech/
Apache License 2.0
603 stars 191 forks

recognize bug report #131

Closed kelvinqin closed 3 years ago

kelvinqin commented 3 years ago

Description

In kospeech/models/model.py (line 92), the recognize function fetches only the probabilities, without the sequence length information:

predicted_log_probs, _ = self.forward(inputs, input_lengths)

I think this is wrong: within a batch, different utterances have different output lengths, so predicted_log_probs should be truncated to each utterance's length for the metric to calculate a correct CER.

I may be wrong about this, but thanks for taking a look.
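
To illustrate the truncation described above, here is a minimal sketch in plain Python. It assumes the forward pass also yields a per-utterance output length, as the issue implies. The helper name `truncate_to_lengths` and the sample IDs are hypothetical and are not part of kospeech.

```python
from typing import List


def truncate_to_lengths(frame_ids: List[List[int]],
                        output_lengths: List[int]) -> List[List[int]]:
    """Keep only the valid frames of each utterance in a padded batch.

    frame_ids: per-frame predicted token IDs, padded to a common length.
    output_lengths: number of valid frames for each utterance.
    """
    if len(frame_ids) != len(output_lengths):
        raise ValueError("batch size and number of lengths must match")
    return [ids[:n] for ids, n in zip(frame_ids, output_lengths)]


# Hypothetical batch of two utterances, both padded to 6 frames.
batch = [
    [3, 3, 0, 5, 5, 0],  # all 6 frames are valid
    [4, 0, 7, 7, 7, 7],  # only the first 3 frames are valid
]
lengths = [6, 3]

trimmed = truncate_to_lengths(batch, lengths)
```

Without the truncation, the trailing frames of the shorter utterance would be decoded as if they were real output and could inflate the CER.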

kelvinqin commented 3 years ago

I noticed that the label_to_string function decides where the prediction ends by checking for an end-of-sentence token. This is reasonable for a seq2seq model, but I am not sure it is reasonable for a CTC model like deepspeech2.

Thanks for sharing.
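
The contrast between the two decoding styles can be sketched as follows. This is an illustration only, not kospeech's implementation: the token IDs `BLANK_ID` and `EOS_ID` and both function names are assumptions made for the example.

```python
from itertools import groupby
from typing import List

BLANK_ID = 0  # assumed ID of the CTC blank token
EOS_ID = 2    # assumed ID of the end-of-sentence token


def cut_at_eos(ids: List[int], eos_id: int = EOS_ID) -> List[int]:
    """Seq2seq-style: keep tokens up to (not including) the first EOS."""
    out = []
    for token in ids:
        if token == eos_id:
            break
        out.append(token)
    return out


def ctc_collapse(ids: List[int], blank_id: int = BLANK_ID) -> List[int]:
    """CTC greedy decoding: merge repeated frames, then drop blanks."""
    merged = [token for token, _ in groupby(ids)]
    return [token for token in merged if token != blank_id]


# Hypothetical frame-level argmax output of a CTC model (no EOS emitted).
frames = [5, 5, 0, 5, 7, 7, 0, 0]

eos_result = cut_at_eos(frames)    # nothing is cut: there is no EOS
ctc_result = ctc_collapse(frames)  # repeats merged, blanks removed
```

A CTC output may never contain an end token, so an EOS-based cutoff leaves the frame sequence untouched, while the collapse step is what turns frames into a label sequence.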

sooftware commented 3 years ago

Hi @kelvinqin. The CTC model does not require output_length. You can decode it using , .