k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

Capitalization on output text #395

Closed daniel-dona closed 1 year ago

daniel-dona commented 1 year ago

Maybe this is a silly question, but why is the output of the pre-trained models always uppercase?

Is this some limitation/optimization or just the way the models were trained?

csukuangfj commented 1 year ago

> why is the output of the pre-trained models always uppercase?

Not always. You get uppercase output because the models you are using were trained to output uppercase.

During training, we normalize the transcripts to a single case, either all uppercase or all lowercase. At inference time, a model trained on uppercase text outputs uppercase, and a model trained on lowercase text outputs lowercase.
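
As a rough illustration of that normalization step, here is a minimal sketch. The function name `normalize_transcript` and its `to_upper` flag are hypothetical and are not taken from the actual training recipes; the sketch only shows the idea of forcing every transcript into one case before training.

```python
import re


def normalize_transcript(text: str, to_upper: bool = True) -> str:
    """Hypothetical normalizer: collapse whitespace and unify letter case."""
    # Collapse runs of whitespace so spacing differences do not matter.
    text = re.sub(r"\s+", " ", text).strip()
    # Force a single case; the model only ever sees this one during training.
    return text.upper() if to_upper else text.lower()


print(normalize_transcript("Hello  World"))         # HELLO WORLD
print(normalize_transcript("Hello  World", False))  # hello world
```

A model trained only on the first form has never seen a lowercase letter, so it has no way to produce one.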

Have a look at tokens.txt. If all of its tokens are uppercase, the output will be all uppercase as well.
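
If you would rather check this with a script than by eye, something like the following could work. It is only a sketch: it assumes each line of tokens.txt has the form `<symbol> <id>` and that special symbols are wrapped in angle brackets (for example `<blk>`), and the function name `token_casing` is made up for this example.

```python
from pathlib import Path


def token_casing(tokens_file: str) -> str:
    """Return 'upper', 'lower', 'mixed', or 'none' for the symbols in tokens_file."""
    has_upper = False
    has_lower = False
    for line in Path(tokens_file).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if not fields:
            continue
        token = fields[0]  # assumed layout: "<symbol> <id>"
        # Skip special symbols such as <blk> or <unk>; they are not output text.
        if token.startswith("<") and token.endswith(">"):
            continue
        has_upper = has_upper or any(c.isupper() for c in token)
        has_lower = has_lower or any(c.islower() for c in token)
    if has_upper and has_lower:
        return "mixed"
    if has_upper:
        return "upper"
    if has_lower:
        return "lower"
    return "none"
```

Calling `token_casing("tokens.txt")` on the file that ships with a model should return `upper` for a model that only outputs uppercase text. Scripts without letter case (for example Chinese characters) give `none`.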


If you don't normalize the case of your transcripts during training, the model can output both lowercase and uppercase letters during inference.

daniel-dona commented 1 year ago

That makes sense, thank you @csukuangfj