Hello, we are running experiments on GigaSpeech with the pretrained models. One of the experiments is to inspect and compare the oracle WER of CTC vs. transducer models. Here is what we get:
CTC (decoding method: attention-decoder/nbest_oracle, num-paths: 1000)
Note that we got slightly better CTC results (10.41 & 10.56) than here, because we tuned hlg_scale for the HLG graph and found the best value to be 0.52, whereas the default recipe uses 1.0.
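For reference, the hlg_scale tuning was just a grid search over the scale applied to the HLG graph scores. A minimal sketch, assuming a hypothetical `decode_with_hlg_scale` callable (not an icefall API) that decodes the dev set at a given scale and returns the WER:

```python
# Hypothetical sketch of the hlg_scale grid search;
# `decode_with_hlg_scale` stands in for a function that decodes the
# dev set with the given scale applied to the HLG graph scores and
# returns the resulting WER.
def tune_hlg_scale(decode_with_hlg_scale, scales=None):
    # Try scales at and below the default of 1.0; 0.52 won for us.
    scales = scales or [s / 100 for s in range(40, 101, 4)]
    wers = {s: decode_with_hlg_scale(s) for s in scales}
    best = min(wers, key=wers.get)
    return best, wers[best]
```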
Transducer (decoding method: modified_beam_search/fast_beam_search_nbest_oracle, num-paths: 1000)
We got slightly worse WER (10.41 & 10.53) than here, but notice that the oracle WERs of the transducer are much worse than those of CTC; they do not even seem comparable.
I was following the recipe here to get the oracle WER for the GigaSpeech transducer. I was wondering (1) whether it is normal for transducers to have a worse oracle WER than a CTC model (e.g., due to different implementations/mechanisms, or whether the same holds for LibriSpeech or other corpora); (2) whether I forgot to tune some hyperparameters (I have tuned num-paths and nbest-scale); or (3) whether I have done something wrong.
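To make question (1) concrete, here is a minimal, self-contained sketch of what the n-best oracle WER measures: for each utterance, the hypothesis in the n-best list closest in edit distance to the reference is selected, and errors are aggregated over those best paths. The function names and toy data below are illustrative only, not taken from the icefall recipes:

```python
# Illustrative sketch of n-best oracle WER: per utterance, take the
# hypothesis with the smallest word-level edit distance to the
# reference, then aggregate errors over the whole test set.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two word lists."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # delete ref[i-1]
                dp[j - 1] + 1,                      # insert hyp[j-1]
                prev + (ref[i - 1] != hyp[j - 1]),  # substitute
            )
            prev = cur
    return dp[-1]

def oracle_wer(refs, nbest_lists):
    """refs: reference word lists; nbest_lists: per-utterance lists
    of hypothesis word lists (the n-best paths from the decoder)."""
    errors = sum(
        min(edit_distance(ref, hyp) for hyp in nbest)
        for ref, nbest in zip(refs, nbest_lists)
    )
    return errors / sum(len(ref) for ref in refs)

# Toy example: the oracle picks the exact match, so the oracle WER
# is 0, even though the 1-best hypothesis has a substitution error.
refs = [["hello", "world"]]
nbest = [[["hello", "word"], ["hello", "world"]]]
print(oracle_wer(refs, nbest))  # -> 0.0
```

So a large CTC/transducer gap in oracle WER mainly reflects how diverse the n-best lists produced by each decoding method are, not the 1-best quality.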