I have read your paper "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding" several times, but I still don't understand why posterior normalization is needed during decoding.
Question 1: Can you explain it in detail?
Question 2: Isn't the softmax probability already produced when the wav features are fed to the trained network? If so, why is "dir/label count" needed? (See the sketch of my current understanding below.)
Question 3: Is the input to latgen-faster the softmax probability?
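For context, here is what I currently think the normalization does, so you can correct me if I'm wrong: the softmax posteriors p(label|x) are divided by label priors p(label) estimated from the label counts, turning them into scaled pseudo-likelihoods p(x|label) ∝ p(label|x) / p(label) before decoding. A minimal NumPy sketch of that idea (the function and argument names are my own, not EESEN's actual code):

```python
import numpy as np

def normalize_posteriors(log_softmax, label_counts, prior_scale=1.0):
    """Sketch: turn network posteriors p(label|x) into pseudo-likelihoods
    p(x|label) ~ p(label|x) / p(label), working in the log domain.
    label_counts would come from something like dir/label counts."""
    priors = label_counts / label_counts.sum()   # estimate p(label) from counts
    # subtracting the (scaled) log prior divides out p(label)
    return log_softmax - prior_scale * np.log(priors)
```

Is this roughly what happens, and is the result of this step what latgen-faster actually consumes?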
Looking forward to your reply.
@fmetze @riebling