I have read your paper "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding" several times, but I still don't understand why posterior normalization is needed during decoding.
Question 1: Can you explain it in detail?
Question 2: Isn't the softmax probability already produced when the wav features are fed to the trained network? If so, why is "dir/label count" needed? (See the sketch of my current understanding below.)
Question 3: Is the input to latgen-faster the softmax probability?
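For context, here is what I currently think the normalization does, so you can correct me if I'm wrong: the softmax posteriors p(label|x) are divided by label priors p(label) estimated from the label counts, turning them into scaled pseudo-likelihoods p(x|label) ∝ p(label|x) / p(label) before decoding. A minimal NumPy sketch of that idea (the function and argument names are my own, not EESEN's actual code):

```python
import numpy as np

def normalize_posteriors(log_softmax, label_counts, prior_scale=1.0):
    """Sketch: turn network posteriors p(label|x) into pseudo-likelihoods
    p(x|label) ~ p(label|x) / p(label), working in the log domain.
    label_counts would come from something like dir/label counts."""
    priors = label_counts / label_counts.sum()   # estimate p(label) from counts
    # subtracting the (scaled) log prior divides out p(label)
    return log_softmax - prior_scale * np.log(priors)
```

Is this roughly what happens, and is the result of this step what latgen-faster actually consumes?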
Looking forward to your reply.
@fmetze @riebling