srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Where is the code to generate the priors of states? #96

Closed: xfwu closed this issue 7 years ago

xfwu commented 7 years ago

Hi Yajie

In Section 3.2 ("Posterior Normalization") of your paper, you mentioned that there is a method to generate the priors.

I could not find it in Eesen, nor in lattice-faster-decoder.cc, which I guess should use them?

BTW, since CTC outputs are so peaky, with the blank label peaking in so many frames, what is the best way to do sMBR training?

Best

fmetze commented 7 years ago

The train_ctc_parallel.sh script contains the following call (or something equivalent, depending on the exact recipe you are running):

```
# Compute the occurrence counts of labels in the label sequences. These counts will be used to
# derive prior probabilities of the labels.
gunzip -c $dir/labels.tr.gz | awk '{line=$0; gsub(" "," 0 ",line); print line " 0";}' | \
  analyze-counts --verbose=1 --binary=false ark:- $dir/label.counts >& $dir/log/compute_label_counts.log || exit 1
```
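What that pipeline computes can be sketched in a few lines of Python. The awk step rewrites each line `uttid l1 l2 ... lN` as `uttid 0 l1 0 l2 ... 0 lN 0`, inserting the blank (label id 0) before every label and after the last one. `analyze-counts` then tallies how often each label id occurs. The sketch below reproduces that arithmetic and shows how such counts could be turned into log priors and used to normalize per-frame log posteriors as log p(k|x) - log p(k). The function names, the floor of 1 count for unseen labels, and the `prior_scale` parameter are hypothetical illustrations, not Eesen code. How the recipe actually consumes `label.counts` at decode time may differ.

```python
import math
from collections import Counter

BLANK = 0  # the awk step in train_ctc_parallel.sh inserts label id 0 as the blank


def interleave_blanks(labels):
    """Mimic the awk step: a blank before every label and one after the last."""
    out = []
    for lab in labels:
        out.extend([BLANK, lab])
    out.append(BLANK)
    return out


def label_counts(transcripts):
    """Count label occurrences over all blank-interleaved label sequences."""
    counts = Counter()
    for labels in transcripts.values():
        counts.update(interleave_blanks(labels))
    return counts


def log_priors(counts, num_labels):
    """Normalize counts into log prior probabilities log p(k)."""
    # Hypothetical floor of 1 count so labels never seen in training
    # do not produce log(0).
    floored = [max(counts.get(k, 0), 1) for k in range(num_labels)]
    total = sum(floored)
    return [math.log(c / total) for c in floored]


def normalize_posteriors(log_post, log_prior, prior_scale=1.0):
    """Posterior normalization for one frame: log p(k|x) - prior_scale * log p(k)."""
    return [lp - prior_scale * pr for lp, pr in zip(log_post, log_prior)]
```

For example, with transcripts `{"utt1": [3, 5], "utt2": [5]}`, the interleaved sequences are `[0, 3, 0, 5, 0]` and `[0, 5, 0]`, so the blank is counted 5 times, label 3 once, and label 5 twice. Because every sequence of N labels contributes N + 1 blanks, the blank always receives a large prior. Dividing it out is what keeps the peaky blank posteriors from dominating the scores.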

xfwu commented 7 years ago

Thank you very much!

Best