The file L.fst is the Finite State Transducer form of the lexicon (L, see "Speech Recognition with Weighted Finite-State Transducers" by Mohri, Pereira and Riley, in Springer Handbook on Speech Processing and Speech Communication, 2008), with phone symbols on the input and word symbols on the output. The file L_disambig.fst is the lexicon, as above, but including the disambiguation symbols #1, #2, and so on, as well as the self-loop with #0 on it to "pass through" the disambiguation symbol from the grammar. See Disambiguation symbols for more explanation. Anyway, you won't have to deal with this directly.
Our tutorial above on how to create the lang/ directory did not address how to create the file G.fst, which is the finite state transducer form of the language model or grammar that we'll decode with.
root@fab417c4eccd:/usr/local/kaldi/egs/taiwanese/s5c/data/local# head free-syllable/lexicon.txt
a ʔ- a
ah ʔ- aʔ
ai ʔ- ai
aih ʔ- aiʔ
ak ʔ- ak
am ʔ- am
an ʔ- an
ang ʔ- aŋ
ann ʔ- aⁿ
annh ʔ- aⁿʔ
root@fab417c4eccd:/usr/local/kaldi/egs/taiwanese/s5c/data/local# head free-syllable/uniform.fst
0 0 a a
0 0 ah ah
0 0 ai ai
0 0 aih aih
0 0 ak ak
0 0 am am
0 0 an an
0 0 ang ang
0 0 ann ann
0 0 annh annh
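Reading the two listings side by side: each line of uniform.fst looks like an OpenFst text-format arc (source state, destination state, input label, output label), here a self-loop on state 0 whose two labels are both a word from the first column of lexicon.txt. Below is a minimal Python sketch of that mapping, assuming this is how the file is derived. The helper names `lexicon_words` and `uniform_fst_text` are hypothetical, and the trailing final-state line is an assumption, since `head` only shows the first ten lines.

```python
def lexicon_words(lines):
    """Return the distinct first-column entries (the words), in first-seen order."""
    seen = set()
    words = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[0]  # remaining fields are the pronunciation (phones)
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


def uniform_fst_text(words, state=0):
    """Build OpenFst text-format arcs: one unweighted self-loop per word."""
    rows = [f"{state} {state} {w} {w}" for w in words]
    # Assumption: the single state is also final; a final state is written
    # as a line holding only the state id.
    rows.append(str(state))
    return "\n".join(rows) + "\n"


# Sample lines copied from the lexicon.txt listing above.
sample = ["a ʔ- a", "ah ʔ- aʔ", "ai ʔ- ai"]
text = uniform_fst_text(lexicon_words(sample))
print(text, end="")
```

With no weights on the arcs, every syllable is equally likely at every step, which would match the name "uniform".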
Findings
Besides local/dict there is also a free-syllable/dict; the script that uses both is at
./產生free-syllable的graph.sh:19:cp ${data}/local/dict/[^l]* ${data}/local/free-syllable/dict
So free-syllable differs from dict only by two extra files. In twsas, the script that runs just before evaluation, 產生free-syllable的graph, executes this line.
Question
a ʔ- a
0 0 a a
Based on
fst
lexicon, uniform
free-syllable vs normal dict