k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Lexicon only decoding question #241

Closed: alucassch closed this issue 2 years ago

alucassch commented 2 years ago

Is it possible to use k2 to decode a character-based CTC model with a specific set of words in a lexicon?

That is, how do I compose only L (the lexicon) with H (ctc_topo), without G (the language model), to create the decoding lattice with k2.intersect_dense_pruned?

lattices = k2.intersect_dense_pruned(HL, dense_fsa_vec, 20.0, 8, 30, 10000)

csukuangfj commented 2 years ago

Please refer to https://github.com/k2-fsa/snowfall/blob/master/snowfall/decoding/graph.py to see how HLG is generated. To reuse that code, you can use a unigram G, which we provide code to build (see https://github.com/k2-fsa/snowfall/blob/45898f70e3982a47a6fb831c6bb46d5acec9aaf2/snowfall/training/mmi_graph.py#L44).

You have to add a self-loop with the word label #0 at the loop state (i.e., state 0).
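
To illustrate the suggestion above, here is a minimal sketch of what such a unigram G might look like in k2's text format: a single loop state (state 0) with one zero-score self-loop per word, plus a self-loop for #0. The helper name `unigram_g_text`, the example word IDs, and the ID chosen for #0 are all hypothetical; in practice the IDs would come from your own words.txt. This is not the snowfall function linked above, only a sketch of the same idea.

```python
from typing import Iterable


def unigram_g_text(word_ids: Iterable[int], disambig_id: int) -> str:
    """Return a k2-style text description of a one-state unigram acceptor.

    Hypothetical helper for illustration only. Each line is
    "src_state dst_state label score"; the last line is the final state.
    """
    words = sorted(set(word_ids))
    if disambig_id in words:
        raise ValueError("disambig_id must not also be a word ID")

    loop_state, final_state = 0, 1
    lines = []
    # One self-loop per word, all with score 0.0 (no word is preferred).
    for label in words:
        lines.append(f"{loop_state} {loop_state} {label} 0.0")
    # Self-loop for the #0 symbol at the loop state (state 0).
    lines.append(f"{loop_state} {loop_state} {disambig_id} 0.0")
    # In k2, arcs entering the final state carry the label -1.
    lines.append(f"{loop_state} {final_state} -1 0.0")
    lines.append(f"{final_state}")
    return "\n".join(lines)


# Assumed IDs: words 2, 3, 4 and #0 = 5 (take the real ones from words.txt).
g_text = unigram_g_text([2, 3, 4], disambig_id=5)
print(g_text)
```

The resulting string should be loadable with `k2.Fsa.from_str(g_text)` and could then be used wherever the HLG-building code expects G. Because every arc has score 0.0, this G only restricts the output to the lexicon's words and adds no language-model weighting.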