k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Use a modified ctc_topo. #209

csukuangfj closed this 3 years ago

csukuangfj commented 3 years ago

This implements the topo mentioned in https://github.com/k2-fsa/k2/issues/746#issuecomment-856421616

Some examples (assuming there are three phones: a, b, c):

[Screenshots: the modified ctc_topo and example compositions (2021-06-08)]

CAUTION: No blank is required between two consecutive identical symbols, e.g. the aa in aabc.
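The screenshots have not survived in this archive, but the structure of such a topology can be sketched in plain Python. The sketch below is a hypothetical reconstruction based on the description in this thread (a blank state plus one state per phone), not the PR's actual code; the function name and the tuple-based arc representation are invented for illustration. The key difference from the standard topo is that an emitting arc is allowed back into the same phone state, so a repeated symbol needs no blank frame in between.

```python
def modified_ctc_topo(num_phones):
    """Arc list (src, dst, ilabel, olabel) for a modified CTC topology.

    Label 0 is the blank; -1 marks the arc entering the final state,
    following k2's text-FSA convention.
    """
    final_state = num_phones + 1
    arcs = [(0, 0, 0, 0)]                  # blank self-loop at the start state
    for i in range(num_phones + 1):
        if i != 0:
            arcs.append((i, i, i, 0))      # repeated frames of i collapse
            arcs.append((i, 0, 0, 0))      # an optional blank after a phone
        for j in range(1, num_phones + 1):
            # Emitting arc.  Unlike the standard topo, j == i is allowed,
            # so the same phone can be emitted twice in a row with no
            # blank frame in between (e.g. the "aa" in "aabc").
            arcs.append((i, j, j, j))
        arcs.append((i, final_state, -1, -1))
    return arcs
```

With three phones, a 4-frame input a a b c can now follow the emitting arc (1, 1, 1, 1) on the second frame and produce aabc directly; the price is ambiguity, since the same two frames could instead take the collapsing self-loop (1, 1, 1, 0) and produce abc.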

csukuangfj commented 3 years ago

The following shows the same example using the existing ctc_topo:

[Screenshots: the same examples using the existing ctc_topo (2021-06-08)]
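For contrast, here is a sketch of the standard CTC topology in the same invented arc convention (again a hypothetical illustration, not the library's actual code). Emitting arcs exist only between distinct states, so the only way to output the same phone twice is to pass through the blank state between the two emissions.

```python
def standard_ctc_topo(num_phones):
    """Arc list (src, dst, ilabel, olabel) for the standard CTC topology.

    Label 0 is the blank; -1 marks the arc entering the final state.
    """
    final_state = num_phones + 1
    arcs = []
    for i in range(num_phones + 1):
        arcs.append((i, i, i, 0))          # self-loop: repeated frames collapse
        for j in range(num_phones + 1):
            if j != i:
                # Cross arcs only: to emit the same phone twice, a path
                # must detour through the blank state 0 in between.
                arcs.append((i, j, j, j))
        arcs.append((i, final_state, -1, -1))
    return arcs
```

Here aa requires at least three frames (a, blank, a), whereas the modified topo above accepts two frames of a.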
danpovey commented 3 years ago

Wow, that was fast! Merge when you think it makes sense; looks good to me! (BTW, at some point we should change the args of those functions to be just an integer giving the number of phones; the list input is not good because we require the list to be contiguous.)

pzelasko commented 3 years ago

Maybe I’m missing something, but if we allow no blank between repeated phones, isn’t the blank redundant? Can we simply use a one-state pure self-loop phone topo (+ final state) instead, with the same result?

danpovey commented 3 years ago

I think empirically the shared blank helps. If the nnet doesn't want to use it, it can just make it very improbable (in LF-MMI).

xiaohui-zhang commented 3 years ago

Nice to see this non-standard topo finally implemented (as we discussed before, @danpovey; it is Fig. 1a in http://oa.ee.tsinghua.edu.cn/~ouzhijian/pdf/ctc-crf.pdf).

This doesn't matter much during training, because even with the standard topo we can use rule-based numerator FST construction, rather than composing the topo FST with tokenized transcripts, to lower the cost of building the numerator. But it significantly improves decoding speed (HLG becomes much smaller), and the WER degradation is minimal. The main issue is that words like "met" and "meet" become more confusable during both training and decoding.

@pzelasko Yes, having a shared blank is important for training performance, especially when we use SpecAugment (we can achieve similar effects by allowing skippable silence phones within a word, which is hacky for HMMs). But silence/HMM still has its advantage when we need the model to produce accurate alignments or decoding timestamps.

pzelasko commented 3 years ago

Interesting, thanks for the explanation.