k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

Online determinization #1208

Open francisr opened 1 year ago

francisr commented 1 year ago

Is there a way to get something in k2 similar to the lattice incremental decoder in Kaldi?
The k2 OnlineDenseIntersecter returns partial lattices, which is good, but I haven't found a nice way to avoid determinizing the whole lattice at every chunk.
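
For reference, the pattern I mean looks roughly like this (a sketch only: the constructor and decode() signatures are from memory of recent k2 versions, and `get_chunk_log_probs` / `decoding_graph` are placeholders):

```python
import torch
import k2

# One decoding stream; `decoding_graph` is the decoding Fsa (e.g. an HLG).
intersecter = k2.OnlineDenseIntersecter(
    decoding_graph=decoding_graph,
    num_streams=1,
    search_beam=20.0,
    output_beam=8.0,
    min_active_states=30,
    max_active_states=10000,
)

decode_states = [k2.DecodeStateInfo()]  # fresh per-stream state
for chunk_log_probs in get_chunk_log_probs():  # placeholder: (1, T, C) nnet output
    # Supervision segments: one row (fsa_idx, start_frame, num_frames) per stream.
    supervision = torch.tensor([[0, 0, chunk_log_probs.size(1)]], dtype=torch.int32)
    dense_fsas = k2.DenseFsaVec(chunk_log_probs, supervision)
    partial_lattice, decode_states = intersecter.decode(dense_fsas, decode_states)

    # This is the cost I'd like to spread out: re-determinizing the
    # whole partial lattice from scratch on every chunk.
    det_lattice = k2.determinize(partial_lattice)
```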

danpovey commented 1 year ago

For some uses, you might not need to determinize?

francisr commented 1 year ago

I'm trying to spread the cost of a lattice rescore over time, so I don't need an exact lattice. I was thinking to sample paths of the lattice, but I'm not convinced that it's going to work as well as I'd like when the segment gets longer.
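
As a starting point I was looking at k2's path-sampling primitive, along these lines (mapping the sampled arc indices back to word sequences and deduplicating paths is omitted; `lattice` is the partial lattice from the intersecter):

```python
import k2

# Sample paths from the current partial lattice, weighted by arc scores.
# `paths` is a k2.RaggedTensor of arc indices with axes [seq][path][arc].
num_paths = 100
paths = k2.random_paths(lattice, use_double_scores=True, num_paths=num_paths)
```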

danpovey commented 1 year ago

Is it a neural LM?

francisr commented 1 year ago

Yes.

danpovey commented 1 year ago

My feeling is that an n-best / "beam-search" (in the RNN-T sense) type of algorithm might make sense, as long as you don't have a very nontrivial graph to decode with. Or you could perhaps keep the graph on CPU and advance its state, if it is deterministic like a phone bigram (as used for LODR), or can be treated as deterministic when epsilons are treated as "failure" transitions, taken only if there is no explicit arc for that symbol (see the sketch below).

We have been focusing more on RNN-T than on CTC, which is why we haven't implemented this kind of thing; but given recent trends in ASR, and user requirements, I think it might be worthwhile for us to implement something like this ourselves. In the past, because of the focus on k2, I think we wanted algorithms that are convenient for GPU, but I think…
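
To illustrate the "epsilon as failure transition" idea, a toy sketch in plain Python (the data structures here are hypothetical, not a k2 API):

```python
from typing import Dict, Tuple

# Deterministic automaton with backoff, kept on the CPU.
# arcs[state][symbol] -> (next_state, score); backoff[state] -> (next_state, score)
# plays the role of the epsilon arc, taken only when `symbol` has no explicit
# arc out of `state`.
Arcs = Dict[int, Dict[int, Tuple[int, float]]]
Backoff = Dict[int, Tuple[int, float]]

def advance(state: int, symbol: int,
            arcs: Arcs, backoff: Backoff) -> Tuple[int, float]:
    score = 0.0
    while symbol not in arcs.get(state, {}):
        # No explicit arc: follow the failure/backoff (epsilon) arc instead.
        state, backoff_score = backoff[state]  # KeyError here means an OOV symbol
        score += backoff_score
    next_state, arc_score = arcs[state][symbol]
    return next_state, score + arc_score
```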

Someone mentioned to me at ICASSP that for long utterances, it's important to merge paths that differ "too long ago". One possibility might be to use a sorting criterion (for keeping the n best paths) that exaggerates the "delta" between a path and the best path. E.g. let

modified_score[path] = min_{path2} ( score[path2] + f(common_suffix_length[path, path2]) * (score[path] - score[path2]) )

where f(n) is some function like f(n) = 1 + 0.05 n.
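
In code, one reading of that criterion (my interpretation of the scoping; it assumes higher score = better, with paths represented as label sequences):

```python
def common_suffix_length(a, b):
    """Number of trailing labels that paths `a` and `b` share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def modified_score(path, active_paths, score, f=lambda n: 1.0 + 0.05 * n):
    # `score` maps a path (tuple of labels) to its higher-is-better score.
    # A path whose recent history matches a better path (long common suffix,
    # hence large f) has its deficit amplified, so it is pruned first --
    # effectively merging paths that differ only "too long ago".
    return min(
        score[p2] + f(common_suffix_length(path, p2)) * (score[path] - score[p2])
        for p2 in active_paths
    )
```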

francisr commented 1 year ago

Unfortunately my decoding graph is a traditional HMM + n-gram, which means the non-determinized lattice is quite big.

Someone mentioned to me at ICASSP that for long utterances, it's important to merge paths that differ "too long ago".

Makes sense, although wouldn't they naturally merge due to the Markov property of the decoding graph?

danpovey commented 1 year ago

I was thinking of situations where there is an RNNLM or a full-context (non-stateless) RNN-T, where paths carry full histories.