k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

Hybrid autoregressive transducer (HAT) #1244

Closed desh2608 closed 8 months ago

desh2608 commented 11 months ago

This is an implementation of the HAT loss proposed in https://arxiv.org/abs/2003.07705.

The test produces reasonable-looking losses. I am working on a LibriSpeech zipformer recipe that uses this loss. In general, it is not expected to improve upon the RNNT loss by itself, but it may be useful for things like integrating external LMs. I am also planning to use it for speaker attribution in multi-talker ASR (e.g., https://arxiv.org/abs/2309.08489).
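For context, the core idea of the HAT paper is to factor the joint network's output so that the blank symbol gets its own Bernoulli probability (via a sigmoid) while the remaining labels share the leftover mass through a softmax; this is what makes the internal LM easy to separate out for external LM integration. A minimal sketch of that factorization (the function name and the convention that blank is index 0 are my assumptions, not this PR's API):

```python
import torch


def hat_log_probs(logits: torch.Tensor) -> torch.Tensor:
    """Turn joint-network logits of shape (..., V) into HAT log-probabilities.

    Hypothetical helper illustrating the HAT factorization:
      p(blank)     = sigmoid(b)                      # Bernoulli over blank
      p(label = k) = (1 - p(blank)) * softmax(l)_k   # labels share the rest
    where b is the blank logit (assumed to be index 0) and l are the
    remaining label logits.
    """
    blank_logit = logits[..., :1]   # (..., 1)
    label_logits = logits[..., 1:]  # (..., V-1)

    # log p(blank) = log sigmoid(b); log(1 - p(blank)) = log sigmoid(-b)
    log_p_blank = torch.nn.functional.logsigmoid(blank_logit)
    log_p_not_blank = torch.nn.functional.logsigmoid(-blank_logit)

    # Non-blank labels: scale the label softmax by (1 - p(blank)), in log space.
    log_p_labels = log_p_not_blank + torch.log_softmax(label_logits, dim=-1)

    return torch.cat([log_p_blank, log_p_labels], dim=-1)
```

By construction the output exponentiates to a proper distribution over the full vocabulary, since p(blank) + (1 - p(blank)) * Σ softmax = 1; dropping the sigmoid term recovers the internal LM scores used for external LM fusion.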

danpovey commented 11 months ago

Great!

desh2608 commented 8 months ago

@csukuangfj could you also check this when you have some time? Thanks!

csukuangfj commented 8 months ago

> @csukuangfj could you also check this when you have some time? Thanks!

Thanks! Left a minor comment. Otherwise, it looks good to me.

desh2608 commented 8 months ago

> > @csukuangfj could you also check this when you have some time? Thanks!
>
> Thanks! Left a minor comment. Otherwise, it looks good to me.

Sorry it took a while; I was on vacation for the last 2 weeks. I have made the change.

csukuangfj commented 8 months ago

Thanks!