🌟 New model addition
Simultaneous speech-to-text translation using Monotonic Multihead Attention (MMA). Is anybody currently working on implementing this model? My concern is whether it can be supported by Hugging Face at all: inference works in a particular way, using frameworks like SimulEval to simulate streaming input, and that may not be compatible with Hugging Face's current inference system.
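To make the concern concrete, here is a minimal sketch of what "simulating streaming input" looks like as a loop. It is not SimulEval's API: `simulate_streaming`, `wait_2`, and `toy_translate` are hypothetical names I made up for illustration, and the "decoder" just upper-cases source tokens. The point is only that the model alternates between reading one more source unit and writing one target token, rather than seeing the whole input before generation starts.

```python
from typing import Callable, List, Sequence, Tuple

READ, WRITE = "READ", "WRITE"


def simulate_streaming(
    source: Sequence[str],
    policy: Callable[[List[str], List[str]], str],
    translate_next: Callable[[List[str], List[str]], str],
    eos: str = "</s>",
    max_len: int = 50,
) -> List[Tuple[str, str]]:
    """Reveal `source` one unit at a time and log every READ/WRITE action."""
    seen: List[str] = []      # source units revealed so far
    output: List[str] = []    # target tokens emitted so far
    log: List[Tuple[str, str]] = []
    while len(output) < max_len:
        finished_reading = len(seen) == len(source)
        # Once the source is exhausted, the only possible action is WRITE.
        action = WRITE if finished_reading else policy(seen, output)
        if action == READ:
            seen.append(source[len(seen)])
            log.append((READ, seen[-1]))
        else:
            token = translate_next(seen, output)
            output.append(token)
            log.append((WRITE, token))
            if token == eos:
                break
    return log


def wait_2(seen: List[str], output: List[str]) -> str:
    """Hypothetical fixed policy: stay two source units ahead of the output."""
    return READ if len(seen) - len(output) < 2 else WRITE


def toy_translate(seen: List[str], output: List[str]) -> str:
    """Stand-in for a real decoder step (assumption: one token per source unit)."""
    return seen[len(output)].upper() if len(output) < len(seen) else "</s>"


if __name__ == "__main__":
    for action, item in simulate_streaming(["a", "b", "c"], wait_2, toy_translate):
        print(action, item)
```

A model with a learned policy such as MMA would replace `wait_2` with a decision computed inside the decoder, which is why the decoding loop cannot simply wait for a complete encoder output.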
Model description
MMA (Ma et al., 2019) extends the monotonic attention mechanism to multiple attention heads. It has been used to handle streaming text/speech input, mostly for translation.
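As a rough illustration of what "monotonic attention extended to multihead" means at inference time, here is a simplified sketch based on my reading of the paper, not the reference implementation. The assumptions: each head keeps its own position in the source, moves forward while its stop probability is below a hard threshold of 0.5, and a target token is written only once every head has stopped; if any head runs past the source read so far, more input must be read first. `mma_step` and `p_choose` are hypothetical names, and the probabilities are passed in as plain lists instead of being computed by a model.

```python
from typing import List, Tuple


def mma_step(
    p_choose: List[List[float]],
    head_pos: List[int],
    threshold: float = 0.5,
) -> Tuple[str, List[int]]:
    """Decide READ or WRITE for the current target step.

    p_choose[h][j]: assumed stop probability of head h at source index j,
                    given only for the source states read so far.
    head_pos[h]:    index where head h stopped at the previous target step.
    """
    new_pos: List[int] = []
    for probs, pos in zip(p_choose, head_pos):
        j = pos
        # Monotonic: a head can only stay in place or move forward.
        while j < len(probs) and probs[j] < threshold:
            j += 1
        if j == len(probs):
            # This head needs source that has not arrived yet.
            # Simplification: positions are left unchanged and the caller
            # calls again after reading more input.
            return "READ", head_pos
        new_pos.append(j)
    # Every head has stopped, so a target token can be emitted.
    return "WRITE", new_pos
```

With two heads and three source states, `mma_step([[0.1, 0.9, 0.2], [0.1, 0.2, 0.3]], [0, 0])` asks for a READ because the second head never stops; after a fourth state arrives with a high stop probability for that head, the same call returns WRITE with the heads at different positions.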
Open source status
Inference framework: Facebook Research SimulEval