huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Support for Monotonic Multihead Attention based Simultaneous Speech-to-text Translation #15491

Open beomseok-lee opened 2 years ago

beomseok-lee commented 2 years ago

🌟 New model addition

Simultaneous Speech-to-text Translation using Monotonic Multihead Attention (MMA). I am wondering if anybody is currently working on implementing this model. I am also concerned about whether this model can be supported by Hugging Face at all: its inference works in a particular way, using frameworks like SimulEval to simulate streaming input, which may not be compatible with Hugging Face's current inference system.

Model description

MMA (Ma et al., 2019) extends the hard monotonic attention mechanism to multiple heads, and has mostly been used for translation over streaming text/speech inputs.
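To illustrate the mechanism, here is a minimal NumPy sketch of hard monotonic attention decoding for a single head, following the general idea of Raffel et al. (2017) that MMA extends to multiple heads. The function name and interface are illustrative assumptions, not part of any transformers or fairseq API.

```python
import numpy as np

def monotonic_attend(energies, start=0, threshold=0.5):
    """Hedged sketch: scan encoder states left-to-right from `start` and
    stop at the first position whose selection probability (sigmoid of the
    monotonic energy) exceeds `threshold`. Falls back to the last index
    if no position is selected."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(energies, dtype=float)))  # sigmoid
    for j in range(start, len(probs)):
        if probs[j] > threshold:
            return j
    return len(probs) - 1

# Each head advances monotonically: at the next decoder step it resumes
# scanning from its previously attended index, so it never looks at
# future input -- this is what enables streaming inference.
head_pos = monotonic_attend([-2.0, -1.0, 3.0, 0.5])  # attends index 2
```

In MMA each of the multiple heads keeps its own monotonic position, and the model reads more source input only when its heads decide to advance.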

Open source status

Inference framework: Facebook Research SimulEval
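To make the compatibility concern concrete, below is a hedged sketch of the read/write loop that SimulEval-style evaluation simulates. The `policy` and `translate_step` callables are hypothetical stand-ins, not the SimulEval API; the point is that decoding interleaves READ actions (consuming source segments) with WRITE actions (emitting target tokens), which differs from the offline encode-then-generate flow of typical Hugging Face pipelines.

```python
def simultaneous_decode(source, policy, translate_step, max_len=50):
    """Hedged sketch of simultaneous decoding: alternate READ (consume one
    more source segment) and WRITE (emit one target token) so the model
    only ever sees the source prefix read so far."""
    read, target = 0, []
    while len(target) < max_len:
        if read < len(source) and policy(source[:read], target) == "READ":
            read += 1  # READ action: reveal one more source segment
        else:
            tok = translate_step(source[:read], target)  # WRITE action
            if tok == "<eos>":
                break
            target.append(tok)
    return target
```

For example, a wait-1 policy would keep the number of read segments one ahead of the number of emitted tokens, so latency stays bounded while translation proceeds incrementally.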

02shanks commented 1 year ago

@beomseok-lee can I work on this issue?