roudimit / whisper-flamingo

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
https://arxiv.org/abs/2406.10082
Other
79 stars, 3 forks