microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

Why can WavLLM understand audio sounds as well? #90

Open BenoitWang opened 2 months ago

BenoitWang commented 2 months ago

Hi, I tested WavLLM and found that it can sometimes understand non-speech audio sounds too. Since all the training data mentioned in the paper is speech-related, I wonder where this capability comes from?

XiaoshanHsj commented 2 months ago

I guess the Whisper encoder may be able to process general audio as well, and that Whisper's feature spaces for audio and speech are close to each other.