kaistmm / Audio-Mamba-AuM

Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
BSD 3-Clause "New" or "Revised" License

I would like to ask: can this model handle audio that is 3 to 4 minutes long? Can it encode audio of that length? #6

Closed: mulatikhr closed this issue 2 weeks ago

mulatikhr commented 1 month ago

I would very much appreciate a reply; I'm very interested in your paper.

mhamzaerol commented 2 weeks ago

Hi, thank you for your interest in our work!

While the provided model checkpoints can technically handle audio clips of 3-4 minutes during inference, their performance may decline with longer inputs, as they were trained on 10-second audio segments across all datasets. To achieve reliable results with longer audio, retraining or fine-tuning the model is recommended.
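The reply does not cover this, but one common workaround before any retraining is to keep every input at the 10-second length the checkpoints were trained on. You split the long recording into 10-second windows, encode each window separately, and pool the per-window embeddings. The sketch below assumes that approach. `encode_window` is a hypothetical stand-in for whatever call returns an embedding from the AuM model, since the repository's actual API is not shown here. Mean pooling is also just one illustrative choice.

```python
from typing import Callable, List, Sequence


def split_into_windows(samples: Sequence[float], sample_rate: int,
                       window_sec: float = 10.0,
                       hop_sec: float = 10.0) -> List[List[float]]:
    """Split a 1-D waveform into fixed-length windows, zero-padding the last one.

    window_sec defaults to 10 s to match the training segment length;
    hop_sec < window_sec gives overlapping windows.
    """
    win = int(round(window_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    windows: List[List[float]] = []
    start = 0
    while True:
        chunk = list(samples[start:start + win])
        if not chunk:
            break
        chunk.extend([0.0] * (win - len(chunk)))  # zero-pad the tail window
        windows.append(chunk)
        if start + win >= len(samples):  # this window reached the end
            break
        start += hop
    return windows


def encode_long_audio(samples: Sequence[float], sample_rate: int,
                      encode_window: Callable[[List[float]], List[float]],
                      window_sec: float = 10.0,
                      hop_sec: float = 10.0) -> List[float]:
    """Encode each window with a (hypothetical) encoder and mean-pool the results."""
    windows = split_into_windows(samples, sample_rate, window_sec, hop_sec)
    if not windows:
        raise ValueError("empty input")
    embeddings = [encode_window(w) for w in windows]
    dim = len(embeddings[0])
    # Average each embedding dimension across all windows.
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
```

With non-overlapping 10-second windows, a 3- to 4-minute clip becomes 18 to 24 inputs of the length the model saw during training. Each window only sees its own 10 seconds of context, so this avoids the length mismatch but cannot capture dependencies that span the whole clip. That is where the retraining or fine-tuning recommended above would still be needed.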

To help you get started, here are some resources in the repository that may be useful: