jnj2102 opened this issue 1 year ago
@sgugger what do you think of this request? If you think it's a good addition to the repo, I can take this task on.
cc @sanchit-gandhi and @hollance
Hey @jnj2102! Thanks for the feature request - while I think it's a cool model, I'm not sure it's best suited to the transformers library directly, since the original repository has quite low usage (20 stars), as does the paper (4 citations). If you're really keen on using this model, you could explore adding it to the Hub, e.g. as was done with the MERT model. WDYT?
Hi! No problem. How do you add a model to the Hub? I’ll check out the MERT model too.
Hey Jami! Awesome - there's info on using custom code on the Hub here: https://huggingface.co/docs/transformers/v4.27.1/en/custom_models#using-a-model-with-custom-code. Let me know if you have any questions, more than happy to help here!
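To make the "custom code on the Hub" route concrete, here's a minimal sketch of how such a model is typically loaded once its modeling code lives in its Hub repo. `load_hub_model` is a hypothetical wrapper name for illustration; `trust_remote_code=True` is the real transformers flag the linked docs describe:

```python
# Hypothetical helper sketching how a Hub-hosted custom model is loaded.
def load_hub_model(repo_id: str):
    # Imported lazily so this sketch doesn't require transformers at import
    # time; install with `pip install transformers`.
    from transformers import AutoModel

    # trust_remote_code=True executes the modeling files stored in the Hub
    # repo itself (rather than code shipped inside transformers), so review
    # that code on the Hub before enabling it.
    return AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# e.g. for the MERT model mentioned above:
# model = load_hub_model("m-a-p/MERT-v1-95M")
```

The actual call (model weights download) is left commented out since it fetches the checkpoint from the Hub.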
Model description
BART-fusion is a novel model for generating lyric interpretations from lyrics and music audio, combining a large-scale pre-trained language model with an audio encoder. It uses a cross-modal attention module to incorporate the audio representation into the lyrics representation, helping the pre-trained language model understand the song from an audio perspective while preserving the language model's original generative performance. Please see the paper here: https://arxiv.org/abs/2208.11671
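For intuition, the cross-modal attention idea can be sketched as lyric tokens attending over audio frames. This is only an illustrative single-head sketch in NumPy, not the paper's implementation (it omits the learned query/key/value projections and multi-head structure a real module would have):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(lyrics, audio):
    """Toy single-head cross-attention: lyric tokens attend to audio frames.

    lyrics: (T_text, d)  text hidden states, used as queries
    audio:  (T_audio, d) audio hidden states, used as keys and values
    Returns (T_text, d): audio-informed lyrics representation.
    """
    d = lyrics.shape[-1]
    scores = lyrics @ audio.T / np.sqrt(d)   # (T_text, T_audio) similarity
    weights = softmax(scores, axis=-1)       # attention over audio frames
    fused = weights @ audio                  # (T_text, d) audio summary per token
    # Residual add keeps the original text representation intact, mirroring
    # the goal of preserving the language model's generative performance.
    return lyrics + fused

# Toy shapes: 8 lyric tokens and 20 audio frames, hidden size 16.
rng = np.random.default_rng(0)
text = rng.standard_normal((8, 16))
aud = rng.standard_normal((20, 16))
out = cross_modal_attention(text, aud)
print(out.shape)  # (8, 16)
```

The output has the same shape as the lyrics input, so it can be fed straight back into the language model's decoder stack.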
Open source status
Provide useful links for the implementation
Here is the code repository for the paper: https://github.com/ldzhangyx/BART-fusion/tree/main. The weights should be available in the checkpoints folder: https://drive.google.com/drive/folders/18EUUx-KT9xGJ1uq2UoOgj0X9BpngNn_T