microsoft / CLAP

Learning audio concepts from natural language supervision
MIT License

Is there any possible script to look at for fine-tuning? #23

Open underdogliu opened 11 months ago

underdogliu commented 11 months ago

Hi! First of all, thanks a lot for such an amazing project.

While appreciating the contribution, I wonder whether there is a supported way to fine-tune the model on customized datasets for specific tasks? I am trying to adapt the model to improve performance, and my source data is also structured for classification.

bmartin1 commented 11 months ago

Hi @underdogliu,

We don't have any immediate plans to release code for fine-tuning. If you or anybody else writes code to fine-tune a model using CLAP weights, we would be happy to share the repo link in our CLAP repo. Thanks for using our model in your project 👏!
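Since there is no official fine-tuning code, here is a minimal sketch of one way to fine-tune for classification: put a linear head on top of the audio encoder and train end-to-end with cross-entropy. The `audio_encoder` below is a stand-in `nn.Linear`; in practice you would substitute the actual pretrained CLAP audio encoder and its input pipeline (all names, dimensions, and hyperparameters here are illustrative, not from the repo).

```python
import torch
import torch.nn as nn

# Stand-in for CLAP's pretrained audio encoder (hypothetical: the real
# encoder maps audio features to an embedding; swap it in here).
embed_dim = 1024
audio_encoder = nn.Linear(512, embed_dim)

# Classification head for the downstream task (illustrative sizes).
num_classes = 10
model = nn.Sequential(audio_encoder, nn.ReLU(), nn.Linear(embed_dim, num_classes))

# Small learning rate is typical when fine-tuning pretrained weights.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on random data (replace with your own dataset).
x = torch.randn(8, 512)                      # batch of audio features
y = torch.randint(0, num_classes, (8,))      # class labels
logits = model(x)
loss = loss_fn(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the encoder's parameters are included in `model.parameters()`, its weights are updated rather than frozen; to freeze it instead, set `requires_grad_(False)` on the encoder before building the optimizer.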

anshumansinha16 commented 10 months ago

How do I load the pre-trained weights? I wish to use the pre-trained encoders (audio and text) for some downstream tasks. Can you provide the state dictionaries for the model? I only want to initialise the weights; I don't intend to freeze them.

I only found the weights for the pre-trained model, but loading them into an instance of CLAP with `clap_instance.load_state_dict(state_dict)` gives me this error:

RuntimeError: Error(s) in loading state_dict for CLAP: .... ... Unexpected key(s) in state_dict: "epoch", "model", "optimizer", "scheduler".
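The unexpected keys in the error suggest the released file is a full training checkpoint, with the model weights nested under the "model" key next to "epoch", "optimizer", and "scheduler" state. A minimal sketch of unwrapping it (the file path is a placeholder for wherever you saved the checkpoint):

```python
import torch

def extract_model_state(checkpoint_path: str) -> dict:
    """Return only the model weights from a CLAP training checkpoint.

    The checkpoint bundles "epoch", "model", "optimizer", and "scheduler";
    load_state_dict expects just the contents of the "model" entry.
    """
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    return checkpoint["model"]

# Usage (path and clap_instance are placeholders for your own setup):
# clap_instance.load_state_dict(extract_model_state("CLAP_weights.pth"))
```

If some parameter names still differ from your model definition, `load_state_dict(..., strict=False)` will report the mismatches instead of raising.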