huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Adding MAEST model #26491

Closed palonso closed 12 months ago

palonso commented 1 year ago

Model description

Hi:)

The Music Audio Efficient Spectrogram Transformer (MAEST) features the same architecture as AST (or ViT) and is specifically trained for music applications. MAEST features a head performing music style classification with 400 classes, and, according to the original research, its representations are useful in a wide range of downstream music analysis tasks.

The original code and weights are available online, and the work was accepted for the 2023 ISMIR conference (post-print here).

The main difference from AST is in the specifics of the mel-spectrogram implementation. Because of this, I propose creating a new model that copies the AST architecture and defines a custom FeatureExtractor.

In the following days, I'll create a pull request adding this model according to the documentation. Please, let me know if this plan seems good or if there is something else I should consider.

Open source status

Provide useful links for the implementation

Original implementation and weights are available here, made by me (@palonso).

NielsRogge commented 1 year ago

If the model is exactly the same, then you only need to add a MaestFeatureExtractor class. One can use the Auto classes to load the model (AutoModel and AutoModelForAudioClassification).
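
For illustration, a minimal sketch of loading such a checkpoint with the Auto classes (the checkpoint id is a hypothetical placeholder for a MAEST repo whose config points to the AST architecture):

```python
from transformers import AutoModel, AutoModelForAudioClassification

# Hypothetical checkpoint id; assumes the repo's config maps to the AST architecture.
checkpoint = "some-org/maest-checkpoint"

backbone = AutoModel.from_pretrained(checkpoint)                          # bare AST encoder
classifier = AutoModelForAudioClassification.from_pretrained(checkpoint)  # AST + classification head
```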

palonso commented 1 year ago

@NielsRogge, thank you for the feedback!

I confirm that my models can be run with the ASTModel and ASTForAudioClassification classes. Thus, MAESTModel and MAESTForAudioClassification are not really needed. Nevertheless, I implemented these classes since I followed the official guide using the add-new-model-like CLI method.

The advantages I see from having MAESTModel and MAESTForAudioClassification are that:

Please let me know if you think it's still not worth having these classes, and I'll remove them and modify my models' config files to point to ASTForAudioClassification.

Also, feel free to indicate any additional changes needed or suggestions in the PR. Thanks!

NielsRogge commented 1 year ago

I understand, but we do not have any model in the library that is a 100% copy of another model, so in this case let's only add the feature extractor class.

There are several examples, e.g. DINOv2 does not implement a new image processor since it reuses the one from a different model.

sanchit-gandhi commented 1 year ago

Hey @palonso! Congratulations on the release of MAEST 🎵 I'm wondering whether a Hub integration would be the best fit for the MAEST model in the Hugging Face ecosystem? We have recently been trying to push a "model on the hub" approach, and to provide as much support as we can there. This is the recommended way of adding new models, and it is also easier to integrate. Here is a tutorial if that sounds good to you. Modelling-code-wise, it's a very similar process to transformers. Integration-wise, you can move faster and add the model as soon as it's ready on your end.

palonso commented 1 year ago

@sanchit-gandhi, thank you for your feedback and code review! Since the proposed changes seemed easy to address I decided to continue with the PR.

The "model on the hub" approach also seems great. I went through the tutorial and found that actually I did something similar to push my weights to the hub (by loading them to an AST model and using the push_to_hub() method). While now I understand how to push a custom modeling class, it was not very clear to me how to extend this functionality for a custom feature extractor. Would you have any examples?

sanchit-gandhi commented 1 year ago

Hey @palonso - nice that you found the "model on the hub" tutorial useful! The process for pushing a custom feature extractor is more or less the same as that for the model:

  1. Write your feature extractor as a subclass of SequenceFeatureExtractor in a Python file feature_extraction_maest.py (as you have done in your MAEST PR to transformers)
  2. Load your feature extractor with this MAEST feature extraction class, and register it for an auto class ("AutoFeatureExtractor" this time, not "AutoModel")
  3. Push the feature extraction code and feature extractor to the Hub (as per the guide)
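
A minimal sketch of these three steps, assuming illustrative class internals and a placeholder repo id (not the actual MAEST implementation):

```python
# feature_extraction_maest.py
from transformers import SequenceFeatureExtractor


class MaestFeatureExtractor(SequenceFeatureExtractor):
    model_input_names = ["input_values"]

    def __init__(self, feature_size=1, sampling_rate=16000, padding_value=0.0, **kwargs):
        super().__init__(
            feature_size=feature_size,
            sampling_rate=sampling_rate,
            padding_value=padding_value,
            **kwargs,
        )
        # ... MAEST-specific mel-spectrogram parameters would go here ...
```

```python
# In a separate script: register the class for AutoFeatureExtractor and push it,
# together with its code, to the Hub (repo id is a placeholder).
from feature_extraction_maest import MaestFeatureExtractor

MaestFeatureExtractor.register_for_auto_class("AutoFeatureExtractor")
feature_extractor = MaestFeatureExtractor()
feature_extractor.push_to_hub("your-org/maest-checkpoint")
```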

Let me know if you encounter any difficulties and I'd be more than happy to go deeper on the above points!

palonso commented 1 year ago

@sanchit-gandhi, thank you very much!! This approach worked like a charm and it's already possible to use my models within Transformers 🥳.

There are still a few questions that I would like to ask:

  1. Since I'm now using custom code, the inference widget's Compute functionality fails because the trust_remote_code=True flag is not set. I understand that automatically running custom code on your servers implies security concerns, but maybe there could be a way to manually trust certain custom classes (for example, when they are defined in repositories owned by trusted institutions).

(screenshot: inference widget error asking for trust_remote_code=True)

  1. While the "model on the hub" tutorial was very clear to me, it was not direct that you could use push_to_hub() on the feature extraction classes. Do you want me to extend the documentation mentioning this in a separate PR?

  3. While the "model on the hub" solution works, I still see a few advantages to the integration via PR: a) people could run my model without needing to set trust_remote_code=True, and b) it could have slightly better documentation than the model cards alone. I don't want to overload you if this requires much additional work on your side, but otherwise, do you think it would make sense to merge my PR? If not, I'll close it.

sanchit-gandhi commented 1 year ago

Super cool! Very happy to hear you were able to push the feature extractor and that the model now works in Transformers 🙌 Answering your questions in-line below:

  1. I'm actually not sure how we enable the inference widget when we have custom code on the Hub - cc'ing @Vaibhavs10 and @osanseviero who might be able to tell you more about enabling this when trust_remote_code=True is needed
  2. Yes please - if you could update the documentation to improve the custom code docs that would be great! It's worth bearing in mind that not all models will have a feature extractor, so we should make clear that the feature extraction part of the integration is only required for audio/vision models that have a feature extractor, and is not a necessary step for all new models (e.g. ones that use the same feature extractor as an existing model in the library). Perhaps putting it in its own sub-section at the end would be best here?
  3. Note that trust_remote_code=True has been used extremely heavily for recent LLM releases, including the Falcon model, so I think it's a direction users are getting more familiar with. Also sharing a stat: model cards on the Hub are getting between 5 and 10x more views than the documentation in Transformers, so again there's a movement towards prioritising the Hub over library-specific documentation. So you can really have an effective integration by populating your model card well on the Hub and leveraging trust_remote_code=True (see the sketch below)! If you feel there's still benefit to a Transformers integration, happy to discuss, but IMO you can get all the benefits now with your Hub integration and skip the hassle of multiple PR reviews etc!
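
For reference, a minimal sketch of what the "remote code" loading looks like for end users (using one of the released MAEST checkpoints; which classes are resolved depends on the repo's config):

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

checkpoint = "mtg-upf/discogs-maest-30s-pw-129e"

# trust_remote_code=True allows loading any component defined as custom code in the repo
# (here, the custom feature extractor).
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForAudioClassification.from_pretrained(checkpoint, trust_remote_code=True)
```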
osanseviero commented 1 year ago

We don't have support for inference widget for custom code due to security reasons.

sanchit-gandhi commented 1 year ago

Hey @palonso! Sorry for the radio silence here! How would you like to proceed? Are you happy with the current "remote code" integration of MAEST with Transformers and the HF Hub? It looks like you've already got a nice integration here: https://huggingface.co/mtg-upf/discogs-maest-30s-pw-129e

palonso commented 12 months ago

Hi @sanchit-gandhi! We got some feedback from people using our models and also noticed that maest-30s-pw-73e-ts is particularly successful, so we are very happy in that sense!

Regarding the integration, we still think it would be amazing to be able to use our models from the Inference API online. According to @osanseviero, that's not possible with the current setup, so I would like to ask if there is something we could do from our side to facilitate merging our PR into Transformers. Otherwise, if it is hard to fit this into your roadmap, we are happy to leave it as it is and close this issue.

sanchit-gandhi commented 12 months ago

Hey @palonso! That's super cool to hear, congrats to you and your team on the release 🎉 If the Hub integration is working well, it might be most time-efficient to leave it as is? If we want something similar to the Inference API, we could create a simple Gradio demo for the model and allow users to pass requests over the Gradio Client. E.g. on this Space, there is a button at the bottom that says "Use via API". This provides a means of pinging the model hosted on the Space with requests, even if the model is not on the Inference API. WDYT?
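
For example, a minimal sketch of pinging a Space through the Gradio Client (the Space id, input file, and endpoint name are hypothetical placeholders):

```python
from gradio_client import Client

# Hypothetical Space id hosting a MAEST demo.
client = Client("mtg-upf/maest-demo")

# The api_name depends on how the Gradio demo names its endpoints.
result = client.predict("path/to/audio.wav", api_name="/predict")
print(result)
```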

palonso commented 12 months ago

@sanchit-gandhi, this solution sounds like a good compromise. In the end, we made an interactive demo on Replicate, so we will probably update the model cards to redirect there for a quick test without any installation.

Closing this!