palonso closed this issue 12 months ago.
If the model is exactly the same, then you only need to add a `MaestFeatureExtractor` class. One can use the Auto classes to load the model (`AutoModel` and `AutoModelForAudioClassification`).
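For illustration, loading such a checkpoint through the Auto classes could then look roughly like this (the repo id is the one linked later in this thread; whether it resolves to a new `MaestFeatureExtractor` depends on how the checkpoint is configured):

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

checkpoint = "mtg-upf/discogs-maest-30s-pw-129e"  # illustrative MAEST checkpoint

# The feature extractor would resolve to the new MaestFeatureExtractor,
# while the model weights load into the existing AST architecture.
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModelForAudioClassification.from_pretrained(checkpoint)
```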
@NielsRogge, thank you for the feedback!
I confirm that my models can be run with the `ASTModel` and `ASTForAudioClassification` classes. Thus, the `MAESTModel` and `MAESTForAudioClassification` classes are not really needed. Nevertheless, I implemented these classes since I followed the official guide using the `add-new-model-like` CLI method.
The advantages I see from having `MAESTModel` and `MAESTForAudioClassification` are that:

- it follows the convention that every `model/` has its own `modeling_*.py` file;
- the `# Copied from ...` directive implies that the code will be synced with the AST implementation, so it will not require much additional maintenance.

Please let me know if you think it's still not worth having these classes, and I'll remove them and modify my models' config files to point to `ASTForAudioClassification`.
Also, feel free to indicate any additional changes needed or suggestions in the PR. Thanks!
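As a rough sketch of the config change mentioned above (the local path is hypothetical; this is not the actual conversion script):

```python
from transformers import ASTConfig

checkpoint_dir = "./maest-30s-checkpoint"  # hypothetical local checkpoint directory

# Point the config at the stock AST classes so that
# AutoModelForAudioClassification resolves to ASTForAudioClassification.
config = ASTConfig.from_pretrained(checkpoint_dir)
config.architectures = ["ASTForAudioClassification"]
config.save_pretrained(checkpoint_dir)
```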
I understand, but we do not have any model in the library that is a 100% copy of another model, so in this case let's only add the feature extractor class.
There are several examples, e.g. DINOv2 does not implement a new image processor since it reuses one from a different model.
Hey @palonso! Congratulations on the release of MAEST 🎵 I'm wondering whether a Hub integration would be the best fit for the MAEST model in the Hugging Face ecosystem? We have recently been trying to push a "model on the hub" approach, and to have as much support as we can there. This is the recommended way of adding new models, and it will also be easier to integrate. Here is a tutorial if that sounds good to you. Modelling-code-wise, it's a very similar process to `transformers`. Integration-wise, you can move faster and add the model as soon as it's ready on your end.
@sanchit-gandhi, thank you for your feedback and code review! Since the proposed changes seemed easy to address, I decided to continue with the PR.

The "model on the hub" approach also seems great. I went through the tutorial and found that I had actually done something similar to push my weights to the Hub (by loading them into an AST model and using the `push_to_hub()` method). While I now understand how to push a custom modeling class, it was not very clear to me how to extend this functionality to a custom feature extractor. Would you have any examples?
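For context, pushing weights the way described above might look roughly like this (paths and repo id are placeholders):

```python
from transformers import ASTForAudioClassification

# Load the converted MAEST weights into the stock AST architecture
# (hypothetical local directory), then upload them to the Hub.
model = ASTForAudioClassification.from_pretrained("./maest-converted")
model.push_to_hub("my-org/maest-checkpoint")  # placeholder repo id
```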
Hey @palonso - nice that you found the "model on the hub" tutorial useful! The process for pushing a custom feature extractor is more or less the same as that for the model:

1. Put your custom feature extractor class, subclassing `SequenceFeatureExtractor`, into a python file `feature_extraction_maest.py` (as you have done in your MAEST PR to `transformers`).
2. Register it for the auto classes and push it to the Hub (using `"AutoFeatureExtractor"` this time, not `"AutoModel"`).

Let me know if you encounter any difficulties and I'd be more than happy to go deeper on the above points!
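A minimal sketch of those two steps, assuming a `MAESTFeatureExtractor` class; the MAEST-specific mel-spectrogram preprocessing is omitted and the repo id is a placeholder:

```python
# feature_extraction_maest.py
from transformers import SequenceFeatureExtractor


class MAESTFeatureExtractor(SequenceFeatureExtractor):
    model_input_names = ["input_values"]

    def __init__(self, feature_size=1, sampling_rate=16000, padding_value=0.0, **kwargs):
        super().__init__(
            feature_size=feature_size,
            sampling_rate=sampling_rate,
            padding_value=padding_value,
            **kwargs,
        )

    def __call__(self, raw_audio, **kwargs):
        # The MAEST-specific mel-spectrogram computation would go here.
        ...


# Register the class for the auto API ("AutoFeatureExtractor", not "AutoModel")
# and push it, so the custom code is uploaded alongside preprocessor_config.json.
MAESTFeatureExtractor.register_for_auto_class("AutoFeatureExtractor")

feature_extractor = MAESTFeatureExtractor()
feature_extractor.push_to_hub("my-org/maest-checkpoint")  # placeholder repo id
```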
@sanchit-gandhi, thank you very much!! This approach worked like a charm and it's already possible to use my models within Transformers 🥳.
There are still a few questions that I would like to ask:

- The Compute() functionality (the inference widget on the Hub) fails because the `trust_remote_code=True` flag was not set. I understand that automatically running custom code on your servers implies security concerns, but maybe there could be a way to manually trust certain custom classes (for example, when they are defined in repositories owned by trusted institutions).
- While the "model on the hub" tutorial was very clear to me, it was not obvious that you could use `push_to_hub()` on the feature extraction classes. Do you want me to extend the documentation to mention this in a separate PR?
- While the model-on-the-hub solution works, I still see a few advantages to the integration via PR: a) people could run my model without needing to set `trust_remote_code=True`, and b) it could have slightly better documentation than the model cards alone. I don't want to overload you in case this requires much additional work on your side but, otherwise, do you think it would make sense to merge my PR? I'll close it otherwise.
Super cool! Very happy to hear you were able to push the feature extractor and that the model now works in Transformers 🙌 Answering your questions in-line below:

`trust_remote_code=True` has been used extremely heavily for recent LLM releases, including the Falcon model, so I think it's a direction users are getting more familiar with. Also sharing a stat that model cards on the Hub are getting between 5-10x more views than the documentation in Transformers, so again there's a movement towards prioritising the Hub over library-specific documentation. So it's really a case that you can have an effective integration by populating your model card well on the Hub and leveraging `trust_remote_code=True`! If you feel there's still benefit to a Transformers integration, happy to discuss, but IMO you can get all the benefits now with your Hub integration and skip the hassle of multiple PR reviews etc.

We don't have support for the inference widget for custom code due to security reasons.
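For reference, a sketch of what the Hub-based usage with remote code looks like for end users (repo id from the MAEST page linked below; the exact call into the feature extractor will depend on its custom code):

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

checkpoint = "mtg-upf/discogs-maest-30s-pw-129e"

# trust_remote_code=True downloads and runs the custom feature-extractor code
# stored in the model repository.
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForAudioClassification.from_pretrained(checkpoint, trust_remote_code=True)
```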
Hey @palonso! Sorry for the radio silence here! How would you like to proceed? Are you happy with the current "remote code" integration of MAEST with Transformers and the HF Hub? It looks like you've already got a nice integration here: https://huggingface.co/mtg-upf/discogs-maest-30s-pw-129e
Hi @sanchit-gandhi! We got some feedback from people using our models and also noticed that maest-30s-pw-73e-ts is particularly successful, so we are very happy in that sense!
Regarding the integration, we still think it would be amazing to be able to use our models from the Inference API online. According to @osanseviero, that's not possible with the current setup, so I would like to ask if there is something we could do from our side to facilitate merging our PR into Transformers. Otherwise, if it is hard to fit this into your roadmap, we are happy to leave it as it is and close this issue.
Hey @palonso! That's super cool to hear, congrats to you and your team on the release 🎉 If the Hub integration is working well, it might be most time-efficient to leave it as is? If we want to do something similar to the Inference API, we could create a simple Gradio demo for the model and allow users to send requests via the Gradio Client. E.g. on this Space, there is a button at the bottom that says "Use via API". This provides a means of pinging the model hosted on the Space with requests, even if the model is not on the Inference API. WDYT?
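A rough sketch of how a user could send requests to such a Space from Python (the Space id and endpoint signature are hypothetical):

```python
from gradio_client import Client

# Hypothetical Space hosting a MAEST demo; the actual inputs and api_name depend
# on how the Gradio app is defined (newer gradio_client versions may also require
# wrapping audio inputs with gradio_client.handle_file).
client = Client("mtg-upf/maest-demo")
result = client.predict("example.wav", api_name="/predict")
print(result)
```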
Model description
Hi :)
The Music Audio Efficient Spectrogram Transformer (MAEST) features the same architecture as AST (or ViT) and is specifically trained for music applications. MAEST features a head performing music style classification with 400 classes, and, according to the original research, its representations are useful in a wide range of downstream music analysis tasks.
The original code and weights are available online, and the work was accepted for the 2023 ISMIR conference (post-print here).
The main difference from AST is in the specifics of the mel-spectrogram implementation. Because of this, I propose creating a new model that copies the AST architecture and defines a custom FeatureExtractor.
In the following days, I'll create a pull request adding this model according to the documentation. Please, let me know if this plan seems good or if there is something else I should consider.
Open source status
Provide useful links for the implementation
The original implementation and weights are available here, made by me (@palonso).