Community contribution - `BetterTransformer` integration for more models!

younesbelkada commented 1 year ago

`BetterTransformer` integration for more models!

The BetterTransformer API provides faster inference on CPU & GPU through a simple interface!

Models can get substantial speedups from a one-liner, provided the latest version of PyTorch is installed. A complete guide on how to convert a new model is available in the BetterTransformer documentation!
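For reference, the one-liner looks roughly like the sketch below. It assumes an `optimum` release that still ships the `optimum.bettertransformer` module, together with `transformers` and a recent PyTorch. The helper name `to_bettertransformer` and the `bert-base-uncased` checkpoint in the usage comment are illustrative choices, not part of the original post.

```python
def to_bettertransformer(model):
    """Return a BetterTransformer version of a supported transformers model.

    Assumption: `optimum` is installed and still provides
    `optimum.bettertransformer`. The import is lazy so this helper can be
    defined even on a machine where optimum is missing.
    """
    from optimum.bettertransformer import BetterTransformer

    # keep_original_model=True asks optimum to convert a copy and leave
    # the model that was passed in untouched.
    return BetterTransformer.transform(model, keep_original_model=True)


# Usage, assuming `transformers` is installed and the checkpoint can be downloaded:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint
#   fast_model = to_bettertransformer(model)
```

Passing `keep_original_model=False` instead should save memory, at the cost of losing the unconverted model.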

Here is a list of models that could potentially be supported. Pick one of the architectures below and let's discuss the conversion!

Text models 🖊️ :

[x] FSMT - FSMTEncoderLayer / @Sumanth077 https://github.com/huggingface/optimum/pull/494
[ ] MobileBERT - MobileBertLayer / @raghavanone https://github.com/huggingface/optimum/pull/506
[x] MBart - MBartEncoderLayer + M2M100EncoderLayer / @ravenouse https://github.com/huggingface/optimum/pull/516
[x] ProphetNet - ProphetNetEncoderLayer
[x] RemBert - RemBertLayer
[x] RocBert - RocBertLayer
[x] RoFormer - RoFormerLayer
[x] Tapas - TapasLayer / https://github.com/huggingface/optimum/pull/520

Vision models 📷 :

[x] Blip - BlipLayer
[ ] Detr - DetrLayer
[ ] Flava - FlavaLayer
[ ] GLPN - GLPNLayer | Cannot be supported
[x] ViLT - ViltLayer / https://github.com/huggingface/optimum/pull/508

Audio models 🔉 :

[ ] Speech2Text - Speech2TextLayer
[ ] NEW: Audio Spectrogram Transformer - ASTLayer

Also let us know if you think we missed any architectures that could be supported. Note that for the encoder-decoder models listed above, we expect to convert the encoder only.

Support for decoder-based models coming soon!

cc @michaelbenayoun @fxmarty

https://github.com/huggingface/transformers/issues/20372

Sumanth077 commented 1 year ago

Hi @younesbelkada, I would love to contribute to this issue and can work on FSMT.

younesbelkada commented 1 year ago

Hey @Sumanth077 , thanks a bunch for your interest in this issue! 🚀 I would love to assist you with the integration, so let's try to make this happen! I have updated the table above and am attaching the contribution tutorial here ;) Would you mind forking this repo and opening a draft pull request so that I can start guiding you there? Also, please do not hesitate to ping us here about any issue you face during the integration 💪

Sumanth077 commented 1 year ago

Thank you for the reply @younesbelkada. I just opened a draft pull request, but haven't made any significant changes yet.

In Step 1 (Identifying the source layer to change), I couldn't find an entry in BETTER_TRANFORMER_LAYERS_MAPPING_DICT that maps the FSMT module to its BetterTransformer equivalent.

Should I start creating that? I would love your assistance.
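As a rough illustration of what such a mapping entry could look like: the sketch below is not the optimum source, only a minimal stand-in that assumes the dictionary maps a source layer's class name to the class that replaces it. `LAYERS_MAPPING`, `FSMTEncoderLayerBetterTransformer`, `find_converter`, and the dummy `FSMTEncoderLayer` are hypothetical names used purely for this example.

```python
class FSMTEncoderLayerBetterTransformer:
    """Hypothetical placeholder for a converted FSMT encoder layer."""

    def __init__(self, source_layer):
        # A real implementation would copy the weights out of `source_layer`.
        self.source_layer = source_layer


# Hypothetical stand-in for BETTER_TRANFORMER_LAYERS_MAPPING_DICT:
# source layer class name -> class that replaces it.
LAYERS_MAPPING = {
    "FSMTEncoderLayer": FSMTEncoderLayerBetterTransformer,
}


def find_converter(layer):
    """Return the converter class registered for `layer`, or None if absent."""
    return LAYERS_MAPPING.get(type(layer).__name__)


class FSMTEncoderLayer:
    """Dummy stand-in for the transformers layer class, for illustration only."""


converter = find_converter(FSMTEncoderLayer())
print(converter.__name__)
```

Under this assumption, supporting a new architecture amounts to writing the replacement class and registering it under the source layer's class name.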

younesbelkada commented 1 year ago

Hi @Sumanth077 , I have just replied on your PR, let's continue the discussion there ;)

ka00ri commented 1 year ago

Hi, I would like to contribute as well. This would be my first contribution to open source, so I might need some hand-holding 🤚

I followed the documentation and the progress made on FSMT in huggingface/optimum#494 to better understand the task.

I looked into ViLT via

    from transformers import AutoModel

    model = AutoModel.from_pretrained("dandelin/vilt-b32-mlm")

and as I understand the documentation, this should be the source layer to make changes to, including its attributes:

    (0): ViltLayer(
      (attention): ViltAttention(
        (attention): ViltSelfAttention(
          (query): Linear(in_features=768, out_features=768, bias=True)
          (key): Linear(in_features=768, out_features=768, bias=True)
          (value): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (output): ViltSelfOutput(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
      (intermediate): ViltIntermediate(
        (dense): Linear(in_features=768, out_features=3072, bias=True)
        (intermediate_act_fn): GELUActivation()
      )
      (output): ViltOutput(
        (dense): Linear(in_features=3072, out_features=768, bias=True)
        (dropout): Dropout(p=0.0, inplace=False)
      )
      (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    )
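A quick way to spot candidate source layers without reading the full printout is to count submodule class names. The sketch below is a minimal helper for that; `count_layer_classes` is a made-up name, and the "ends with `Layer`" filter is only a naming heuristic, not a rule from the BetterTransformer documentation.

```python
from collections import Counter


def count_layer_classes(model, suffix="Layer"):
    """Count submodules whose class name ends with `suffix`.

    `model` only needs a `named_modules()` method yielding (name, module)
    pairs, which every `torch.nn.Module` provides.
    """
    counts = Counter()
    for _name, module in model.named_modules():
        class_name = type(module).__name__
        if class_name.endswith(suffix):
            counts[class_name] += 1
    return counts


# Usage, assuming `transformers` is installed and the checkpoint can be downloaded:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained("dandelin/vilt-b32-mlm")
#   print(count_layer_classes(model))
```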

I could give the ViltLayer a go, if it's ok with you @younesbelkada 🙂

younesbelkada commented 1 year ago

Hi @ka00ri ! Thanks a lot for your message and interest in contributing! I would love to assist you with integrating ViLT into BetterTransformer 💪 That is correct, this layer is the source layer to change! Would you mind opening a PR and tagging us (myself, @michaelbenayoun & @fxmarty ) ? Thanks a bunch!

adit299 commented 1 year ago

Hello, apologies for the delay, but I just opened a draft PR to start the discussion on how to add BetterTransformer support for the ProphetNet encoder layer. I have a couple of questions about how to do this, so I was wondering who would be the best person to reach out to. @michaelbenayoun @fxmarty @younesbelkada

fxmarty commented 1 year ago

Hi @adit299 , thanks for adding the support for this architecture! Feel free to ask any question in the PR you opened.

JanFidor commented 1 year ago

Hi @younesbelkada, could I pick up the RoFormer?

soma2000-lang commented 1 year ago

@younesbelkada doing Detr - DetrLayer

younesbelkada commented 1 year ago

Hello @JanFidor Yes sure! @soma2000-lang perfect, let us know when you open a PR 💪 !

JanFidor commented 1 year ago

@younesbelkada Hi, thanks for responding. I'm not 100% certain, but I think RemBert, RoFormer and RocBert are already implemented, as they have already been added to __init__.py, overview.mdx and the test file. If that's the case, the list of models left to implement would need to be updated. Let me know if you agree!

younesbelkada commented 1 year ago

I see, thanks for clarifying. I will double check that and let you know

younesbelkada commented 1 year ago

Thanks for letting me know! Indeed, these are already implemented. I suggest adding BetterTransformer support for Blip instead (I updated the table above).

JanFidor commented 1 year ago

Thanks for the suggestion, I'll get on it!

ravenouse commented 1 year ago

Hi @fxmarty and @younesbelkada !

Thank you so much for your previous help and support on my implementation of MBart support for BetterTransformer.

I want to follow up on my PR on ASTLayer support for BetterTransformer.

Specifically, I would like to check with you if it is still possible to work on this and have it reviewed and merged into the package. If it is, I would be happy to continue working on it.

I realized that the whole BetterTransformer part and its testing have changed a lot in the last several months. Once I get confirmation, I will start updating my code to match those changes.

Thank you so much for your time and help, and I look forward to hearing back from you soon.

Sincerely,

rajveer43 commented 1 year ago

@younesbelkada I would like to work on FlavaLayer. Can you confirm whether it has already been done?

mszsorondo commented 1 year ago

Hi! @JanFidor will you finish with BLIP? I can do it if not, with the permission of @younesbelkada @fxmarty

fxmarty commented 1 year ago

Hi,

@mszsorondo Looking into the PRs, BLIP has been implemented in https://github.com/huggingface/optimum/pull/1125. I just ticked it in the first post. @rajveer43 For Flava, there is this ongoing PR: https://github.com/huggingface/optimum/pull/907

rajveer43 commented 1 year ago

@fxmarty any other model available for work?

mszsorondo commented 1 year ago

@fxmarty same here, if there's still any model

hackpk commented 10 months ago

@younesbelkada Can I work on ASTLayer?

karandua2016 commented 8 months ago

Any plans to add support for MPT?

qingfengcss commented 1 month ago

please support florence2!!!
