huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0
2.46k stars 439 forks

Community contribution - `BetterTransformer` integration for more models! #488

Open younesbelkada opened 1 year ago

younesbelkada commented 1 year ago

BetterTransformer integration for more models!

The BetterTransformer API provides faster inference on CPU and GPU through a simple interface!

Models can get significant speedups with a one-liner, provided the latest version of PyTorch is installed. A complete guide on how to convert a new model is available in the BetterTransformer documentation!
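For readers who have not seen the one-liner, here is a minimal sketch of the conversion. It assumes an Optimum release that still ships `optimum.bettertransformer`, plus `transformers` and PyTorch. The helper name `convert_to_bettertransformer` and the default checkpoint `bert-base-uncased` are illustrative choices, not taken from this thread.

```python
def convert_to_bettertransformer(model_name: str = "bert-base-uncased"):
    """Load a Transformers model and return its BetterTransformer version.

    The helper name and default checkpoint are assumptions for illustration.
    """
    # Imports are local so the helper can be defined without the libraries
    # installed; calling it requires transformers, optimum, and PyTorch.
    from transformers import AutoModel
    from optimum.bettertransformer import BetterTransformer

    model = AutoModel.from_pretrained(model_name)
    # The one-liner: swap supported layers for their BetterTransformer versions.
    return BetterTransformer.transform(model, keep_original_model=False)
```

Calling `convert_to_bettertransformer()` downloads the checkpoint and returns the converted model. For an architecture that is not supported yet, the call is expected to fail, which is the gap this issue asks contributors to close.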

Here is a list of models that could potentially be supported. Pick one of the architectures below and let's discuss the conversion!

Text models 🖊️ :

Vision models 📷 :

Audio models 🔉 :

Let us also know if you think we missed some architectures that could be supported. Note that for encoder-decoder based models below, we expect to convert the encoder only.

Support for decoder-based models coming soon!

cc @michaelbenayoun @fxmarty

https://github.com/huggingface/transformers/issues/20372

Sumanth077 commented 1 year ago

Hi @younesbelkada, I would love to contribute to this issue and can work on FSMT.

younesbelkada commented 1 year ago

Hey @Sumanth077, thanks a bunch for your interest in this issue! 🚀 Would love to assist you with the integration and let's try to make this happen! I have updated the table above and am attaching the contribution tutorial here ;) Would you mind forking this repo and opening a draft pull request so that I can start guiding you there? Also, please do not hesitate to ping us here about any issue you face during the integration 💪

Sumanth077 commented 1 year ago

Thank you for the reply, @younesbelkada. I just opened a draft pull request but haven't made any significant changes yet.

Following Step 1 (Identifying the source layer to change), I couldn't find an entry in BETTER_TRANFORMER_LAYERS_MAPPING_DICT that maps an FSMT module to its BetterTransformer equivalent.

Should I start creating that? I would love your assistance.
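To illustrate the idea of such a mapping, here is a rough sketch. It is not Optimum's actual code, and every name in it (`LAYER_MAPPING`, `BertLayerFast`, `lookup_converter`) is hypothetical. It only shows a dictionary keyed by the source layer's class name, and what a missing entry looks like.

```python
class BertLayerFast:
    """Hypothetical stand-in for a BetterTransformer layer implementation."""

    def __init__(self, source_layer):
        # A real converter would copy weights from the source layer here.
        self.source_layer = source_layer


# Hypothetical mapping: source layer class name -> converter class.
LAYER_MAPPING = {"BertLayer": BertLayerFast}


def lookup_converter(layer, mapping=LAYER_MAPPING):
    """Return the converter class registered for `layer`, or None if missing."""
    return mapping.get(type(layer).__name__)
```

Under this sketch, a layer class with no entry returns `None`. Supporting a new architecture then means writing a converter class and registering it under the source layer's class name.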

younesbelkada commented 1 year ago

Hi @Sumanth077 , I have just replied on your PR, let's continue the discussion there ;)

ka00ri commented 1 year ago

Hi, I would like to contribute as well. This would be my first contribution to open source, so I might need some hand holding 🤚

I followed the documentation and the progress made on FSMT in huggingface/optimum#494 to better understand the task.

I looked into ViLT via

from transformers import AutoModel
model = AutoModel.from_pretrained("dandelin/vilt-b32-mlm")

and as I understand the documentation, this should be the source layer to make changes to, including its attributes:

(0): ViltLayer(
  (attention): ViltAttention(
    (attention): ViltSelfAttention(
      (query): Linear(in_features=768, out_features=768, bias=True)
      (key): Linear(in_features=768, out_features=768, bias=True)
      (value): Linear(in_features=768, out_features=768, bias=True)
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (output): ViltSelfOutput(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (dropout): Dropout(p=0.0, inplace=False)
    )
  )
  (intermediate): ViltIntermediate(
    (dense): Linear(in_features=768, out_features=3072, bias=True)
    (intermediate_act_fn): GELUActivation()
  )
  (output): ViltOutput(
    (dense): Linear(in_features=3072, out_features=768, bias=True)
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)

I could give the ViltLayer a go, if it's OK with you @younesbelkada 🙂

younesbelkada commented 1 year ago

Hi @ka00ri! Thanks a lot for your message and interest in contributing! Would love to assist you in integrating ViLT into BetterTransformer 💪 That is correct, this layer is the source layer to change! Would you mind opening a PR and tagging us (myself, @michaelbenayoun & @fxmarty)? Thanks a bunch!
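For anyone repeating this step on another architecture, the sketch below shows one way to spot candidate source layers without reading the full printed model. The helper `count_module_classes` is a hypothetical name. It only assumes the model exposes `named_modules()`, as any `torch.nn.Module` does, and that the layer class name ends in `Layer`, which may not hold for every architecture.

```python
from collections import Counter


def count_module_classes(model, name_suffix="Layer"):
    """Count submodule class names that end with `name_suffix`.

    `model` can be any object exposing `named_modules()`, which yields
    (name, module) pairs in the same way as `torch.nn.Module`.
    """
    counts = Counter()
    for _name, module in model.named_modules():
        class_name = type(module).__name__
        if class_name.endswith(name_suffix):
            counts[class_name] += 1
    return dict(counts)
```

Passing the ViLT model loaded above to `count_module_classes` should list `ViltLayer` among the results, which matches the layer identified in this exchange.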

adit299 commented 1 year ago

Hello, apologies for the delay, but I just opened a draft PR to start the discussion on how to add BetterTransformer support for the ProphetNet encoder layer. I have a couple of questions about how to do this, so I was wondering who would be the best person to reach out to. @michaelbenayoun @fxmarty @younesbelkada

fxmarty commented 1 year ago

Hi @adit299 , thanks for adding the support for this architecture! Feel free to ask any question in the PR you opened.

JanFidor commented 1 year ago

Hi @younesbelkada, could I pick up the RoFormer?

soma2000-lang commented 1 year ago

@younesbelkada doing Detr - DetrLayer

younesbelkada commented 1 year ago

Hello @JanFidor, yes, sure! @soma2000-lang perfect, let us know when you open a PR 💪!

JanFidor commented 1 year ago

@younesbelkada Hi, thanks for responding. I'm not 100% certain, but I think RemBert, RoFormer and RocBert are already implemented: they have already been added to `__init__.py`, overview.mdx and the test file. If that's the case, the list of models left to implement would need to be updated. Let me know if you agree!

younesbelkada commented 1 year ago

I see, thanks for clarifying. I will double check that and let you know

younesbelkada commented 1 year ago

Thanks for letting me know! Indeed, these are already implemented. I suggest you add BetterTransformer support for Blip instead (I updated the table above).

JanFidor commented 1 year ago

Thanks for the suggestion, I'll get on it!

ravenouse commented 1 year ago

Hi @fxmarty and @younesbelkada !

Thank you so much for your previous help and support on my implementation of MBart support for BetterTransformer.

I want to follow up on my PR on ASTLayer support for BetterTransformer.

Specifically, I would like to check with you if it is still possible to work on this and have it reviewed and merged into the package. If it is, I would be happy to continue working on it.

I realize that the whole BetterTransformer part and its testing have changed a lot in the last several months. Once I get confirmation, I will start editing my code to match those changes.

Thank you so much for your time and help, and I look forward to hearing back from you soon.

Sincerely,

rajveer43 commented 1 year ago

@younesbelkada I would like to work on FlavaLayer. Can you confirm whether it is done or not?

mszsorondo commented 1 year ago

Hi! @JanFidor will you finish with BLIP? I can do it if not, with the permission of @younesbelkada @fxmarty

fxmarty commented 1 year ago

Hi,

@mszsorondo Looking into the PRs, BLIP has been implemented in https://github.com/huggingface/optimum/pull/1125. I just ticked it in the first post. @rajveer43 For Flava, there is this ongoing PR: https://github.com/huggingface/optimum/pull/907

rajveer43 commented 1 year ago

@fxmarty any other model available for work?

mszsorondo commented 1 year ago

@fxmarty same here, if there's still any model

hackpk commented 10 months ago

@younesbelkada Can I work on ASTLayer?

karandua2016 commented 8 months ago

Any plans to add support for MPT?

qingfengcss commented 1 month ago

please support florence2!!!