huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

EdgeFormer #16410

Open patrickvonplaten opened 2 years ago

patrickvonplaten commented 2 years ago

🌟 New model addition

Model description

EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation, by Tao Ge and Furu Wei

March 2022: code and pretrained checkpoints released.

Open source status

Happy to help with a model contribution here!

reichenbch commented 2 years ago

@patrickvonplaten If I can work on this and contribute, do let me know. Meanwhile, I will proceed to read and understand the paper.

patil-suraj commented 2 years ago

Hey @reichenbch, feel free to work on this if you are interested. Patrick is on vacation this week, so I would be happy to help with this :)

reichenbch commented 2 years ago

@patil-suraj I was thinking of first reading the paper once and then looking into the available implementation (they are using the fairseq library) and checkpoints. Is that the correct approach? Secondly, what would I need for this? Is there a model creation guide available? I know model templates are available in the repo.

patil-suraj commented 2 years ago

For implementing the model I would suggest a code-first approach.

Here are some docs that might help when adding a new model 🙂

Most seq2seq models in fairseq are similar to bart/mbart, so I would suggest referring to those models and using the transformers-cli add-new-model-like command, which can create a bart/mbart-like template.
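For a rough idea of what that template gives you, here is a minimal sketch of the kind of config skeleton a bart/mbart-like port starts from (EdgeFormerConfig and all field values here are hypothetical placeholders for illustration, not the actual generated code):

```python
from transformers import PretrainedConfig

# Hypothetical sketch of a bart/mbart-style config skeleton; the
# add-new-model-like command generates a much fuller version of this.
class EdgeFormerConfig(PretrainedConfig):
    model_type = "edgeformer"  # placeholder model type

    def __init__(
        self,
        vocab_size=50265,  # bart-like defaults, for illustration only
        d_model=512,
        encoder_layers=6,
        decoder_layers=6,
        encoder_attention_heads=8,
        decoder_attention_heads=8,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.encoder_layers = encoder_layers
        self.decoder_layers = decoder_layers
        self.encoder_attention_heads = encoder_attention_heads
        self.decoder_attention_heads = decoder_attention_heads
        super().__init__(**kwargs)
```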

Hope this helps!

patil-suraj commented 2 years ago

Hey @reichenbch, how's it going? Let us know if you need any help :)

reichenbch commented 2 years ago

Hey @patil-suraj, work is in progress. I unfortunately ran into some health issues a while back. I will update the files and try to get back on track.

patil-suraj commented 2 years ago

Hope you are feeling okay now! And no rush, just wanted to check in. Take your time 🤗

inderpreetsingh01 commented 2 years ago

Hey @patrickvonplaten @patil-suraj @reichenbch, I am interested in working on this issue. Let me know if I can contribute.

patrickvonplaten commented 2 years ago

@inderpreetsingh01, feel free to open a PR if you want :-)

pramodith commented 1 year ago

@inderpreetsingh01 @patrickvonplaten is anyone actively working on this issue? I was wondering if I could either take it up or shadow someone working on it. I'd like to start learning how to contribute models to Hugging Face.

inderpreetsingh01 commented 1 year ago

@pramodith, I started working on it but got occupied with some personal things. I went through the shared resources; here is what I understood from the paper:

EdgeFormer uses an interleaved decoder together with a layer adaptation technique.

I am not clear on the layer adaptation part and couldn't find any related parameter in fairseq or in the edge_architecture function used to define the model.

Let me know if you want to discuss and work on it.

pramodith commented 1 year ago

@inderpreetsingh01 I believe that for the layer adaptation technique new parameters are only required for the LoRA method. The parameters for this are defined in this file in the fairseq repository. The file also contains the code for the Interleaved decoder.
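For anyone following along, here is a generic sketch of the LoRA idea being described: a frozen (or shared) base weight plus small per-layer low-rank matrices, which are the only new trainable parameters. This is the standard low-rank adaptation pattern, not EdgeFormer's actual fairseq code; all names are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA: y = W x + scaling * B (A x).
    W is frozen/shared; only the low-rank A and B are newly trained."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # shared/frozen base weight
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        # B starts at zero so training begins from the unmodified base model
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

In a parameter-sharing setup like the one described here, the shared weights play the role of the frozen base, which would explain why new parameters are only needed for the LoRA variant of layer adaptation.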

If you're busy with other things I can definitely have a go at adding this model to the huggingface repo.

inderpreetsingh01 commented 1 year ago

@pramodith thanks for clarifying. I had actually looked at the original fairseq repository, which doesn't include the adaptation part. I can contribute to this; we can connect here.

patrickvonplaten commented 1 year ago

Let me know if you need any help :-)

pramodith commented 1 year ago

Hey @patrickvonplaten, I wanted to start porting the EdgeFormer model into the transformers library, so I used the transformers-cli add-new-model-like command. However, one of the questions that follows is "Please give a checkpoint identifier (on the model Hub) for this new model." Does this mean that I need to upload the pretrained weights file to the Hugging Face Hub?

patrickvonplaten commented 1 year ago

Hey @pramodith,

It means that you should specify the checkpoint name that you intend to use when uploading the weights to the Hub.
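For example, if the weights were going to live under a hypothetical identifier such as username/edgeformer-base, that is the string the prompt expects; it is the same identifier users would later pass to from_pretrained:

```python
from transformers import AutoModel

# "username/edgeformer-base" is a hypothetical Hub identifier used only
# for illustration; the actual name is chosen when uploading the weights.
model = AutoModel.from_pretrained("username/edgeformer-base")
```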

atharvakavitkar commented 1 year ago

Hi @patrickvonplaten @pramodith, is this issue still open? I would like to contribute, but I don't see a related PR.