lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License

A Concrete Example of Using Performer-Pytorch with an Existing Model Checkpoint? #24

Open ghost opened 3 years ago

ghost commented 3 years ago

Hi, thank you for the excellent work.

After reading the README, I am still not clear on how to apply it to my existing BERT model. Could you please provide a detailed example of how to use it?

Thank you very much.

yygle commented 3 years ago

Maybe you can try doing it this way: https://github.com/yygle/restore_bert_ckpt_to_performer. Note that training is not noticeably faster with short sequences; as the author said, the speedup is not dramatic when the sequence is short.
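
The core idea behind that repo, in a minimal sketch (this is not its exact code, and the `to_q`/`to_k`/`to_v` attribute names follow the performer-pytorch source at the time of writing): performer attention reuses the same q/k/v projections as softmax attention, so a pretrained BERT layer can seed a `SelfAttention` module of matching shape.

```python
import torch
from transformers import BertModel
from performer_pytorch import SelfAttention

bert = BertModel.from_pretrained('bert-base-uncased')
src = bert.encoder.layer[0].attention.self  # softmax attention to convert

# bert-base: hidden size 768 = 12 heads x 64 dims per head
performer_attn = SelfAttention(dim = 768, heads = 12, dim_head = 64, causal = False)

# copy the pretrained projections; BERT's q/k/v also carry biases, which
# the default performer projections omit, so this transfer is approximate
performer_attn.to_q.weight.data.copy_(src.query.weight.data)
performer_attn.to_k.weight.data.copy_(src.key.weight.data)
performer_attn.to_v.weight.data.copy_(src.value.weight.data)
performer_attn.to_out.weight.data.copy_(bert.encoder.layer[0].attention.output.dense.weight.data)
performer_attn.to_out.bias.data.copy_(bert.encoder.layer[0].attention.output.dense.bias.data)

x = torch.randn(1, 128, 768)
out = performer_attn(x)  # (1, 128, 768), computed in linear time
```

Wiring the module back into BERT's forward pass still needs an adapter, since the call signatures differ; the linked repo handles the full conversion.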

ghost commented 3 years ago

@yygle Thank you. I looked at the code, but I don't know where these modules come from:

```python
from performer_pytorch.wrappers import ClassificationWrapper
from performer_pytorch_v2 import PerformerLM
```

yygle commented 3 years ago

You can simply change `from performer_pytorch_v2 import PerformerLM` to import the Performer model published by the author lucidrains; I think the `PerformerLM` class is defined in that package. A minimal instantiation is sketched below.
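
For reference, a minimal sketch of instantiating lucidrains' `PerformerLM` directly; the hyperparameters below are illustrative (BERT-base-like), not taken from any particular checkpoint:

```python
import torch
from performer_pytorch import PerformerLM

model = PerformerLM(
    num_tokens = 30522,   # e.g. the bert-base-uncased vocab size
    max_seq_len = 512,
    dim = 768,
    depth = 12,
    heads = 12,
    causal = False        # bidirectional, BERT-style
)

tokens = torch.randint(0, 30522, (1, 512))
logits = model(tokens)    # (1, 512, 30522)
```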

tenexcoder commented 3 years ago

Hey guys — @yygle and @Lincoln-Jiang

I have put together a working experiment that fine-tunes from a pretrained model checkpoint.

The example loads the weights of the pretrained distilgpt2 model from Hugging Face Transformers and then fine-tunes it on WikiText-2, as in one of their examples.

Based on initial tests, perplexity is not yet on par with the vanilla implementation (roughly 5 vs. 3). I would appreciate any feedback and PRs that get the perplexity as close as possible to the original.

https://github.com/tenexcoder/huggingface-tutorials/tree/main/performer
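
A rough sketch of the weight-loading step described above (the performer-side attribute name `token_emb` is an assumption based on the library source; the full per-layer mapping is what the linked repo implements):

```python
import torch
from transformers import GPT2LMHeadModel
from performer_pytorch import PerformerLM

hf = GPT2LMHeadModel.from_pretrained('distilgpt2')
cfg = hf.config

performer = PerformerLM(
    num_tokens = cfg.vocab_size,
    max_seq_len = cfg.n_positions,
    dim = cfg.n_embd,
    depth = cfg.n_layer,
    heads = cfg.n_head,
    causal = True         # GPT-2 style autoregressive LM
)

# the token embedding transfers directly; per-layer attention and MLP
# weights need a fuller mapping (GPT-2 stores q/k/v as one fused Conv1D)
performer.token_emb.weight.data.copy_(hf.transformer.wte.weight.data)
```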