Open ghost opened 3 years ago
https://github.com/yygle/restore_bert_ckpt_to_performer Maybe you can try doing it this way. Note that training speed is not noticeably better with a short sequence; as the author said, you will not gain dramatically when the sequence is short.
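The core idea in a checkpoint-restore repo like that is remapping the BERT checkpoint's state-dict keys onto the Performer model's parameter names. A minimal sketch of that key-remapping step; the key names below are hypothetical placeholders for illustration, not the actual mapping used by restore_bert_ckpt_to_performer:

```python
# Sketch: restore BERT checkpoint weights into a Performer by renaming
# state-dict keys. The rename rules here are hypothetical examples only.

def remap_bert_keys(bert_state_dict):
    """Rename BERT-style keys to (hypothetical) Performer-style keys,
    passing through any key without a rule unchanged."""
    rename_rules = {
        "bert.embeddings.word_embeddings.weight": "token_emb.weight",
        "bert.embeddings.position_embeddings.weight": "pos_emb.weight",
    }
    return {
        rename_rules.get(old_key, old_key): value
        for old_key, value in bert_state_dict.items()
    }

# Example with plain lists standing in for tensors:
ckpt = {
    "bert.embeddings.word_embeddings.weight": [[0.1, 0.2]],
    "bert.embeddings.position_embeddings.weight": [[0.3, 0.4]],
    "bert.pooler.dense.weight": [[0.5]],
}
restored = remap_bert_keys(ckpt)
print(sorted(restored))
```

In PyTorch you would then pass the remapped dict to `model.load_state_dict(remapped, strict=False)` so that any Performer-only parameters (e.g. the random projection matrices) stay at their initialization.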
@yygle Thank you. I looked at the code, but I don't know where these modules come from:
from performer_pytorch.wrappers import ClassificationWrapper
from performer_pytorch_v2 import PerformerLM
You can simply change `from performer_pytorch_v2 import PerformerLM` to import the Performer model by the author lucidrains; I think the class is defined in his package's Python file.
Hey guys — @yygle and @Lincoln-Jiang
I have put together a working experiment to finetune from a pretrained model checkpoint.
The example loads the weights of the pretrained distilgpt2 from Hugging Face Transformers and then finetunes it on Wikitext 2 like in one of their examples.
Based on initial tests, perplexity is not yet on par with the vanilla implementation (~5 vs. ~3). I would appreciate any feedback and PRs to get the perplexity as close to the original as possible.
https://github.com/tenexcoder/huggingface-tutorials/tree/main/performer
Hi, thank you for the excellent work.
After reading the README file, I am still not clear on how to use it with my existing BERT model. Could you please provide a detailed example?
Thank you very much.