520jefferson closed this issue 2 years ago
Hey @520jefferson,
I understand your question as whether it's possible to train T5 from scratch, i.e., without using a pretrained checkpoint.
Yes, this should definitely be possible, but I wouldn't really recommend it given the power of transfer learning.
Here is also a very nice explanation by @sgugger of how powerful transfer learning is, which might be interesting: https://huggingface.co/course/chapter1/4?fw=pt#transfer-learning
Why not fine-tune a pretrained T5 model on translation?
Hey @patrickvonplaten
I want to distill a big model (PyTorch version) into a T5 model. The FasterTransformer backend (https://github.com/triton-inference-server/fastertransformer_backend) provides a Triton backend optimization for the original T5 (not T5.1.1), and this inference optimization will help carry more online traffic. The reason I don't use a plain Transformer as the student model is that I haven't found a PyTorch Transformer with inference optimization that integrates with Triton.
Also, the big model uses BPE rather than SentencePiece, so the tokenizer should load the BPE codes, and that vocab is different from the original T5 vocab. Therefore I want to distill the big model into a T5 model while keeping the same vocab.
So I need to figure out two things: 1) whether T5 can be trained on dialogue without pretraining, treating it like a plain Transformer trained from scratch (I haven't found a related example of training from scratch); 2) how to set up the tokenizer to use only BPE codes.
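For the first point, training T5 from scratch amounts to instantiating the model from a config instead of calling `from_pretrained`. A minimal sketch with `transformers` (the hyperparameters below are illustrative assumptions, not a recommendation; `vocab_size` must match your own BPE vocab):

```python
from transformers import T5Config, T5ForConditionalGeneration

# Illustrative sizes only (assumptions); vocab_size must match
# the tokenizer you will use for distillation.
config = T5Config(
    vocab_size=32000,
    d_model=512,
    d_ff=2048,
    num_layers=6,
    num_decoder_layers=6,
    num_heads=8,
)

# Randomly initialized weights -- no pretrained checkpoint is loaded.
model = T5ForConditionalGeneration(config)
```

The resulting model can then be trained with the usual seq2seq training loop or `Trainer`, exactly as a pretrained one would be.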
Sorry I'm a bit lost here @520jefferson,
I don't fully understand what you want to do here, but I guess the target task is distillation? Should we maybe try to get help on the forum: https://discuss.huggingface.co/ for distillation?
@patrickvonplaten I just need to train T5 from scratch without pretraining, and the tokenizer should be able to load just a vocab.txt (not JSON) or merges.txt (BPE codes).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The tokenizer can be built by hand.
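For reference, a BPE tokenizer can indeed be assembled by hand with the `tokenizers` library. The tiny in-memory vocab and merges below are made-up placeholders; in practice you would load your own BPE codes, e.g. `BPE.from_file("vocab.json", "merges.txt")`:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

# Toy vocab/merges for illustration only; replace with your own
# BPE codes, e.g. Tokenizer(BPE.from_file("vocab.json", "merges.txt")).
vocab = {"h": 0, "e": 1, "l": 2, "o": 3, "he": 4, "ll": 5, "hell": 6, "hello": 7}
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]

tokenizer = Tokenizer(BPE(vocab, merges))
print(tokenizer.encode("hello").tokens)  # ['hello']
```

The resulting `Tokenizer` object can then be wrapped with `transformers.PreTrainedTokenizerFast(tokenizer_object=...)` so it plugs into the usual T5 training code.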
Feature request
May I use BPE in preprocessing and train a translation model from scratch, without pretraining a language model? @patrickvonplaten
Motivation
I want to distill a big model into a T5 model, and the T5 vocab should be the same as the big model's.
Your contribution
I can verify the process.