huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

May I just train a translation task with T5 from scratch without pretraining a language model? #17451

Closed 520jefferson closed 2 years ago

520jefferson commented 2 years ago

Feature request

May I use BPE in preprocessing and train a translation model from scratch without pretraining a language model? @patrickvonplaten

Motivation

I want to distill a big model into a T5 model, and the T5 vocab should be the same as the big model's.

Your contribution

I can verify the process.

patrickvonplaten commented 2 years ago

Hey @520jefferson,

I understand your question as asking whether it's possible to train T5 from scratch rather than use a pretrained checkpoint.

Yes, this should definitely be possible, but I wouldn't really recommend it given the power of transfer learning.

Here is also a very nice explanation by @sgugger of how powerful transfer learning is, which might be interesting: https://huggingface.co/course/chapter1/4?fw=pt#transfer-learning

Why not fine-tune a pretrained T5 model on translation?
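
For reference, a minimal sketch of what training from scratch could look like: build a `T5Config` and instantiate the model with random weights instead of loading a checkpoint. The sizes below are illustrative, not a recommendation.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Illustrative hyperparameters; vocab_size should match your own BPE vocab.
config = T5Config(
    vocab_size=32000,
    d_model=512,
    num_layers=6,
    num_decoder_layers=6,
    num_heads=8,
    d_ff=2048,
)

# Random initialization, no pretrained weights.
model = T5ForConditionalGeneration(config)

# For comparison, the recommended transfer-learning route would be:
# model = T5ForConditionalGeneration.from_pretrained("t5-small")
```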

520jefferson commented 2 years ago

Hey @patrickvonplaten

I want to distill a big model (PyTorch version) into a T5 model. The FasterTransformer backend (https://github.com/triton-inference-server/fastertransformer_backend) provides Triton backend optimization for the original T5 (not T5 v1.1), and this inference optimization would help carry more online traffic. The reason I don't use a plain Transformer as the student model is that I haven't found a PyTorch Transformer implementation with inference optimization that integrates with Triton.

And the big model uses BPE, not SentencePiece, so the tokenizer should load the BPE codes, and the vocab is different from the original T5 model's. That is why I want to distill the big model into a T5 model and reuse the same vocab at the same time.

So I need to figure out two things: (1) whether T5 can be trained on dialogue without pretraining, treating T5 like a plain Transformer trained from scratch (I haven't found a related example of training from scratch), and (2) how to set the tokenizer to use only BPE codes.

patrickvonplaten commented 2 years ago

Sorry I'm a bit lost here @520jefferson,

I don't fully understand what you want to do here, but I guess the target task is distillation? Should we maybe try to get help with distillation on the forum: https://discuss.huggingface.co/?

520jefferson commented 2 years ago

@patrickvonplaten I just need to train T5 from scratch without pretraining, and the tokenizer should just load vocab.txt (not JSON) or merges.txt (BPE codes).
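
A rough sketch of one way this could work, assuming the BPE codes are in the usual `vocab.json`/`merges.txt` format (a plain `vocab.txt` would first need converting into a `{token: id}` mapping); the file names and special-token names are placeholders:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Load the existing BPE vocab and merge rules into a fast tokenizer.
tokenizer = Tokenizer(BPE.from_file("vocab.json", "merges.txt", unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

# Wrap it so it plugs into the usual transformers training code.
hf_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="<unk>",
    pad_token="<pad>",
    eos_token="</s>",
)
```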

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

520jefferson commented 2 years ago

The tokenizer can be built by hand.
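
For example, a hand-built BPE tokenizer could be trained with the `tokenizers` library along these lines (file names, vocab size, and special tokens are illustrative):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Start from an empty BPE model and learn the merges from your own corpus.
tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    vocab_size=32000,
    special_tokens=["<pad>", "</s>", "<unk>"],
)

# Hypothetical parallel-corpus files; replace with your own data.
tokenizer.train(files=["train.src", "train.tgt"], trainer=trainer)
tokenizer.save("my-bpe-tokenizer.json")
```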