huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to use Hugging Face for training: google-t5/t5-base #33232

Open gg22mm opened 2 months ago

gg22mm commented 2 months ago

Feature request

How to use Hugging Face for training: https://github.com/huggingface/transformers/tree/main/examples/pytorch/translation

What is the format, and how do I write it?

```python
def batch_collator(data):
    print(data)  # ??? what format does this receive, and what should it return?
    return {'pixel_values': torch.stack([x['pixel_values'] for x in data]),
            'labels': torch.tensor([x['labels'] for x in data])}

trainer = Trainer(model=model, args=training_args,
                  data_collator=batch_collator,  # how should this be written?
                  train_dataset=dataset['train'])
```
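
For context, a collator for a text-to-text model like T5 only needs to pad and batch the tokenized fields. Below is a minimal hand-rolled sketch, assuming each example has already been tokenized into `input_ids`, `attention_mask`, and `labels` lists; these field names are the standard T5 ones, not something stated in this issue, and in practice `transformers.DataCollatorForSeq2Seq` does this padding for you.

```python
import torch

def batch_collator(features):
    # Pad inputs and labels separately to the longest sequence in the batch.
    # 0 is T5's pad token id; -100 is ignored by the loss on the label side.
    def pad_to(seqs, pad_value):
        max_len = max(len(s) for s in seqs)
        return torch.tensor([s + [pad_value] * (max_len - len(s)) for s in seqs])

    return {
        "input_ids": pad_to([f["input_ids"] for f in features], 0),
        "attention_mask": pad_to([f["attention_mask"] for f in features], 0),
        "labels": pad_to([f["labels"] for f in features], -100),
    }
```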

Motivation

Your contribution

I have already tried this and it works: https://www.kaggle.com/code/weililong/google-t5-t5-base — I just don't know whether there are any pitfalls.

nbroad1881 commented 2 months ago

You can follow the guide here. Even though it says summarization, you can treat it like translation.

Summarization goes from article --> summary. Translation goes from source language --> target language.
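
To make that concrete, here is a minimal sketch of fine-tuning google-t5/t5-base on a translation pair with the standard Seq2SeqTrainer API; the dataset, language pair, column names, and hyperparameters below are placeholder assumptions, not something specified in this issue.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google-t5/t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder dataset; swap in your own source/target columns.
raw = load_dataset("wmt16", "ro-en", split="train[:1%]")

def preprocess(batch):
    inputs = ["translate English to Romanian: " + ex["en"] for ex in batch["translation"]]
    targets = [ex["ro"] for ex in batch["translation"]]
    # text_target tokenizes the labels in the same call.
    return tokenizer(inputs, text_target=targets, max_length=128, truncation=True)

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

# Pads input_ids, attention_mask, and labels (with -100) per batch,
# so no hand-written data_collator is needed.
collator = DataCollatorForSeq2Seq(tokenizer, model=model)

args = Seq2SeqTrainingArguments(
    output_dir="t5-base-translation",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=3e-4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

The DataCollatorForSeq2Seq here answers the `batch_collator` question above: it batches the tokenized fields and pads labels with -100 so padded positions are ignored by the loss.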