huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

(Willing to PR) Make `tokenizer.padding_side` an argument instead of only being a field #30447

Open fzyzcjy opened 6 months ago

fzyzcjy commented 6 months ago

Feature request

Hi, thanks for the library! When using a tokenizer for batch generation with GPT-2, for example (see https://discuss.huggingface.co/t/batch-generation-with-gpt2/1517), it seems that I currently have to do something like:

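tokenizer.padding_side = 'left'  # temporarily switch to left padding
data = tokenizer(['sentence one', 'another'], padding=True)  # padding must be enabled for the side to matter
tokenizer.padding_side = 'right'  # restore the previous setting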

Therefore, it would be great to have:

data = tokenizer(['sentence one', 'another'], padding=True, padding_side='left')

just as many other options, such as padding_strategy, can already be passed as arguments.
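In the meantime, the set-and-restore pattern can be wrapped in a context manager so the original side is restored even if tokenization raises. This is only a minimal sketch: `temporary_padding_side` is a hypothetical helper, not part of the library, and a plain stand-in object is used in place of a real tokenizer so the snippet runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_padding_side(tokenizer, side):
    """Hypothetical helper: set `tokenizer.padding_side` inside a with-block only."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    previous = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Restore the original value even if the body raised.
        tokenizer.padding_side = previous


# Stand-in for a real tokenizer (assumption: only the `padding_side` field matters here).
fake_tokenizer = SimpleNamespace(padding_side="right")

with temporary_padding_side(fake_tokenizer, "left") as tok:
    side_inside = tok.padding_side  # value seen while the block is active

side_after = fake_tokenizer.padding_side  # value after the block exits
```

With a real tokenizer, the body of the with-block would be the `tokenizer([...], padding=True)` call. This still mutates shared state, so it is not safe if the same tokenizer is used from several threads, which is part of why a per-call argument would be preferable.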

Motivation

(see above)

Your contribution

Yes, I am willing to open a PR.

amyeroberts commented 6 months ago

cc @ArthurZucker

ArthurZucker commented 5 months ago

Sure, feel free to open a PR and ping @itazap 🤗