huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

No way around "Truncation was not explicitely activated..." error when using SingleSentenceClassificationProcessor. #7028

Closed: codygunton closed this issue 3 years ago

codygunton commented 4 years ago

Environment info

Who can help

@LysandreJik, @thomwolf

Information

Model I am using: BERT.

The problem arises when using:

The tasks I am working on are:

To reproduce

```python
from transformers import AutoTokenizer, SingleSentenceClassificationProcessor

processor = SingleSentenceClassificationProcessor()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
processor.add_examples(["Thanks for cool stuff!"])
processor.get_features(tokenizer, max_length=3)
```

This emits the warning:

```
Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
```

and returns:

```
[InputFeatures(input_ids=[101, 4283, 102], attention_mask=[1, 1, 1], token_type_ids=None, label=0)]
```
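In the meantime, a blunt workaround is to lower the transformers log verbosity so the warning is never printed. This is a sketch assuming a transformers version that ships the centralized logging utilities (roughly v3.1+); note that it silences every other library warning as well:

```python
from transformers.utils import logging

# Drop the transformers log level to ERROR so library warnings,
# including the truncation warning above, are no longer emitted.
logging.set_verbosity_error()
```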

Expected behavior

The warning itself is expected. The problem is that there is no way to suppress it: `truncation=True` cannot be passed through to `tokenizer.encode` when it is called inside `processor.get_features`. `get_features` should probably accept a `truncation` argument and forward it to the tokenizer; the sketch below shows the call it would then be able to make.
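For comparison, here is a minimal sketch (reusing the example sentence above) of the direct tokenizer call that `get_features` currently cannot make on the user's behalf; with `truncation=True` passed explicitly, the same ids come back without the warning:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Passing truncation=True together with max_length truncates silently,
# producing the same ids as the processor output above.
input_ids = tokenizer.encode("Thanks for cool stuff!", max_length=3, truncation=True)
print(input_ids)  # [101, 4283, 102]
```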

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.