Inconsistency with transformer pipeline results

abhijeetk597 commented 7 months ago

System Info

transformers version: 4.38.2
Platform: Linux-5.15.133+-x86_64-with-glibc2.31
Python version: 3.10.13
Huggingface_hub version: 0.21.4
Safetensors version: 0.4.2
Accelerate version: 0.28.0
Accelerate config: not found
PyTorch version (GPU?): 2.1.2+cpu (False)
Tensorflow version (GPU?): 2.15.0 (False)
Flax version (CPU?/GPU?/TPU?): 0.8.2 (cpu)
Jax version: 0.4.25
JaxLib version: 0.4.25
Using GPU in script?: No
Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL = f"cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

sent_pipeline = pipeline("sentiment-analysis",model=model, tokenizer=tokenizer)

import random
random_text = random.choice(df["Text"])

print(random_text)
sent_pipeline("random_text")

Screenshot 2024-03-30 052904

I tried the same thing with a different model but got the same kind of result.

Screenshot 2024-03-30 053110

Kaggle Notebook Link

Expected behavior

Sentiment scores should vary with the text passed to the pipeline.

vasqu commented 7 months ago

You are passing the string literal "random_text", not the contents of the variable random_text. The pipeline sees the same input on every call, which is why you get the same score each time.

Try passing the variable itself, e.g. sent_pipeline(f'{random_text}'). If random_text is already a str, plain sent_pipeline(random_text) is equivalent. In total, something like:

import random

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

sent_pipeline = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# df is the DataFrame from your notebook; it is not defined in this snippet
random_text = random.choice(df["Text"])

print(random_text)
# here is the change: pass the variable's content, not the literal "random_text"
sent_pipeline(f"{random_text}")
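
For anyone hitting the same thing, here is a minimal sketch of the difference that runs without downloading a model. Note that toy_pipeline is a hypothetical stand-in (it is not part of transformers): its output depends only on the string it receives, which is the property that matters here. The length-based score and the sample texts are made up purely for illustration.

```python
def toy_pipeline(text: str) -> list[dict]:
    """Hypothetical stand-in for sent_pipeline.

    Like a model at inference time, it is deterministic in its input:
    the same string always yields the same score.
    """
    score = min(len(text) / 100, 1.0)  # made-up score derived only from the text
    return [{"label": "TOY", "score": score}]


# Made-up sample texts standing in for df["Text"]
texts = [
    "I love this product",
    "Terrible, would not buy again",
    "It was fine, I guess",
]

# Bug: the quoted literal is the input, so every call sees the same string
literal_results = [toy_pipeline("random_text") for random_text in texts]

# Fix: pass the variable, so each call sees that row's actual text
variable_results = [toy_pipeline(random_text) for random_text in texts]

print("literal: ", [r[0]["score"] for r in literal_results])
print("variable:", [r[0]["score"] for r in variable_results])
```

The first list contains one repeated score and the second contains a different score per text. With the real sent_pipeline the same check applies: if the scores never change, print the exact argument being passed to the pipeline call.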

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

huggingface / transformers