huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Bug Fix] fix qa pipeline tensor to numpy #31585

Closed jiqing-feng closed 2 months ago

jiqing-feng commented 3 months ago

Hi @Narsil @amyeroberts

This PR fixes an error in the question-answering pipeline. The error can be reproduced with:

from transformers import pipeline
pipe = pipeline("question-answering", model="hf-internal-testing/tiny-random-bert")
question = "What's my name?"
context = "My Name is Sasha and I live in Lyon."
pipe(question, context)

Traceback:

Traceback (most recent call last):
  File "test_qa.py", line 5, in <module>
    pipe(question, context)
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 393, in __call__
    return super().__call__(examples[0], **kwargs)
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1235, in __call__
    return next(
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 546, in postprocess                                                    starts, ends, scores, min_null_score = select_starts_ends(
  File "/home/jiqingfe/miniconda3/envs/ccl/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 124, in select_starts_ends
    undesired_tokens = undesired_tokens & attention_mask
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

jiqing-feng commented 3 months ago

I found that this problem comes from numpy: in Python 3.8, numpy casts the int tensor to float (see attached screenshot).

So I suggest we use p_mask.numpy() instead of np.array(p_mask).
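
A minimal sketch of the dtype difference (the mask values here are made up for illustration; the exact behavior of np.array on a tensor depends on the torch/numpy combination installed):

import numpy as np
import torch

p_mask = torch.tensor([0, 0, 1, 1], dtype=torch.int64)
attention_mask = np.asarray([1, 1, 1, 0], dtype=np.int64)

# tensor.numpy() keeps the integer dtype, so the bitwise AND works
print((p_mask.numpy() & attention_mask).dtype)  # int64

# If the conversion silently yields a float array (the behavior reported here),
# the same expression raises the TypeError from the traceback above
try:
    p_mask.numpy().astype(np.float64) & attention_mask
except TypeError as err:
    print(err)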

jiqing-feng commented 2 months ago

Hi @amyeroberts, could you take a look at this PR? I'm waiting for your response, thanks!

LysandreJik commented 2 months ago

Hey @jiqing-feng! I'm trying to reproduce the issue but failing to do so with Python 3.8.18 and numpy 1.24.4.

>>> import torch
>>> import numpy as np
>>> a = torch.tensor([1,2,3], dtype=torch.int64)
>>> a
tensor([1, 2, 3])
>>> np.array(a)
array([1, 2, 3])
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=18, releaselevel='final', serial=0)

What's your torch version?

jiqing-feng commented 2 months ago

torch 2.3.0+cpu

jiqing-feng commented 2 months ago

I just checked that torch 2.3.1+cpu fixes this issue; you can close this PR if you think the change is no longer needed. That said, I believe the change won't break anything and is the more common approach. Thanks!
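
For anyone hitting this, a quick way to check whether a given torch/numpy combination is affected (a sketch; the dtypes printed depend on your install):

import numpy as np
import torch

print(torch.__version__)   # e.g. 2.3.0+cpu (affected) vs 2.3.1+cpu (fixed)
t = torch.tensor([1, 2, 3], dtype=torch.int64)
print(np.array(t).dtype)   # float on affected setups, int64 otherwise
print(t.numpy().dtype)     # int64 either way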

amyeroberts commented 2 months ago

@jiqing-feng Thanks for investigating across the different PyTorch versions. If the fix is only in later versions, then this is a change we'd still want, as we officially support torch >= 1.11.
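
Since the supported torch range goes back to 1.11, a defensive conversion along these lines (a hypothetical helper for illustration, not the exact code in this PR) would keep the mask integer-typed regardless of the torch build:

import numpy as np
import torch

def mask_to_int_numpy(mask):
    # Hypothetical helper: force an integer numpy array so bitwise ops never
    # see a float dtype, whatever torch/numpy versions are installed
    arr = mask.numpy() if isinstance(mask, torch.Tensor) else np.asarray(mask)
    return arr if np.issubdtype(arr.dtype, np.integer) else arr.astype(np.int64)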