'MPS' Issue Running HuggingFace Transformer Pix2Struct Model

🐛 Describe the bug

I am running the Pix2StructForConditionalGeneration model with Pix2StructProcessor on macOS 13.4 on a 27" iMac (2020) with an AMD Radeon Pro 5700 XT GPU.

The code is based on the example on the model card:

https://huggingface.co/google/pix2struct-ai2d-base

The code runs on the 'cpu' device, but on the 'mps' device it fails with:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ <ipython-input-5-7c03e79f5f42>:18 in <module>                                │
│                                                                              │
│ /Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/tra │
│ nsformers/models/pix2struct/processing_pix2struct.py:156 in decode           │
│                                                                              │
│   153 │   │   This method forwards all its arguments to Pix2StructTokenizerF │
│   154 │   │   refer to the docstring of this method for more information.    │
│   155 │   │   """                                                            │
│ ❱ 156 │   │   return self.tokenizer.decode(*args, **kwargs)                  │
│   157 │                                                                      │
│   158 │   @property                                                          │
│   159 │   def model_input_names(self):                                       │
│                                                                              │
│ /Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/tra │
│ nsformers/tokenization_utils_base.py:3485 in decode                          │
│                                                                              │
│   3482 │   │   # Convert inputs to python lists                              │
│   3483 │   │   token_ids = to_py_obj(token_ids)                              │
│   3484 │   │                                                                 │
│ ❱ 3485 │   │   return self._decode(                                          │
│   3486 │   │   │   token_ids=token_ids,                                      │
│   3487 │   │   │   skip_special_tokens=skip_special_tokens,                  │
│   3488 │   │   │   clean_up_tokenization_spaces=clean_up_tokenization_spaces │
│                                                                              │
│ /Users/davidlaxer/anaconda3/envs/AI-Feynman/lib/python3.10/site-packages/tra │
│ nsformers/tokenization_utils_fast.py:549 in _decode                          │
│                                                                              │
│   546 │   │                                                                  │
│   547 │   │   if isinstance(token_ids, int):                                 │
│   548 │   │   │   token_ids = [token_ids]                                    │
│ ❱ 549 │   │   text = self._tokenizer.decode(token_ids, skip_special_tokens=s │
│   550 │   │                                                                  │
│   551 │   │   clean_up_tokenization_spaces = (                               │
│   552 │   │   │   clean_up_tokenization_spaces                               │
╰──────────────────────────────────────────────────────────────────────────────╯
OverflowError: out of range integral type conversion attempted

Here's the code:

import requests
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# Download the AI2D demo image
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# Load the model onto the MPS (Metal) device
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-ai2d-base").to("mps")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-ai2d-base")

question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"

# Preprocess the image and question, then move the resulting tensors to MPS
inputs = processor(images=image, text=question, return_tensors="pt").to("mps")

# Generation runs on MPS; decoding the generated IDs raises the OverflowError shown above
predictions = model.generate(**inputs)
print(processor.decode(predictions[0], skip_special_tokens=True))
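One way to narrow this down is to inspect the generated IDs before they reach `processor.decode`. The sketch below is a hypothetical helper, not part of transformers or tokenizers. It assumes that the fast tokenizer's Rust `decode` only accepts unsigned 32-bit token IDs, so a negative or very large value would produce this OverflowError; if any IDs are flagged, that would suggest the problem lies in what `model.generate` returned on MPS rather than in the tokenizer.

```python
# Hypothetical diagnostic helper; not part of transformers or tokenizers.
# Assumption: the fast tokenizer's Rust decode() takes unsigned 32-bit IDs,
# so any ID outside [0, 2**32 - 1] cannot be converted and raises OverflowError.
U32_MAX = 2**32 - 1


def find_invalid_token_ids(token_ids, vocab_size=None):
    """Return (position, value) pairs for IDs that cannot be decoded.

    An ID is flagged if it does not fit in an unsigned 32-bit integer or,
    when vocab_size is given, if it is outside the range [0, vocab_size).
    """
    upper = U32_MAX if vocab_size is None else min(U32_MAX, vocab_size - 1)
    return [
        (position, token_id)
        for position, token_id in enumerate(token_ids)
        if not 0 <= token_id <= upper
    ]


if __name__ == "__main__":
    # Made-up IDs for illustration: -1 and 2**40 cannot be valid token IDs.
    sample_ids = [0, 1525, 8, -1, 2**40]
    print(find_invalid_token_ids(sample_ids))  # [(3, -1), (4, 1099511627776)]
    # With a hypothetical vocabulary of 1000 tokens, 1525 is flagged as well.
    print(find_invalid_token_ids(sample_ids, vocab_size=1000))
    # [(1, 1525), (3, -1), (4, 1099511627776)]
```

In the script above, this could be called as `find_invalid_token_ids(predictions[0].cpu().tolist(), len(processor.tokenizer))` just before `processor.decode`; an empty list would mean every ID is in range.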

Versions

% python collect_env.py
Collecting environment information...
PyTorch version: 2.1.0.dev20230428
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.4 (x86_64)
GCC version: Could not collect
Clang version: 14.0.6
CMake version: version 3.22.1
Libc version: N/A

Python version: 3.10.11 (main, Apr 20 2023, 13:59:00) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz

Versions of relevant libraries:
[pip3] audiolm-pytorch==0.0.1
[pip3] configmypy==0.1.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.3
[pip3] pytorch-transformers==1.1.0
[pip3] tensorly-torch==0.4.0
[pip3] torch==1.14.0a0+git1c8b077
[pip3] torch-struct==0.5
[pip3] torch-summary==1.4.5
[pip3] torch-utils==0.1.2
[pip3] torchaudio==2.1.0.dev20230428
[pip3] torchtraining-nightly==1604016577
[pip3] torchvision==0.16.0.dev20230428
[pip3] vector-quantize-pytorch==0.9.2
[conda] nomkl                     3.0                           0  
[conda] numpy                     1.24.3          py310he50c29a_0  
[conda] numpy-base                1.24.3          py310h992e150_0  
[conda] pytorch-transformers      1.1.0                    pypi_0    pypi
[conda] tensorly-torch            0.4.0                    pypi_0    pypi
[conda] torch                     2.0.0.dev20230211          pypi_0    pypi
[conda] torch-struct              0.5                      pypi_0    pypi
[conda] torch-summary             1.4.5                    pypi_0    pypi
[conda] torch-utils               0.1.2                    pypi_0    pypi
[conda] torchaudio                2.1.0.dev20230428          pypi_0    pypi
[conda] torchtraining-nightly     1604016577               pypi_0    pypi
[conda] torchvision               0.16.0.dev20230428          pypi_0    pypi
[conda] vector-quantize-pytorch   0.9.2                    pypi_0    pypi
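The listing above reports three different PyTorch builds: 2.1.0.dev20230428 on the `PyTorch version` line, 1.14.0a0+git1c8b077 under `[pip3]`, and 2.0.0.dev20230211 under `[conda]`. When triaging a report like this, a short script can flag packages whose recorded versions disagree. The sketch below is a hypothetical helper, not part of `collect_env.py`, and it assumes only the line formats visible in this output (`[pip3] name==version` and `[conda] name version build channel`).

```python
import re
from collections import defaultdict

# Header line printed by collect_env.py, e.g. "PyTorch version: 2.1.0.dev20230428".
HEADER = re.compile(r"^PyTorch version:\s+(\S+)")
# Package lines in the two formats seen in the output above.
PACKAGE_PATTERNS = [
    re.compile(r"^\[pip3\]\s+(\S+?)==(\S+)"),   # [pip3] name==version
    re.compile(r"^\[conda\]\s+(\S+)\s+(\S+)"),  # [conda] name  version  build ...
]


def conflicting_versions(env_text):
    """Map each package name to its sorted versions, if more than one is reported."""
    seen = defaultdict(set)
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            # Treat the header as one more reported version of the torch package.
            seen["torch"].add(header.group(1))
            continue
        for pattern in PACKAGE_PATTERNS:
            match = pattern.match(line)
            if match:
                name, version = match.groups()
                seen[name].add(version)
                break
    return {name: sorted(vs) for name, vs in seen.items() if len(vs) > 1}


if __name__ == "__main__":
    # A few lines copied from the environment listing in this report.
    sample = "\n".join([
        "PyTorch version: 2.1.0.dev20230428",
        "[pip3] numpy==1.24.3",
        "[pip3] torch==1.14.0a0+git1c8b077",
        "[conda] numpy                     1.24.3          py310he50c29a_0",
        "[conda] torch                     2.0.0.dev20230211          pypi_0    pypi",
    ])
    print(conflicting_versions(sample))
    # {'torch': ['1.14.0a0+git1c8b077', '2.0.0.dev20230211', '2.1.0.dev20230428']}
```

A mismatch like this does not necessarily cause the MPS failure, but confirming which build is actually imported (for example with `python -c "import torch; print(torch.__version__)"`) removes one source of ambiguity.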

cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

pytorch / pytorch

'MPS' Issue Running HuggingFace Transformer Pix2Struct Model #103966
