mistralai / mistral-inference

Official inference library for Mistral models
https://mistral.ai/
Apache License 2.0

[BUG: RuntimeError: Boolean value of Tensor with more than one value is ambiguous] #225

Open siwer opened 2 months ago

siwer commented 2 months ago

Python -VV

File "/opt/anaconda/envs/transformers/lib/python3.11/site-packages/mistral_inference/transformer.py", line 162, in forward_partial
    if self.vision_encoder is not None and images:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Boolean value of Tensor with more than one value is ambiguous

Pip Freeze

aiohappyeyeballs @ file:///croot/aiohappyeyeballs_1725434011349/work
aiohttp @ file:///croot/aiohttp_1725527756643/work
aiosignal @ file:///tmp/build/80754af9/aiosignal_1637843061372/work
annotated-types==0.7.0
attrs @ file:///croot/attrs_1695717823297/work
Bottleneck @ file:///croot/bottleneck_1707864210935/work
Brotli @ file:///work/ci_py311/brotli-split_1676830125088/work
certifi @ file:///croot/certifi_1725551672989/work/certifi
cffi @ file:///croot/cffi_1700254295673/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
cryptography @ file:///croot/cryptography_1702070282333/work
datasets @ file:///croot/datasets_1716911606380/work
dill @ file:///croot/dill_1715094664823/work
docstring_parser==0.16
filelock @ file:///croot/filelock_1700591183607/work
fire==0.6.0
frozenlist @ file:///croot/frozenlist_1698702560391/work
fsspec @ file:///croot/fsspec_1714461537038/work
gmpy2 @ file:///work/ci_py311/gmpy2_1676839849213/work
huggingface_hub @ file:///croot/huggingface_hub_1724853938404/work
idna @ file:///work/ci_py311/idna_1676822698822/work
Jinja2 @ file:///work/ci_py311/jinja2_1676823587943/work
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
MarkupSafe @ file:///croot/markupsafe_1704205993651/work
mistral_common==1.4.3
mistral_inference==1.4.0
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
mpmath @ file:///croot/mpmath_1690848262763/work
multidict @ file:///croot/multidict_1701096859099/work
multiprocess @ file:///croot/multiprocess_1692294385131/work
networkx @ file:///croot/networkx_1690561992265/work
numexpr @ file:///croot/numexpr_1696515281613/work
numpy @ file:///croot/numpy_and_numpy_base_1704311704800/work/dist/numpy-1.26.3-cp311-cp311-linux_x86_64.whl#sha256=10a078151ecec16bafb535f7487635217625fa06536dec8509e514648c78d626
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
opencv-python-headless==4.10.0.84
packaging @ file:///croot/packaging_1720101850331/work
pandas @ file:///croot/pandas_1718308974269/work/dist/pandas-2.2.2-cp311-cp311-linux_x86_64.whl#sha256=3c7ce50f9f519c785bd4cdb28a0ca71f85a541f3d27b25aa9da770f953e7f2e9
pillow==10.4.0
pyarrow @ file:///croot/pyarrow_1721664224170/work/python
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==2.9.2
pydantic_core==2.23.4
pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work
PySocks @ file:///work/ci_py311/pysocks_1676822712504/work
python-dateutil @ file:///croot/python-dateutil_1716495738603/work
pytz @ file:///croot/pytz_1713974312559/work
PyYAML @ file:///croot/pyyaml_1698096049011/work
referencing==0.35.1
regex @ file:///croot/regex_1723064389032/work
requests @ file:///croot/requests_1690400202158/work
rpds-py==0.20.0
safetensors @ file:///croot/safetensors_1724853960118/work
sentencepiece==0.2.0
simple-parsing==0.1.6
six @ file:///tmp/build/80754af9/six_1644875935023/work
sympy @ file:///croot/sympy_1701397643339/work
termcolor==2.4.0
tiktoken==0.7.0
tokenizers @ file:///croot/tokenizers_1721139552427/work
torch==2.4.1
torchaudio==2.1.2
torchvision==0.16.2
tqdm @ file:///croot/tqdm_1724853939799/work
transformers @ file:///home/conda/feedstock_root/build_artifacts/transformers_1724403320167/work
triton==3.0.0
typing_extensions @ file:///croot/typing_extensions_1715268824938/work
tzdata @ file:///croot/python-tzdata_1690578112552/work
urllib3 @ file:///croot/urllib3_1698257533958/work
xformers==0.0.28.post1
xxhash @ file:///work/ci_py311/python-xxhash_1676842384694/work
yarl @ file:///croot/yarl_1725976495189/work

Reproduction Steps

Running forward_partial() with Pixtral raises the error shown above. The script below reproduces it:

import torch
from pathlib import Path
from mistral_inference.transformer import Transformer
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
from mistral_common.protocol.instruct.request import ChatCompletionRequest

mistral_models_path = Path.home().joinpath('mistral_models', 'Pixtral')
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path, device="cuda:0")

# Run the model
url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
prompt = "Describe the image."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=[ImageURLChunk(image_url=url), TextChunk(text=prompt)])])

encoded = tokenizer.encode_chat_completion(completion_request)

images = encoded.images
tokens = encoded.tokens

tokens = torch.tensor(tokens).to(model.device)
images = torch.cuda.BFloat16Tensor(images).to(model.device)

with torch.no_grad():
    res = model.forward_partial(input_ids=tokens, seqlens=[len(tokens)], images=images)

Expected Behavior

I expected model.forward_partial() to return the vector representations of the input tokens.

Additional Context

No response

Suggested Solutions

Change line 162 in mistral-inference/blob/main/src/mistral_inference/transformer.py

current: if self.vision_encoder is not None and images:

proposed solution: if self.vision_encoder is not None and images is not None:

With this change, the code runs correctly.
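To illustrate why the two conditions behave differently, here is a minimal sketch that does not depend on torch. `FakeTensor` is a hypothetical stand-in that only mimics how a multi-element `torch.Tensor` responds to `bool()`. The three check functions are illustrative names, not part of mistral_inference. `length_check` is an assumed variant, not something proposed above, that additionally treats an empty batch as "no images".

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; models only the bool() behaviour."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __bool__(self):
        # A tensor with more than one element has no single truth value,
        # so truth-testing it raises. This sketch models only that case.
        if len(self.values) > 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one value is ambiguous"
            )
        return any(self.values)


def original_check(vision_encoder, images):
    # Mirrors `if self.vision_encoder is not None and images:`.
    # The `and images` part truth-tests the object, i.e. calls bool(images).
    return vision_encoder is not None and bool(images)


def proposed_check(vision_encoder, images):
    # Mirrors the proposed fix: an identity test never calls bool(images).
    return vision_encoder is not None and images is not None


def length_check(vision_encoder, images):
    # Assumed variant: also skips the vision path for an empty batch.
    return vision_encoder is not None and images is not None and len(images) > 0


if __name__ == "__main__":
    encoder = object()  # placeholder for a non-None vision encoder
    images = FakeTensor([0.1, 0.2, 0.3])

    try:
        original_check(encoder, images)
    except RuntimeError as err:
        print("original check failed:", err)

    print("proposed check:", proposed_check(encoder, images))
    print("proposed check, images=None:", proposed_check(encoder, None))
    print("original check, list of tensors:", original_check(encoder, [images]))
```

Note that the original condition works when `images` is a plain Python list, because a list's truth value depends only on whether it is empty. One trade-off of `is not None` is that an empty list now passes the check, which is what `length_check` guards against.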