huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Qwen2-VL Doesn't Execute on TPUs #33289

Open radna0 opened 1 month ago

radna0 commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Following this Qwen2-VL guide => https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct#quickstart

  1. Script

```python
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
import numpy as np
import torch
import torch_xla as xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.spmd as xs

# SPMD/FSDP imports (not used further in this repro)
from torch.distributed._tensor import DeviceMesh, distribute_module
from torch_xla.distributed.spmd import auto_policy
from torch_xla import runtime as xr
from torch_xla.experimental.spmd_fully_sharded_data_parallel import (
    _prepare_spmd_partition_spec,
    SpmdFullyShardedDataParallel as FSDPv2,
)

import time

start = time.time()

device = xm.xla_device()

# default: load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
).to(device)

print(model.device)

# default processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4")

message = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image1.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

all_messages = [[message] for _ in range(1)]
for messages in all_messages:
    # Preparation for inference
    texts = [
        processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
        for msg in messages
    ]

    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=texts,
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
    inputs = inputs.to(device)

    # Inference: generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=512)
    generated_ids_trimmed = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
    for i, text in enumerate(output_text):
        print(f"Output {i}: {text}")

print(f"Time taken: {time.time() - start}")
```


2. Output Logs

```
kojoe@t1v-n-cb70f560-w-0:~/EasyAnimate/easyanimate/image_caption$ python caption.py
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.39it/s]
xla:0
```
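The process prints `xla:0` and then produces nothing further, which is consistent with XLA still tracing and compiling the large generation graph rather than being deadlocked. A minimal diagnostic sketch, continuing the script above and assuming torch_xla's standard debug-metrics module:

```python
# Sketch only: inspect XLA compile/execute counters after a short run.
# Assumes `model` and `inputs` from the script above are in scope.
import torch_xla.debug.metrics as met

generated_ids = model.generate(**inputs, max_new_tokens=8)  # tiny budget on purpose
print(met.metrics_report())  # large CompileTime vs. ExecuteTime points at recompilation
```

Running the unmodified script with `PT_XLA_DEBUG=1 python caption.py` should print a similar compilation analysis, again assuming a recent torch_xla.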



Expected behavior

The model works fine when changing `device` to `"cpu"`, but it gets stuck executing on TPUs. The model should run on TPUs.
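One plausible explanation, stated as an assumption rather than a confirmed diagnosis: `generate` grows the key/value cache by one token per decode step, each new shape forces torch_xla to recompile the graph, and 512 recompilations on a 2B-parameter model can look like a hang. A hedged smoke test, reusing `model`, `inputs`, and `xm` from the script above:

```python
# Smoke test, not a fix: keep the decode budget tiny so the first XLA
# compilations finish quickly, then force the lazy graph to run.
generated_ids = model.generate(**inputs, max_new_tokens=8)
xm.mark_step()         # materialize the lazily traced XLA graph
xm.wait_device_ops()   # block until the TPU has actually executed it
print(generated_ids.shape)
```

If this returns, the original run is most likely compiling rather than stuck, and the decode length (or a static-shape cache) is the first knob to investigate.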
github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker commented 6 days ago

Hey! I don't think we officially test or support TPUs for this model 🤗 I can't really reproduce it 😢 @tengomucho might have an idea

tengomucho commented 4 days ago

@radna0 transformers does not officially support TPUs, but I think things might work if: