openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: `AttributeError: 'SymInt' object has no attribute 'size'` error when using `torch.compile` for LLaVA model #22412

Open notsyncing opened 9 months ago

notsyncing commented 9 months ago

OpenVINO Version

2023.3

Operating System

Fedora Silverblue 39

Device used for inference

GPU

Framework

PyTorch

Model used

llava-hf/llava-1.5-7b-hf

Issue description

Hello, I'm trying to use OpenVINO with torch.compile to run inference on a LLaVA model with the following code:

from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration, BatchFeature

import openvino.torch

model_id = "/mnt/external2/LLMs/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)

prompt = "<image>\n"
prompt += "USER: What are the things I should be cautious about when I visit this place?\nASSISTANT:"
image_file = "./view.jpg"

model = LlavaForConditionalGeneration.from_pretrained(model_id, low_cpu_mem_usage=True).eval()

print("Compiling...")
model.generate = torch.compile(model.generate, backend="openvino", 
    options = {"device": "GPU.1", "model_caching": True})

raw_image = Image.open(image_file)
inputs = processor(prompt, raw_image, return_tensors='pt')

print("Generating...")
output = model.generate(**inputs, max_new_tokens=200)
print("Decoding...")
print(processor.decode(output[0][2:], skip_special_tokens=True))

and it prints the following error:

[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG] TRACED GRAPH
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]  ===== __compiled_fn_35 =====
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]  <eval_with_key>.93 class GraphModule(torch.nn.Module):
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]     def forward(self, s0 : torch.SymInt, L_input_ids_ : torch.Tensor, s1 : torch.SymInt, L_attention_mask_ : torch.Tensor):
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_input_ids_ = L_input_ids_
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_attention_mask_ = L_attention_mask_
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: /mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/models/llava/modeling_llava.py:524, code: if attention_mask is not None and attention_mask.shape[1] > input_ids.shape[1]:
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         size = l_attention_mask_.size();  l_attention_mask_ = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         getitem_1 = size[1];  size = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         size_1 = l_input_ids_.size()
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         getitem_3 = size_1[1];  size_1 = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         gt = getitem_1 > getitem_3;  getitem_1 = getitem_3 = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: /mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/models/llava/modeling_llava.py:528, code: elif past_length < input_ids.shape[1]:
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         size_2 = l_input_ids_.size();  l_input_ids_ = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         getitem_5 = size_2[1];  size_2 = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         gt_1 = getitem_5 > 603;  getitem_5 = None
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         return ()
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG]         
[2024-01-25 16:10:26,531] [24/1] torch._dynamo.output_graph.__graph_code: [DEBUG] 
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] TRACED GRAPH
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG]  __compiled_fn_35 <eval_with_key>.93 opcode         name               target                       args                    kwargs
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] -------------  -----------------  ---------------------------  ----------------------  --------
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] placeholder    s0                 s0                           ()                      {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] placeholder    l_input_ids_       L_input_ids_                 ()                      {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] placeholder    s1                 s1                           ()                      {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] placeholder    l_attention_mask_  L_attention_mask_            ()                      {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_method    size               size                         (l_attention_mask_,)    {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_function  getitem_1          <built-in function getitem>  (size, 1)               {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_method    size_1             size                         (l_input_ids_,)         {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_function  getitem_3          <built-in function getitem>  (size_1, 1)             {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_function  gt                 <built-in function gt>       (getitem_1, getitem_3)  {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_method    size_2             size                         (l_input_ids_,)         {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_function  getitem_5          <built-in function getitem>  (size_2, 1)             {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] call_function  gt_1               <built-in function gt>       (getitem_5, 603)        {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] output         output             output                       ((),)                   {}
[2024-01-25 16:10:26,532] [24/1] torch._dynamo.output_graph.__graph: [DEBUG] 
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] TRACED GRAPH TENSOR SIZES
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] ===== __compiled_fn_35 =====
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] l_input_ids_: (1, s0)
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] l_input_ids_ (concrete): (1, 29)
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] l_attention_mask_: (1, s1)
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] l_attention_mask_ (concrete): (1, 29)
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph.__graph_sizes: [DEBUG] 
[2024-01-25 16:10:26,534] [24/1] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function openvino
Compiling...
Generating...
Traceback (most recent call last):
  File "/var/mnt/data/podman/CogVLM/test2.py", line 36, in <module>
    output = model.generate(**inputs, max_new_tokens=200)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1173, in generate
    @torch.no_grad()
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1279, in <resume in generate>
    self._validate_model_class()
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1290, in <resume in generate>
    and self.generation_config._original_object_hash == hash(self.generation_config)
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1291, in <resume in generate>
    and self.config._has_non_default_generation_parameters()
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1304, in <resume in generate>
    generation_config = copy.deepcopy(generation_config)
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1307, in <resume in generate>
    self._validate_model_kwargs(model_kwargs.copy())
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1307, in <resume in generate>
    self._validate_model_kwargs(model_kwargs.copy())
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1310, in <resume in generate>
    logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1479, in <resume in generate>
    return self.greedy_search(
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/generation/utils.py", line 2337, in greedy_search
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 490, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 641, in _convert_frame
    result = inner_convert(frame, cache_size, hooks, frame_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 133, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 389, in _convert_frame_assert
    return _compile(
           ^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 569, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 491, in compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 458, in transform
    tracer.run()
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2069, in run
    super().run()
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 719, in run
    and self.step()
        ^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 697, in step
    self.output.compile_subgraph(
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 857, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 957, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1024, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1009, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/__init__.py", line 1607, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/backends/common.py", line 95, in wrapper
    return fn(model, inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/openvino/frontend/pytorch/torchdynamo/backend.py", line 49, in openvino
    return fx_openvino(subgraph, example_inputs, options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/openvino/frontend/pytorch/torchdynamo/backend.py", line 156, in fx_openvino
    return compile_fx(subgraph, example_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1150, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/backends/common.py", line 55, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 3891, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 3379, in create_aot_dispatcher_function
    fw_metadata = run_functionalized_fw_and_collect_metadata(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 757, in inner
    flat_f_outs = f(*flat_f_args)
                  ^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 3496, in functional_call
    out = Interpreter(mod).run(*args[params_len:], **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/fx/interpreter.py", line 138, in run
    self.env[node] = self.run_node(node)
                     ^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/fx/interpreter.py", line 195, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/torch/fx/interpreter.py", line 289, in call_method
    return getattr(self_obj, target)(*args_tail, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
torch._dynamo.exc.BackendCompilerFailed: backend='openvino' raised:
AttributeError: 'SymInt' object has no attribute 'size'

While executing %size : [num_users=1] = call_method[target=size](args = (%l_attention_mask_,), kwargs = {})
Original traceback:
  File "/mnt/data/podman/.conda/envs/cogvlm/lib/python3.11/site-packages/transformers/models/llava/modeling_llava.py", line 524, in prepare_inputs_for_generation
    if attention_mask is not None and attention_mask.shape[1] > input_ids.shape[1]:

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

[2024-01-25 16:10:27,541] torch._dynamo.utils: [INFO] TorchDynamo compilation metrics:
[2024-01-25 16:10:27,541] torch._dynamo.utils: [INFO] Function                           Runtimes (s)
[2024-01-25 16:10:27,541] torch._dynamo.utils: [INFO] -------------------------------  --------------
[2024-01-25 16:10:27,541] torch._dynamo.utils: [INFO] _compile.<locals>.compile_inner         396.586
[2024-01-25 16:10:27,541] torch._dynamo.utils: [INFO] OutputGraph.call_user_compiler          303.429
[2024-01-25 16:10:27,541] torch._dynamo.utils: [INFO] create_aot_dispatcher_function  
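The failure mode in the log can be illustrated with a stdlib-only sketch (no torch required; `FakeTensor` and `run_call_method` are hypothetical stand-ins for `torch.Tensor` and the `call_method` dispatch in `torch/fx/interpreter.py`). The FX interpreter executes a `call_method "size"` node by looking the method up on the runtime value; a tensor has `.size()`, but when a `SymInt` (stood in for here by a plain `int`) reaches that node, the lookup raises the `AttributeError` seen above:

```python
class FakeTensor:
    """Stand-in for torch.Tensor: has a size() method."""
    def __init__(self, shape):
        self._shape = shape

    def size(self):
        return self._shape


def run_call_method(self_obj, target, *args, **kwargs):
    # Mirrors the FX interpreter's call_method: getattr(self_obj, target)(...)
    return getattr(self_obj, target)(*args, **kwargs)


print(run_call_method(FakeTensor((1, 29)), "size"))  # → (1, 29)

try:
    # A bare int, like a SymInt leaking in where a tensor is expected:
    run_call_method(29, "size")
except AttributeError as e:
    print(e)  # → 'int' object has no attribute 'size'
```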

software versions:

python 3.11.7 (conda)
openvino 2023.3.0
transformers 4.37.0
optimum 1.16.2
optimum_intel 1.12.4
torch 2.1.2
oneapi basekit 2024.0

hardware versions:

Intel Core i5-6500
Intel ARC A770 16GB
64GB RAM + 1T SWAP on SSD

Step-by-step reproduction

No response

Relevant log output

No response

Issue submission checklist

mvafin commented 9 months ago

Looks like an issue with torch.compile itself. @cavusmustafa could you help here?

cavusmustafa commented 9 months ago

One of the torch dynamo partitions seems to be failing while handling the symbolic inputs. More debugging is needed to provide a proper fix.
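One way a backend could guard against such a partition — sketched here as a hedged, stdlib-only illustration with hypothetical names (`make_cautious_backend`, `compile_fn`), not OpenVINO's actual code — is to detect symbolic (non-tensor) example inputs before compiling and fall back to eager execution for that partition:

```python
def make_cautious_backend(compile_fn):
    """Wrap a torch.compile backend function (gm, example_inputs) -> callable."""
    def backend(gm, example_inputs):
        # If any example input lacks a .size() method (e.g. a SymInt scalar
        # instead of a tensor), a compiler that assumes tensors will crash,
        # so run this partition eagerly instead of compiling it.
        if any(not hasattr(x, "size") for x in example_inputs):
            return gm.forward
        return compile_fn(gm, example_inputs)
    return backend
```

This trades performance on that partition for robustness; it is similar in spirit to the `suppress_errors` fallback the log suggests, but scoped to the specific inputs the backend cannot handle.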

andrei-kochin commented 8 months ago

@cavusmustafa any updates here?

cavusmustafa commented 8 months ago

We are planning to enable new LLM features with the next release. As part of the updates, we are working on a fix for this issue as well.

anzr299 commented 2 months ago

@cavusmustafa are there any updates on this? I am facing the same issue when trying to compile tinyllama-1.1b-step-50k-105b with openvino backend.

avitial commented 3 weeks ago

Ref. 132028

avitial commented 3 weeks ago

> @cavusmustafa are there any updates on this? I am facing the same issue when trying to compile tinyllama-1.1b-step-50k-105b with openvino backend.

@anzr299 sorry for the delay, is it possible to share the full script to reproduce the issue?