[Closed] yuanjiayiy closed this issue 2 months ago
I was trying to run a minimal example for OpenVLA. Since installing flash attention would take a long time, I commented it out.
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    # attn_implementation="flash_attention_2",  # [Optional] Requires `flash_attn`
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
)
vla = vla.to("cuda:0")

image = np.zeros((256, 256, 3), dtype=np.uint8)
INSTRUCTION = "open the cabinet"
prompt = f"In: What action should the robot take to {INSTRUCTION}?\nOut:"

# Predict Action (7-DoF; un-normalize for BridgeData V2)
inputs = processor(prompt, Image.fromarray(image).convert("RGB")).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
I got the following error.
Traceback (most recent call last):
  File "evaluate_vla_policy_real.py", line 198, in <module>
    run(**params)
  File "evaluate_vla_policy_real.py", line 85, in run
    action = vla_predict_action(camera)
  File "evaluate_vla_policy_real.py", line 53, in vla_predict_action
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
  File "/home/carrie/.cache/huggingface/modules/transformers_modules/openvla/openvla-7b/e5822cc24559b04e532f49b1c1ddb64376c1a485/modeling_prismatic.py", line 517, in predict_action
    generated_ids = self.generate(input_ids, max_new_tokens=self.get_action_dim(unnorm_key), **kwargs)
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/carrie/.cache/huggingface/modules/transformers_modules/openvla/openvla-7b/e5822cc24559b04e532f49b1c1ddb64376c1a485/modeling_prismatic.py", line 404, in forward
    language_model_output = self.language_model(
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1210, in forward
    outputs = self.model(
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1018, in forward
    layer_outputs = decoder_layer(
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 741, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/carrie/miniconda3/envs/openvla/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 375, in forward
    attn_weights = attn_weights + causal_mask
RuntimeError: The size of tensor a (274) must match the size of tensor b (273) at non-singleton dimension 3
The version information is as follows:
Python 3.8.19
Torch 2.1.0
CUDA 11.8
RTX 3090, 24 GB VRAM
Hi, I encountered the same issue before, but I resolved it by updating PyTorch to 2.2.0. I hope this works for you as well.
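After upgrading, a quick sanity check like the following (a minimal sketch, assuming standard `torch` and `transformers` installs) can confirm which versions the environment actually picks up:

import torch
import transformers

# Print the versions actually active in the environment.
print(f"torch: {torch.__version__}")  # expect 2.2.0 after the upgrade suggested above
print(f"transformers: {transformers.__version__}")
print(f"CUDA build: {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")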
Hi @yuanjiayiy,
I tried running your code (with additional lines and import statements to make it actually run without errors) and could not reproduce the issue. Here is the simple script that I ran:
import numpy as np
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor
vla = AutoModelForVision2Seq.from_pretrained(
"openvla/openvla-7b",
# attn_implementation="flash_attention_2", # [Optional] Requires `flash_attn`
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=True,
trust_remote_code=True
)
vla = vla.to("cuda:0")
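# Load the processor (not defined in the original snippet above, but required to build the model inputs)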
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
image = np.zeros((256, 256, 3), dtype=np.uint8)
INSTRUCTION = "open the cabinet"
prompt = f"In: What action should the robot take to {INSTRUCTION}?\nOut:"
# Predict Action (7-DoF; un-normalize for BridgeData V2)
inputs = processor(prompt, Image.fromarray(image).convert("RGB")).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(f"action: {action}")
Here are the versions that I tested this with:
Python 3.10.14
Torch 2.2.0
CUDA 12.0
NVIDIA RTX A5000, 24 GB VRAM
As @hongzhitao mentioned, you may want to try upgrading your package versions.
I'll close this issue for now, but feel free to reopen it if you continue to run into errors!
-Moo Jin