rhymes-ai / Aria

Codebase for Aria - an Open Multimodal Native MoE
Apache License 2.0

V100 run video understanding #29

Open gehong-coder opened 6 days ago

gehong-coder commented 6 days ago

The V100 cannot use flash attention, so I switched to eager attention by changing the line to `self.self_attn = IDEFICS_VISION_ATTENTION_CLASSES["eager"]`,

but the following error occurred:

File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 630, in forward encoder_outputs = self.encoder( File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, *kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward output = module._old_forward(args, kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 555, in forward layer_outputs = encoder_layer( File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, *kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward output = module._old_forward(args, kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 467, in forward hidden_states, attn_weights = self.self_attn( File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, *kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward output = module._old_forward(args, kwargs) File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 245, in forward raise ValueError( ValueError: Attention mask should be of size (128, 1, 1225, 1225), but is torch.Size([128, 1225])

aria-hacker commented 6 days ago

We've implemented support for eager attention. Could you please test the following code and let me know if you encounter any issues? @gehong-coder

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="eager",
)
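
To confirm the flag actually reached the vision encoder after loading, a quick sanity check (assuming the submodule is exposed as `model.vision_tower`, as the traceback suggests; attribute names may differ between revisions) is:

# Print which attention implementation each config ended up with; both should
# report "eager" if the setting propagated down to the vision tower.
print(model.config._attn_implementation)
print(model.vision_tower.config._attn_implementation)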
gehong-coder commented 5 days ago

> We've implemented support for eager attention. Could you please test the following code and let me know if you encounter any issues? @gehong-coder
>
> model = AutoModelForCausalLM.from_pretrained(
>     "rhymes-ai/Aria",
>     device_map="auto",
>     torch_dtype=torch.bfloat16,
>     trust_remote_code=True,  # Corrected 'true' to 'True'
>     attn_implementation="eager",
> )

Hello, this problem still occurs after I use the settings above. It seems that passing `attn_implementation="eager"` here does not actually make the model use eager attention internally.

File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 467, in forward hidden_states, attn_weights = self.self_attn( return super().apply(*args, **kwargs) # type: ignore[misc] File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 619, in forward out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward( File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 88, in _flash_attn_varlen_forward out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd( RuntimeError: FlashAttention only supports Ampere GPUs or newer.

So I went into modeling_idefics2.py and, at line 442 (`self.self_attn = IDEFICS_VISION_ATTENTION_CLASSES[config._attn_implementation]`), changed `config._attn_implementation` to "eager". Then the following appears:

File "/home/hong.ge/.cache/huggingface/modules/transformers_modules/5cc2703b3afd585f232ec5027e9c039a2001bcec/modeling_aria.py", line 376, in forward
    image_outputs, image_attn_mask = self.vision_tower(
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
File "/home/hong.ge/.cache/huggingface/modules/transformers_modules/5cc2703b3afd585f232ec5027e9c039a2001bcec/vision_encoder.py", line 120, in forward
    vit_oup = self.vision_model(
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 630, in forward
    encoder_outputs = self.encoder(
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 555, in forward
    layer_outputs = encoder_layer(
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 467, in forward
    hidden_states, attn_weights = self.self_attn(
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 245, in forward
    raise ValueError(
ValueError: Attention mask should be of size (128, 1, 1225, 1225), but is torch.Size([128, 1225])

aria-hacker commented 5 days ago

@gehong-coder Is your local model updated to the latest rhymes-ai/Aria repo? We updated it yesterday.

gehong-coder commented 5 days ago

I have updated the model, but the error still appears. Is it because grouped_gemm is not installed? I see this warning:

`grouped_gemm` is not installed, using sequential GEMM, which is slower. `AriaMoELMForCausalLM` has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
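
The grouped_gemm warning is only a performance note about the MoE layers and should be unrelated to the attention-mask error. A quick way to see whether the optional dependency is present (assuming it is importable as `grouped_gemm`) is:

# Check for the optional grouped_gemm kernels; without them Aria falls back to
# the slower sequential GEMM path mentioned in the warning above.
try:
    import grouped_gemm  # optional dependency; import name assumed
    print("grouped_gemm available")
except ImportError:
    print("grouped_gemm not installed; sequential GEMM fallback will be used")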

saeedkhaki92 commented 4 days ago

Eager attention is not working, and we are not able to run the model on V100s. Could you please help with this feature?

aria-hacker commented 3 days ago

@gehong-coder I can't reproduce this error on my local machine. Could you provide some minimal code to reproduce this bug? And which version of transformers are you using?