Open gehong-coder opened 6 days ago
We've implemented support for eager attention. Could you please test the following code and let me know if you encounter any issues? @gehong-coder
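import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Aria ships its own modeling code
    attn_implementation="eager",  # request eager attention instead of FlashAttention
)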
Hello, the following problem occurs when I use the settings above. It seems that setting attn_implementation="eager" here does not make the model use eager attention internally.
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 467, in forward
    hidden_states, attn_weights = self.self_attn(
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 619, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 88, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
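The RuntimeError above reflects FlashAttention's hardware requirement: Ampere GPUs start at CUDA compute capability 8.0, while a V100 (Volta) reports 7.0. A minimal sketch of choosing the attn_implementation value from the device capability before loading the model follows. The helper name pick_attn_implementation is hypothetical; it is not part of transformers or the Aria repo.

```python
from typing import Optional, Tuple

# Ampere GPUs start at CUDA compute capability 8.0; a V100 (Volta) reports 7.0.
AMPERE_CAPABILITY = (8, 0)


def pick_attn_implementation(capability: Optional[Tuple[int, int]]) -> str:
    """Hypothetical helper: return an attn_implementation string for from_pretrained.

    `capability` is a (major, minor) tuple such as the one returned by
    torch.cuda.get_device_capability(), or None when no CUDA device is present.
    """
    if capability is not None and tuple(capability) >= AMPERE_CAPABILITY:
        return "flash_attention_2"
    return "eager"


# Usage with PyTorch (assumes torch is installed):
#   import torch
#   cap = torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
#   attn = pick_attn_implementation(cap)
print(pick_attn_implementation((7, 0)))  # V100 -> eager
print(pick_attn_implementation((8, 0)))  # A100 -> flash_attention_2
```

This only selects the value passed to from_pretrained. As the traceback shows, a submodule can still end up on the FlashAttention path, so the value alone may not be sufficient.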
So I went into modeling_idefics2.py and changed line 442, self.self_attn = IDEFICS_VISION_ATTENTION_CLASSES[config._attn_implementation](config), replacing config._attn_implementation with "eager". The following error then appears:
  File "/home/hong.ge/.cache/huggingface/modules/transformers_modules/5cc2703b3afd585f232ec5027e9c039a2001bcec/modeling_aria.py", line 376, in forward
    image_outputs, image_attn_mask = self.vision_tower(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/hong.ge/.cache/huggingface/modules/transformers_modules/5cc2703b3afd585f232ec5027e9c039a2001bcec/vision_encoder.py", line 120, in forward
    vit_oup = self.vision_model(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 630, in forward
    encoder_outputs = self.encoder(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 555, in forward
    layer_outputs = encoder_layer(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 467, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 245, in forward
    raise ValueError(
ValueError: Attention mask should be of size (128, 1, 1225, 1225), but is torch.Size([128, 1225])
@gehong-coder Is your local copy of the model in sync with the latest rhymes-ai/Aria repo? We updated it yesterday.
I have updated the model, but the error still appears. Is it because grouped_gemm is not installed?
grouped_gemm is not installed, using sequential GEMM, which is slower.
AriaMoELMForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
- If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:601: UserWarning: do_sample is set to False. However, temperature is set to 0.0 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
  warnings.warn(
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Traceback (most recent call last):
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/Aria/inference/notebooks/video_in.py", line 166, in
Eager attention is not working, and I am not able to run the model on V100s. Could you please help with this feature?
@gehong-coder I can't reproduce this error on my local machine. Could you provide minimal code that reproduces the bug? Also, which version of transformers are you using?
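One way to answer the version question is to print the installed versions of the packages involved. This is a minimal sketch using only the standard library; the helper name report_versions and the list of distribution names are assumptions chosen for illustration.

```python
from importlib import metadata


def report_versions(packages):
    """Hypothetical helper: map each distribution name to its installed
    version, or to 'not installed' when it cannot be found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Distribution names below are assumptions; adjust them to your environment.
for name, version in report_versions(
    ["transformers", "torch", "accelerate", "flash-attn", "grouped_gemm"]
).items():
    print(f"{name}: {version}")
```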
A V100 cannot use flash attention, so I switched to eager attention by setting self.self_attn = IDEFICS_VISION_ATTENTION_CLASSES["eager"](config), but the following error occurred:
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 630, in forward
    encoder_outputs = self.encoder(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 555, in forward
    layer_outputs = encoder_layer(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 467, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/aria/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 245, in forward
    raise ValueError(
ValueError: Attention mask should be of size (128, 1, 1225, 1225), but is torch.Size([128, 1225])