hao-ai-lab / Consistency_LLM

[ICML 2024] CLLMs: Consistency Large Language Models
http://arxiv.org/abs/2403.00835
Apache License 2.0

AttributeError: 'LlamaModel' object has no attribute '_use_flash_attention_2' #10

Open raghavgarg97 opened 6 months ago

raghavgarg97 commented 6 months ago

I was running speedup.sh with a Llama model and hit the error trace below.

[Screenshot of the traceback, ending in: AttributeError: 'LlamaModel' object has no attribute '_use_flash_attention_2']

The error comes from this line of Consistency_LLM/cllm/cllm_llama_modeling.py: https://github.com/hao-ai-lab/Consistency_LLM/blob/b2a7283bafd65121e868b92fbeb811aac140be17/cllm/cllm_llama_modeling.py#L154

The code needs to be updated to `if self.model.config._attn_implementation == 'flash_attention_2':`. Do I need to change the model config to check the speed of the base model with Jacobi iteration? Base model: meta-llama/Meta-Llama-3-8B-Instruct.
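As a minimal sketch of the workaround I have in mind (my own suggestion, not the repo's fix): older transformers releases, around the pinned 4.36, expose the private `_use_flash_attention_2` attribute on the model, while newer releases only expose `config._attn_implementation`, so checking both should work with either pin:

```python
# Sketch of a version-robust check, not the repository's actual code:
# fall back to config._attn_implementation when the private
# _use_flash_attention_2 attribute no longer exists (newer transformers).
def uses_flash_attention_2(model) -> bool:
    if getattr(model, "_use_flash_attention_2", False):
        return True  # older transformers (~4.36) still set this attribute
    return getattr(model.config, "_attn_implementation", None) == "flash_attention_2"
```

At the call site this would replace the bare attribute access, e.g. `if uses_flash_attention_2(self.model): ...`.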

snyhlxde1 commented 6 months ago

Did you use the package versions we provided in requirements.txt? If not, which PyTorch and transformers versions are you using?
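For reference, a quick way to report the versions in question (assuming it is run in the same environment used for speedup.sh):

```python
# Print the installed PyTorch and transformers versions for debugging.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```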

poedator commented 4 months ago

With all respect, the versions in requirements.txt are quite dated (transformers 4.36). Could you please find a solution that works with the current transformers version?

Also, please reduce requirements.txt from 180 packages to some manageable minimum, and avoid pinning exact versions unless critical.
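To illustrate, a trimmed requirements.txt could list only the directly imported packages with lower bounds instead of exact pins (the package set and bounds below are assumptions for illustration, not a verified minimum for this repo):

```
# Hypothetical minimal requirements.txt; package names and version bounds
# are assumptions, not a verified dependency closure for Consistency_LLM.
torch>=2.1
transformers>=4.36
accelerate>=0.25
flash-attn>=2.3
datasets>=2.14
```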