Hi @longruiqi
As the notebook states, setting use_flash_attention=True is mandatory. Our flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face.
We are working on releasing a new version of GoLLIE that uses the Hugging Face LLaMA implementation, so our custom LLaMA implementation will no longer be required. At the time we developed the current version of GoLLIE, Flash Attention 2 was not yet available in Hugging Face, so we had to use a custom implementation.
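For reference, here is a minimal sketch of what loading the model through the native Hugging Face path could look like once that version is out. It assumes a recent transformers release (>= 4.36, where Flash Attention 2 support is built in); the checkpoint name and dtype are only illustrative, not the exact configuration GoLLIE will ship with:

# Minimal sketch (assumption): loading a LLaMA-based checkpoint with the
# native transformers Flash Attention 2 backend instead of a custom patch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HiTZ/GoLLIE-7B"  # illustrative checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # flash attention requires fp16/bf16
    attn_implementation="flash_attention_2",  # available in transformers >= 4.36
    device_map="auto",
)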
Thank you, I understand. I set use_flash_attention=False because, even after flash-attn 2.3.3 was correctly installed (it shows up in conda list), I still got the error "ImportError: Please install RoPE kernels". I also checked conda list and rotary-emb 0.1 is installed.
Hi @longruiqi
Running this command should fix your issue with rotary embeddings:
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
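After installing, a quick way to check that the kernel is importable is the snippet below. It assumes the csrc/rotary extension registers a module named rotary_emb, which is what the rotary-emb entry in your conda list should correspond to:

# Sanity check (assumption: the csrc/rotary extension installs a module
# named rotary_emb, matching the "rotary-emb 0.1" entry in conda list).
import rotary_emb  # raises ImportError if the RoPE kernels are still missing
print("rotary_emb kernels found")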
Describe the task
create custom task.ipynb file

Describe the bug
I set use_flash_attention=False in the notebook. Then everything went well until RUN GoLLIE, and there was an error message:

System Info