microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Removing Dependency on "flash-attn" in a Deep Learning Project #42

Open LiangXin1001 opened 10 months ago

LiangXin1001 commented 10 months ago

Hello everyone,

I am currently working on a deep learning project on a server that only supports CUDA 11.0. The project relies on flash-attn, which requires CUDA >= 11.6, and unfortunately I do not have permission to upgrade CUDA on the server.

I am seeking advice on how to modify the project to remove the dependency on flash-attn. Here are some specific questions I have:

Alternative Libraries: Are there any alternative libraries compatible with CUDA 11.0 that can replace flash-attn without significantly impacting the project's performance?

Code Modifications: If I remove flash-attn, what are the key areas in the code that would need modification? (A rough sketch of the kind of fallback I have in mind is included after these questions.) I am particularly concerned about how this might affect the model's training and inference performance.

Impact Assessment: What potential impacts should I anticipate in terms of model accuracy, training time, and resource utilization after removing flash-attn?
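For concreteness, here is a rough sketch of the kind of change I have in mind: replacing a call like flash_attn's flash_attn_func with plain PyTorch attention. The name plain_attention is just my placeholder, not anything from this repository; I'm only assuming flash-attn's documented (batch, seqlen, num_heads, head_dim) input layout.

```python
# Hypothetical drop-in for flash_attn.flash_attn_func using plain PyTorch.
# Assumes q, k, v of shape (batch, seqlen, num_heads, head_dim); this is an
# illustration, not code from LLaVA-Med.
import math
import torch
import torch.nn.functional as F

def plain_attention(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False):
    # Put heads before the sequence dimension: (batch, num_heads, seqlen, head_dim)
    q, k, v = (x.transpose(1, 2) for x in (q, k, v))
    scale = softmax_scale if softmax_scale is not None else 1.0 / math.sqrt(q.size(-1))
    # Standard scaled dot-product attention, materializing the full score matrix
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if causal:
        # Mask out future positions (assumes self-attention, equal q/k lengths)
        seqlen = q.size(-2)
        mask = torch.triu(
            torch.ones(seqlen, seqlen, dtype=torch.bool, device=q.device), diagonal=1
        )
        scores = scores.masked_fill(mask, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    if dropout_p > 0.0:
        attn = F.dropout(attn, p=dropout_p)
    out = torch.matmul(attn, v)
    # Return to flash-attn's (batch, seqlen, num_heads, head_dim) layout
    return out.transpose(1, 2)
```

My understanding is that this materializes the full attention matrix, so I would expect higher memory use and slower training than with flash-attn even if the outputs are equivalent.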

Thank you in advance for any guidance or suggestions you can provide.

haotian-liu commented 10 months ago

Theoretically, the CUDA runtime can have a higher minor version than the CUDA version reported by the driver (minor-version compatibility within the same major release). I am not sure whether this works for flash-attention, but you can install the CUDA runtime in a local folder and then try to reinstall flash-attention and PyTorch against it:

sh cuda_11.x_linux.run

export PATH=/your/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/your/local/cuda/lib64:$LD_LIBRARY_PATH

If 11.6 does not work, try 11.4 with flash-attn==2.0.4 as well.
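Once you have reinstalled them, a quick sanity check along these lines (a generic snippet, not specific to this repo) should confirm that PyTorch was built against the local runtime and that flash-attn imports cleanly:

```python
# Generic sanity check after reinstalling against the locally installed CUDA runtime.
import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)   # e.g. '11.6' or '11.4'
print("GPU usable:", torch.cuda.is_available())    # False often means a driver/runtime mismatch

import flash_attn  # should import without undefined-symbol / CUDA errors
print("flash-attn:", getattr(flash_attn, "__version__", "unknown"))
```

If the import fails or `torch.cuda.is_available()` returns False, the driver is probably still too old for the runtime you picked.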

Please also update here so that other users can benefit if this works, thanks.

LiangXin1001 commented 10 months ago


Thank you! I've since switched to a server with CUDA 11.8.