Open — Hhhh8 opened this issue 1 week ago
I think flash attention might be a red herring here. The error:

```
2024-06-29T07:29:25.389048Z WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.layers.layernorm' (/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/layernorm.py)
```

indicates that `FastLayerNorm` cannot be imported. This happens because, without a GPU, the system type is detected as CPU, and only CUDA, ROCm, and IPEX implementations of `FastLayerNorm` are available.
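To illustrate the failure mode described above, here is a minimal sketch (not the actual TGI source; `SYSTEM` and `import_fast_layer_norm` are hypothetical names) of how a backend-gated class definition produces this kind of import error on a CPU-only machine:

```python
# Illustrative sketch, NOT the real text_generation_server code:
# the accelerator backend is detected at import time, and FastLayerNorm
# is only defined for CUDA, ROCm, and IPEX. On a CPU-only machine no
# definition runs, so referencing the name fails, which surfaces as the
# "cannot import name 'FastLayerNorm'" warning in the launcher log.

SYSTEM = "cpu"  # hypothetical detection result on a machine with no GPU

if SYSTEM in ("cuda", "rocm", "ipex"):
    class FastLayerNorm:  # only exists when an accelerator backend is found
        pass

def import_fast_layer_norm():
    """Return the class if it was defined, else the error message seen in the log."""
    try:
        return FastLayerNorm
    except NameError:
        return "cannot import name 'FastLayerNorm'"

print(import_fast_layer_norm())
```

On a CUDA/ROCm/IPEX system the branch runs and the class exists; on CPU the name was never defined, matching the warning in the log.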
System Info
- OS version: WSL 2, Ubuntu 22.04
- Model: llama3-8B-Instruct
- Hardware: no GPU
There is no GPU, but I installed the CUDA toolkit (nvcc) in WSL using this command:

```
sudo apt install nvidia-cuda-toolkit
```

`$CUDA_HOME` and `$LD_LIBRARY_PATH` are not set.
Information
Tasks
Reproduction
In a WSL shell, I ran the command below.
Error log: I went into the Docker container directly, ran `make install`, and found the error log.
Expected behavior
Even though I removed the `--gpus` flag and added the `--disable-custom-kernels` flag according to the TGI GitHub instructions, the flash attention error continues to occur. Please tell me how I can run TGI on a CPU.
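For reference, a CPU-only launch along the lines described above might look like the following. This is a sketch, not a verified working command: the image tag, model ID, and volume path are assumptions, and `--disable-custom-kernels` is the flag the TGI README suggests for machines without custom CUDA kernels.

```shell
# Hypothetical CPU-only TGI launch: no --gpus flag, custom kernels disabled.
# Image tag, model ID, and volume path are placeholders/assumptions.
docker run --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --disable-custom-kernels
```

Note that even with this, the launcher may still log the `FastLayerNorm` warning at startup, since the CPU path simply lacks that implementation.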