I am unable to load any model into VRAM; CPU-only inference works without issue.
Even a 1B-parameter model fails with an error.
I have not been able to install Flash-Attention, so I disabled it in the model menu, but the loader still tries to import it.
I've reinstalled Text Gen Web UI about a dozen times, trying different versions of ROCm (5.6/5.7/6.1.2) and Text Gen Web UI (1.12/1.13/1.14/1.15).
Is there an existing issue for this?
[X] I have searched the existing issues
Reproduction
I attempt to load the model as normal and it crashes with the same memory fault 100% of the time.
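To separate a ROCm/PyTorch problem from a web UI problem, a standalone check along these lines could be run inside the same Python environment. This is only a sketch: the function name `gpu_smoke_test` is hypothetical, and it assumes the environment ships a ROCm build of PyTorch (which exposes AMD GPUs through the `torch.cuda` API).

```python
def gpu_smoke_test():
    """Run a tiny matmul on every GPU PyTorch can see.

    Returns a list of (index, name, ok) tuples; empty if PyTorch or a GPU is missing.
    Note: a GPU memory access fault aborts the whole process, so it cannot be
    caught here. The last device printed before the crash is the one to suspect.
    """
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return []

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    print("HIP runtime:", getattr(torch.version, "hip", None))
    if not torch.cuda.is_available():
        print("No GPU visible to PyTorch")
        return []

    results = []
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        print(f"Testing device {i}: {name}", flush=True)
        x = torch.ones((256, 256), device=f"cuda:{i}")
        total = (x @ x).sum().item()  # forces the kernel to run and sync
        results.append((i, name, total == 256.0 ** 3))
    return results


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this script also dies with the memory access fault, the problem would seem to sit below the web UI (driver, ROCm, or the PyTorch build) rather than in the model loader.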
Screenshot
No response
Logs
19:43:11-546068 INFO Loading "Llama-3.2-1B-Instruct-exl2"
19:43:12-304425 WARNING Failed to load flash-attention due to the following error:
Traceback (most recent call last):
File "/home/rsa/text-generation-webui/modules/exllamav2_hf.py", line 23, in <module>
import flash_attn
ModuleNotFoundError: No module named 'flash_attn'
/home/rsa/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:611: UserWarning: `do_sample` is set to `False`. However, `min_p` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `min_p`.
warnings.warn(
Memory access fault by GPU node-1 (Agent handle: 0xc4f2d80) on address (nil). Reason: Page not present or supervisor privilege.
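A note on the flash-attention warning in the log above: the traceback points at a module-level `import flash_attn` (the frame is `<module>`), so the import appears to be attempted whenever the loader module itself is imported, before any UI setting is consulted. It is logged as a WARNING and the load continues, so it is probably unrelated to the memory access fault. The pattern looks roughly like the sketch below; `try_optional_import` and `HAS_FLASH_ATTN` are hypothetical names for illustration, not the web UI's actual code.

```python
import importlib
import logging

logger = logging.getLogger("optional_import_sketch")


def try_optional_import(name):
    """Import an optional dependency, returning None (with a warning) on failure."""
    try:
        return importlib.import_module(name)
    except Exception as exc:  # ModuleNotFoundError, broken binary wheels, etc.
        logger.warning("Failed to load %s due to the following error: %s", name, exc)
        return None


# Runs at import time, regardless of any later "disable flash-attention" option.
flash_attn = try_optional_import("flash_attn")
HAS_FLASH_ATTN = flash_attn is not None
```

With this kind of guard, a missing `flash_attn` only produces the warning; whether the feature is used later is a separate decision.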
System Info
Ubuntu 22.04
3x Radeon Instinct MI100
AMD Epyc 9334
ROCM 6.1.2
Text Gen Web UI V1.15