turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

"Loading exllamav2_ext extension (JIT)... Building C++/CUDA extension" hangs forever #495

Closed: AgeOfAlgorithms closed this issue 3 months ago

AgeOfAlgorithms commented 3 months ago

Hello, I've run into this problem quite a few times recently.

[Screenshot (2024-06-07 16:45): JIT build progress bar stuck at 14%]

The program will hang at 14% (or some other number) forever. I ran it yesterday overnight and it didn't move an inch in 6 hours. I've hit this problem on about four different occasions, using both CUDA and ROCm. Each time, my only solution was to reinstall the OS (this is too time-consuming, so I'm desperately looking for another solution). Once the build does finish, everything runs fine, and I've been using ExLlamaV2 for a long time.

Here are the versions of my current setup:

Ubuntu 22.04.4 LTS
gcc-12
nvidia-driver-555
CUDA 12.5
torch 2.2.0

Can anyone help me? I can provide more information if needed. I'm not very knowledgeable, so please let me know what commands to run.

turboderp commented 3 months ago

The progress feedback is a little unreliable because ExLlama has to deduce the overall progress from what's happening in the build directory while Torch and ninja are doing their thing.

You can see what's actually going on if you modify exllamav2/ext.py: near the top there's a line that says verbose = False. Change that to True and you'll get a wall of text output to the console, which hopefully will show where it's stalling. (:
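A minimal sketch of the edit (the exact surrounding code in ext.py may differ between versions):

```python
# exllamav2/ext.py -- near the top of the file
verbose = True  # was: verbose = False; prints the full JIT build log to the console
```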

In any case I would recommend upgrading Torch if possible. 2.3.x does improve a few things, though probably nothing related to the problem you're having.
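(In a typical pip-managed setup that would be something like `pip install --upgrade torch`, though the exact command depends on how Torch was installed and which CUDA or ROCm build you need.)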

turboderp commented 3 months ago

(Another approach, of course, is to use one of the prebuilt wheels if you're having too many issues building from source.)
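(For reference: the prebuilt wheels are published on the project's releases page, and you would install one with something like `pip install <path-or-url-to-wheel>`, picking the file that matches your Python, Torch, and CUDA/ROCm versions. The exact filename varies by release, so treat that command as a sketch.)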

AgeOfAlgorithms commented 3 months ago

Thanks @turboderp, I think using a prebuilt wheel solved the problem. I'll close the issue.