Closed SolsticeProjekt closed 11 months ago
Try deleting your torch extension cache; its location varies by OS. I think the defaults are:
Windows: %localappdata%\torch_extensions\torch_extensions\Cache\py3[x]_cu[y]\exllama_ext\
Linux: ~/.cache/torch_extensions/py3[x]_cu[y]/exllama_ext/
The py3[x]_cu[y] part depends on your Python and CUDA versions; e.g. Python 3.11 with CUDA 12.1 gives py311_cu121.
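That naming scheme can be sketched as a small helper. This is a guess at the layout described above, not torch API; the function name and the exact Windows directory structure are assumptions:

```python
import os
import sys

def guess_ext_cache_dir(cuda_version: str, py_version=sys.version_info) -> str:
    """Hypothetical helper: guess the torch extension cache dir for exllama_ext.

    Follows the py3[x]_cu[y] naming scheme, e.g. Python 3.11 + CUDA 12.1
    -> py311_cu121.
    """
    tag = f"py{py_version[0]}{py_version[1]}_cu{cuda_version.replace('.', '')}"
    if os.name == "nt":
        # Windows default (assumption): %localappdata%\torch_extensions\torch_extensions\Cache
        root = os.path.join(
            os.environ.get("LOCALAPPDATA", ""),
            "torch_extensions", "torch_extensions", "Cache",
        )
    else:
        # Linux default: ~/.cache/torch_extensions
        root = os.path.expanduser("~/.cache/torch_extensions")
    return os.path.join(root, tag, "exllama_ext")

print(guess_ext_cache_dir("12.1", (3, 11)))
```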
I've had that happen too, but it was from pressing Ctrl+C while launching ExLlama, which messed up something in the cache.
I've experienced this as well, on Linux. It can happen if the process fails while Torch is building the extension, which can leave the extension cache in an invalid state that Torch is unable to recover from. I'm not sure where the cache is stored on Windows, and it probably depends on whether it's native or WSL, but search for a folder named exllama_ext containing files like q4_matrix.cuda.o and build.ninja, then delete the whole folder. On the next run ExLlama will take a little while to start as it rebuilds the extension.
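That cleanup step can be sketched in Python. This is just one way to do it (a hedged sketch; the `build.ninja` check is my own safety guard, not something torch requires):

```python
import shutil
from pathlib import Path

def clear_ext_cache(ext_dir: str) -> bool:
    """Delete a torch extension build folder so it is rebuilt on the next launch.

    Returns True if the folder existed and was removed, False otherwise.
    """
    path = Path(ext_dir)
    # Only delete if it actually looks like a JIT build directory
    # (contains build.ninja), to avoid wiping an unrelated folder.
    if path.is_dir() and (path / "build.ninja").exists():
        shutil.rmtree(path)
        return True
    return False
```

Pass it the exllama_ext folder you found; the next start then recompiles the extension from scratch.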
You can also change verbose = False to verbose = True at the top of cuda_ext.py, which will give you a bunch of output from that build process. If startup still hangs, this might help pin down where it's getting stuck.
Thanks to both of you. Deleting the respective cache folder did the trick. That also means that CUDA, in this case 12.1, caused the triple fault.
On the next run ExLlama will take a little while to start as it rebuilds the extension.
Indeed. After deleting the folder, the first run took 24 seconds; the next was around 8 seconds.
Thanks! (also you're all awesome for being smarter than me, I wish I could catch up. Holy shit, it's so much!)
So, I had a BSOD while trying to load open_llama_3b_v2-8k-GPTQ, but I'm not convinced that has anything to do with it. I don't recall an error code in the BSOD, but the stop code given was MEMORY_MANAGEMENT.
Now, when I start example_basic, regardless of the model I try to run, it does nothing. When I interrupt using CTRL+C, I get this:
Traceback (most recent call last):
  File "###\exllama\example_basic.py", line 1, in <module>
    from model import ExLlama, ExLlamaCache, ExLlamaConfig
  File "###\exllama\model.py", line 12, in <module>
    import cuda_ext
  File "###\exllama\cuda_ext.py", line 43, in <module>
    exllama_ext = load(
                  ^^^^^
  File "###\Lib\site-packages\torch\utils\cpp_extension.py", line 1301, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "###\Lib\site-packages\torch\utils\cpp_extension.py", line 1538, in _jit_compile
    baton.wait()
  File "###\Lib\site-packages\torch\utils\file_baton.py", line 42, in wait
    time.sleep(self.wait_seconds)
KeyboardInterrupt
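The last frame in that traceback points at torch's FileBaton: _jit_compile believes another process is still building the extension (signalled by a lock file in the build folder) and sleeps while waiting for it. A minimal stand-in, not torch's actual implementation, shows why a stale lock file from a crashed build makes every later run spin:

```python
import os
import time

# Simplified stand-in for torch.utils.file_baton.FileBaton (illustrative only):
# a lock file in the build dir means "a build is in progress elsewhere".
class ToyBaton:
    def __init__(self, lock_path, wait_seconds=0.01):
        self.lock_path = lock_path
        self.wait_seconds = wait_seconds

    def try_acquire(self):
        try:
            # O_EXCL makes this fail if the lock file already exists
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            return False

    def wait(self, max_polls=5):
        # torch loops here with no timeout; if the owning process died
        # without releasing the lock, this is the hang in the traceback.
        polls = 0
        while os.path.exists(self.lock_path):
            polls += 1
            if polls >= max_polls:
                return False  # capped here only so the example terminates
            time.sleep(self.wait_seconds)
        return True

    def release(self):
        os.remove(self.lock_path)
```

If the build crashes between try_acquire and release, the lock file stays behind, every later run fails try_acquire and enters wait(), and the process sits at 0% CPU exactly as described. That's why deleting the cache folder, lock file included, fixes the startup hang.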
In Task Manager, it's stuck at 0% CPU and uses about 250 MB of RAM.
That, sadly, is all I have. Google wasn't helpful at all, and my assumptions are most likely useless.
Any help is appreciated.
Edit: It appears to be stuck during the imports already. Specifically, it hangs on this line: "from model import ExLlama".