JeremyBickel opened 8 months ago
Same issue here. Windows 11, CUDA 11.8/12.3, Python 3.12/3.11, model llama-2-13b-chat.Q8_0.gguf, same output.
Update: Got it fixed. It turns out that my CPU does not support AVX2, so I cloned the repo, edited the CMake config to use only AVX, and installed it that way. After that the model runs. Install CMake and take a look at the guidance branch; the installation guide shows you how to do it.
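For reference, a rebuild with AVX2 disabled can also be done without editing the cloned repo, by passing CMake flags through pip. A sketch in PowerShell; the exact flag names (`LLAMA_AVX2` vs. `GGML_AVX2`) vary between llama.cpp versions, so check the `CMakeLists.txt` of the release you are building:

```powershell
# Flag names are an assumption -- older releases use LLAMA_AVX2, newer ones GGML_AVX2.
$env:CMAKE_ARGS = "-DLLAMA_AVX2=OFF -DLLAMA_AVX=ON"
$env:FORCE_CMAKE = "1"
pip install llama-cpp-python --no-cache-dir --force-reinstall
```

`--no-cache-dir --force-reinstall` matters here: without it, pip may reuse a previously built wheel that was still compiled with AVX2 enabled.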
P.S. I also ran into Cublas Error: 13. It turned out to be related to having multiple GPUs: you have to specify which GPU to use, even though the program prints that one has been selected. To do so, run this in PowerShell:

$env:CUDA_VISIBLE_DEVICES=1

This selects the GPU with index 1 (the second GPU, since indices are zero-based).
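The same selection can be made from inside Python instead of PowerShell, as long as the variable is set before any CUDA-using library is imported. A minimal sketch (the llama_cpp usage is commented out and assumed, not tested here):

```python
import os

# Must be set BEFORE importing any CUDA-using library (e.g. llama_cpp),
# because CUDA device enumeration happens at initialization time.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only the second GPU (zero-based)

# Assumed usage: the library now sees a single GPU and reports it as device 0.
# from llama_cpp import Llama
# llm = Llama(model_path="llama-2-13b-chat.Q8_0.gguf", n_gpu_layers=-1)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```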
CUDA is working:

```
(ct) C:\Users\Jeremy\Documents>python
Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
```
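The session above only shows the interpreter starting; a common way to actually confirm CUDA is visible from Python is via PyTorch. A sketch, assuming PyTorch is available (the `cuda_check` helper is just for illustration):

```python
import importlib.util

def cuda_check() -> str:
    """Report CUDA availability via PyTorch, if it is installed."""
    # Guard the import so the check degrades gracefully without torch.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed; skipping CUDA check"
    import torch
    return f"CUDA available: {torch.cuda.is_available()}"

print(cuda_check())
```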