python test_mmlu.py
-- Loading dataset: cais/mmlu/anatomy...
/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/datasets/load.py:1429: FutureWarning: The repository for cais/mmlu contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/cais/mmlu
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
-- Loading dataset: cais/mmlu/computer_security...
-- Loading dataset: cais/mmlu/formal_logic...
-- Loading dataset: cais/mmlu/logical_fallacies...
-- Loading dataset: cais/mmlu/philosophy...
-- Loading dataset: cais/mmlu/nutrition...
-- Loading model: /home/tyra/storage/gpu-models/yi-34bx2-moe-60b/2.8bpw
Traceback (most recent call last):
  File "/home/tyra/files/ai/exllamav2/tests/test_mmlu.py", line 141, in <module>
    model, cache, tokenizer = get_model(model_base, variant, gpu_split, 1)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tyra/files/ai/exllamav2/tests/test_mmlu.py", line 61, in get_model
    model_.load(gpu_split_)
  File "/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/exllamav2/model.py", line 244, in load
    for item in f: return item
  File "/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/exllamav2/model.py", line 263, in load_gen
    module.load()
  File "/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/exllamav2/moe_mlp.py", line 56, in load
    self.post_attention_layernorm.load()
  File "/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/exllamav2/rmsnorm.py", line 23, in load
    w = self.load_weight()
        ^^^^^^^^^^^^^^^^^^
  File "/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/exllamav2/module.py", line 99, in load_weight
    tensor = self.load_multi(["weight"])["weight"]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tyra/files/ai/tabby/env/lib/python3.11/site-packages/exllamav2/module.py", line 75, in load_multi
    tensors[k] = st.get_tensor(self.key + "." + k).to(self.device())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Hello,

Here is the error I am facing (full output pasted above):

Here are the config variables in the script:

This setup works perfectly fine with other non-MoE models. Also, MoE models work with the test_inference.py script for perplexity evaluation.
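For context on the error itself: "CUDA error: invalid device ordinal" generally means the code tried to place a tensor on a GPU index that does not exist on the machine, which with exllamav2 often comes down to a gpu_split that names more GPUs than are actually visible (for example when CUDA_VISIBLE_DEVICES hides one). The helper below is a hypothetical diagnostic sketch, not part of exllamav2; it assumes gpu_split is a comma-separated string of per-GPU allocations in GB.

```python
# Hypothetical sanity check for a gpu_split string (assumed format: "a,b,...",
# one allocation in GB per visible GPU). Not part of exllamav2's API.

def check_gpu_split(gpu_split: str, device_count: int) -> list[float]:
    """Parse a comma-separated gpu_split and verify it fits device_count.

    Raises ValueError if the split names more GPUs than are visible,
    which is the typical setup that ends in 'invalid device ordinal'.
    """
    allocations = [float(x) for x in gpu_split.split(",")]
    if len(allocations) > device_count:
        raise ValueError(
            f"gpu_split names {len(allocations)} GPUs but only "
            f"{device_count} are visible"
        )
    return allocations

if __name__ == "__main__":
    # With 2 visible GPUs this split parses cleanly:
    print(check_gpu_split("17,24", device_count=2))  # → [17.0, 24.0]
```

On a live system the real device count can be queried with `torch.cuda.device_count()` and compared against the split before loading; a mismatch would point at the environment (e.g. CUDA_VISIBLE_DEVICES) rather than at the MoE code path.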