meta-llama / llama

Inference code for Llama models

"RuntimeError: CUDA error: unknown error" troubleshooting #516

Open deltaboukensha opened 1 year ago

deltaboukensha commented 1 year ago

Hi. I'm trying to figure out how to troubleshoot this generic error message I get when running the example locally on my machine.

I suspect that either the PyTorch or CUDA version is wrong, or that my hardware is insufficient.

How do I determine what the issue is exactly?

I'm running the project from Docker with GPU and virtualization enabled. Docker images I've tried:

  - docker pull pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
  - docker pull pytorch/pytorch:1.13.1-cuda11.6-cudnn8-devel

System:

  - 64 GB RAM
  - OS: Windows 11
  - NVIDIA GeForce RTX 3070, GPU memory 8 GB / 32 GB
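
Would a minimal probe like this, run inside the same container before torchrun, be the right way to narrow it down? (A sketch, not part of the repo; all calls are standard PyTorch.)

    # Quick sanity check of the CUDA setup inside the container.
    import torch

    print("torch:", torch.__version__)
    print("built for CUDA:", torch.version.cuda)          # CUDA version this torch build targets
    print("cuda available:", torch.cuda.is_available())   # False usually points at a driver/passthrough problem
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        print("capability:", torch.cuda.get_device_capability(0))
        # A tiny allocation reproduces the failure without loading the full model.
        x = torch.zeros(1, device="cuda")
        print("test allocation on:", x.device)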

root@e5d28c58c2f4:/llama# torchrun --nproc_per_node 1 example_chat_completion.py \
>     --ckpt_dir llama-2-7b-chat/ \
>     --tokenizer_path tokenizer.model \
>     --max_seq_len 512 --max_batch_size 4
initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
Traceback (most recent call last):
  File "/llama/example_chat_completion.py", line 73, in <module>
    fire.Fire(main)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/llama/example_chat_completion.py", line 20, in main
    generator = Llama.build(
  File "/llama/llama/generation.py", line 96, in build
    model = Transformer(model_args)
  File "/llama/llama/model.py", line 259, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/llama/llama/model.py", line 221, in __init__     
    self.attention = Attention(args)
  File "/llama/llama/model.py", line 128, in __init__     
    self.cache_k = torch.zeros(
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 15) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.1', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-23_12:47:16
  host      : e5d28c58c2f4
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 15)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
root@e5d28c58c2f4:/llama#
root@e5d28c58c2f4:/llama# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
root@e5d28c58c2f4:/llama# pip show torch
Name: torch
Version: 1.13.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /opt/conda/lib/python3.10/site-packages
Requires: typing_extensions
Required-by: fairscale, llama, torchelastic, torchtext, torchvision
root@e5d28c58c2f4:/llama# 
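
For context on the "my hardware is insufficient" suspicion, here is a back-of-the-envelope estimate of what the example tries to allocate (a rough sketch; the exact shapes are in llama/model.py, and fp16 weights are assumed, which is what the example loader sets up by default):

    # Rough VRAM estimate for llama-2-7b with the flags above (approximate
    # 7B config: 32 layers, 32 attention heads, head_dim 128, fp16 everywhere).
    weight_bytes = 7e9 * 2                                  # ~7B parameters * 2 bytes (fp16)

    n_layers, n_heads, head_dim = 32, 32, 128
    max_batch_size, max_seq_len = 4, 512
    # cache_k and cache_v allocated in Attention.__init__ (model.py:128), per layer:
    # (max_batch_size, max_seq_len, n_heads, head_dim) in fp16
    kv_bytes = 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim * 2

    print(f"weights : {weight_bytes / 2**30:.1f} GiB")      # ~13.0 GiB
    print(f"kv cache: {kv_bytes / 2**30:.1f} GiB")          # ~1.0 GiB

So even with a working driver, roughly 13 GiB of weights plus about 1 GiB of KV cache would not be expected to fit in the RTX 3070's 8 GB of VRAM at these settings.
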
pzim-devdata commented 1 year ago

I solved it with a CPU-only installation, using https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete install process:

  1. Download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder.
  2. Download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder.
  3. Run the download.sh script in a terminal, pasting the URL you were provided when prompted, to start the download.
  4. Go to the llama-main folder.
  5. Create a Python 3 env: python3 -m venv env, and activate it: source env/bin/activate
  6. Install the CPU version of PyTorch: python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # for the CPU version (a quick check follows after this list)
  7. Install llama's dependencies: python3 -m pip install -e .
  8. If you have downloaded llama-2-7b, run:
    torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1  # (instead of 4)
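
A quick way to confirm that step 6 actually installed the CPU-only build before running torchrun (a small check; CPU wheels report no CUDA version):

    # Verify the CPU-only PyTorch wheel is active in the venv.
    import torch

    print(torch.__version__)          # usually ends in "+cpu" for wheels from the cpu index
    print(torch.version.cuda)         # None for CPU-only builds
    print(torch.cuda.is_available())  # False is expected and fine here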