meta-llama / llama

Inference code for Llama models

Error running `example_chat_completion.py` on `llama-2-7b-chat` #430

Open krsnnik opened 1 year ago

krsnnik commented 1 year ago

Python 3.8 (packages from PyPI), running on an NVIDIA RTX 3900.

torchrun --nproc_per_node 1 example_chat_completion.py     --ckpt_dir llama-2-7b-chat/     --tokenizer_path tokenizer.model     --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 9.42 seconds
Traceback (most recent call last):
  File "example_chat_completion.py", line 73, in <module>
    fire.Fire(main)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example_chat_completion.py", line 56, in main
    results = generator.chat_completion(
  File "/home/kliu/Workspace/llama/llama/generation.py", line 270, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/kliu/Workspace/llama/llama/generation.py", line 146, in generate
    next_token = sample_top_p(probs, top_p)
  File "/home/kliu/Workspace/llama/llama/generation.py", line 301, in sample_top_p
    next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 155743) of binary: /home/kliu/Workspace/llama/env/bin/python3
Traceback (most recent call last):
  File "/home/kliu/Workspace/llama/env/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-19_14:51:37
  host      : eleusis
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 155743)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
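
For context on where this fails: sample_top_p in llama/generation.py performs nucleus (top-p) sampling over the model's output probabilities, and torch.multinomial raises exactly this RuntimeError when the probability tensor contains inf, NaN, or a negative value (for example after a numerical overflow in the forward pass). A minimal sketch of the idea, not the repo's exact code:

import torch

def sample_top_p(probs: torch.Tensor, p: float) -> torch.Tensor:
    # Sort probabilities, keep the smallest set of tokens whose cumulative
    # probability exceeds p, renormalize, then draw one token.
    probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(probs_sort, dim=-1)
    probs_sort[cumulative - probs_sort > p] = 0.0
    probs_sort.div_(probs_sort.sum(dim=-1, keepdim=True))
    next_token = torch.multinomial(probs_sort, num_samples=1)  # raises on inf/nan/<0
    return torch.gather(probs_idx, -1, next_token)

probs = torch.softmax(torch.randn(2, 32000), dim=-1)
print(sample_top_p(probs, 0.9))      # works on well-formed probabilities
probs[0, 0] = float("nan")
# sample_top_p(probs, 0.9)           # would raise the RuntimeError shown above
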
zhpinkman commented 1 year ago

I have the same issue. I tried reducing the batch_size, but it's not helping.

jonsoku-dev commented 1 year ago

I have the same issue.

$ pip install -e .

$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

$ torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
zhpinkman commented 1 year ago

I was able to fix my issue by using a lower max_seq_len. Hope this helps.

ghost commented 1 year ago

@zhpinkman

Thank you! What max_seq_len did you set?

The error also occurs for me with this command:

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 10 --max_batch_size 4
zhpinkman commented 1 year ago

I was using 512, which was throwing the error; with 256 it works fine. Also note that you can limit the number of prompts in the input: the default example includes four prompts, if I recall correctly, and you can reduce that to a single one if you have a smaller GPU. The root of the error is a batch that cannot fit on the GPU, so adjusting these parameters can help avoid it.
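
For reference, a hedged sketch of what trimming the prompts looks like in example_chat_completion.py (names follow the example script; the exact prompt text and sampling values are just illustrative):

# Keep only one short dialog instead of the default four.
dialogs = [
    [{"role": "user", "content": "what is the recipe of mayonnaise?"}],
]
results = generator.chat_completion(
    dialogs,
    max_gen_len=None,     # the library caps generation at max_seq_len - 1
    temperature=0.6,
    top_p=0.9,
)
for dialog, result in zip(dialogs, results):
    print(result["generation"]["content"])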

ghost commented 1 year ago

Thank you, but it doesn't work for me :( There seem to be a lot of related issues, so I'm watching this one.

gucaslyz commented 1 year ago

Same error here, and reducing max_seq_len to 128 does not work.

pzim-devdata commented 1 year ago

I solved it with a CPU-only installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete install process:

  1. Download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder.
  2. Download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder.
  3. Run the download.sh script in a terminal, passing the URL provided when prompted, to start the download.
  4. Go to the llama-main folder.
  5. Create a Python 3 env: python3 -m venv env and activate it: source env/bin/activate
  6. Install the CPU version of PyTorch: python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu # for the CPU version (see the quick check after this list)
  7. Install llama's dependencies: python3 -m pip install -e .
  8. If you downloaded llama-2-7b, run:
    torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1 # (instead of 4)
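
Optional quick check (assuming the env from step 5 is active) that the CPU-only wheel is the one installed, before running step 8:

# Run inside the activated venv.
import torch
print(torch.__version__)           # wheels from the /whl/cpu index typically report "+cpu"
print(torch.cuda.is_available())   # expected: False for the CPU-only install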
krsnnik commented 1 year ago

I tried 128 as well and it did not work. I also tried reducing max_batch_size down to 1; that did not work either, same RuntimeError: probability tensor contains either inf, nan or element < 0.

nisargjoshi10 commented 1 year ago

Running into the same error. Tried changing batch size and max_seq_len but neither worked

sthreepi commented 1 year ago

Increasing max_batch_size to more than 4 works for me; I set it to 6.

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1

maowenyu-11 commented 1 year ago

I've solved this error by setting the “max_batch_size” to a multiple of the number of prompts

XanderDevelops commented 12 months ago

Same error here, nothing seems to work

prathams177 commented 5 months ago

I'm trying to run the Llama 3 8B model and got this issue:

(llama3chatbot) C:\Users\prath\llama3-main>torchrun --nproc_per_node 1 example_chat_completion.py \ --ckpt_dir Meta-Llama-3-8B/ \ --tokenizer_path tokenizer.model \ --max_seq_len 128 --max_batch_size 1
failed to create process.

It shows "failed to create process". What's the issue? Help!