meta-llama / llama

Inference code for Llama models

Failed to run example_chat_completion.py because AssertionError on assert bsz <= params.max_batch_size #1043

Open snowymo opened 6 months ago

snowymo commented 6 months ago


Describe the bug

I first worked around the NCCL issue described in https://github.com/facebookresearch/llama/issues/699 by adding the suggested setup code at the beginning of generation.py. After that, running the chat completion example fails with an AssertionError in generate().

Minimal reproducible example

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir "llama-2-7b-chat/" --tokenizer_path "tokenizer.model" --max_seq_len 128 --max_batch_size 4

Output

bsz 6 params.max_batch_size 4

\llama\generation.py", line 172, in generate
    assert bsz <= params.max_batch_size, (bsz, params.max_batch_size)
AssertionError: (6, 4)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25976) of binary:
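The (6, 4) in the assertion is (bsz, params.max_batch_size): assuming the stock example_chat_completion.py, its hard-coded dialogs list contains six conversations, and all of them are passed to chat_completion() as a single batch, so the model must be built with max_batch_size of at least 6. The repository README's reference command for this example uses a larger budget, e.g.:

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir "llama-2-7b-chat/" --tokenizer_path "tokenizer.model" --max_seq_len 512 --max_batch_size 6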


Dongjae0324 commented 2 months ago

I am facing the same error. Have you solved the issue?
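If raising --max_batch_size is too costly in memory (it sizes the KV cache that is reserved when the model is built), another workaround is to split the dialogs into chunks that respect the limit and call chat_completion() once per chunk. A minimal sketch in Python; chat_in_batches is a hypothetical helper, not part of this repo:

    def chat_in_batches(generator, dialogs, max_batch_size, **kwargs):
        # Call Llama.chat_completion once per chunk of at most
        # max_batch_size dialogs, concatenating results in order.
        results = []
        for i in range(0, len(dialogs), max_batch_size):
            chunk = dialogs[i:i + max_batch_size]
            results.extend(generator.chat_completion(chunk, **kwargs))
        return results

    # Usage inside example_chat_completion.py, replacing the single
    # generator.chat_completion(dialogs, ...) call; 4 matches the
    # --max_batch_size passed to torchrun above.
    results = chat_in_batches(generator, dialogs, max_batch_size=4,
                              temperature=0.6, top_p=0.9)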