meta-llama / llama

Inference code for Llama models

Failed to run example_chat_completion.py because AssertionError on assert bsz <= params.max_batch_size #1043

Open snowymo opened 6 months ago

snowymo commented 6 months ago


Describe the bug

I first worked around the NCCL issue described in https://github.com/facebookresearch/llama/issues/699 by adding the suggested setup code at the beginning of generation.py. After that, running the chat completion example fails with an AssertionError in generate().

Minimal reproducible example

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir "llama-2-7b-chat/" --tokenizer_path "tokenizer.model" --max_seq_len 128 --max_batch_size 4

Output

bsz 6 params.max_batch_size 4

\llama\generation.py", line 172, in generate
    assert bsz <= params.max_batch_size, (bsz, params.max_batch_size)
AssertionError: (6, 4)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25976) of binary:
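The (6, 4) in the assertion is (bsz, params.max_batch_size): assuming the stock example_chat_completion.py, its hard-coded dialogs list contains six conversations, and all of them are passed to chat_completion() as a single batch, so the model must be built with max_batch_size of at least 6. The repository README's reference command for this example uses a larger budget, e.g.:

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir "llama-2-7b-chat/" --tokenizer_path "tokenizer.model" --max_seq_len 512 --max_batch_size 6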


Dongjae0324 commented 2 months ago

I am facing the same error. Have you solved the issue?
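If raising --max_batch_size is too costly in memory (it sizes the KV cache that is reserved when the model is built), another workaround is to split the dialogs into chunks that respect the limit and call chat_completion() once per chunk. A minimal sketch in Python; chat_in_batches is a hypothetical helper, not part of this repo:

    def chat_in_batches(generator, dialogs, max_batch_size, **kwargs):
        # Call Llama.chat_completion once per chunk of at most
        # max_batch_size dialogs, concatenating results in order.
        results = []
        for i in range(0, len(dialogs), max_batch_size):
            chunk = dialogs[i:i + max_batch_size]
            results.extend(generator.chat_completion(chunk, **kwargs))
        return results

    # Usage inside example_chat_completion.py, replacing the single
    # generator.chat_completion(dialogs, ...) call; 4 matches the
    # --max_batch_size passed to torchrun above.
    results = chat_in_batches(generator, dialogs, max_batch_size=4,
                              temperature=0.6, top_p=0.9)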