lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Multiple gpus error vicuna-7b-v1.3: RuntimeError: probability tensor contains either inf, nan or element < 0 #2577

Open erickFBG opened 12 months ago

erickFBG commented 12 months ago

The command below returns an error when running on multiple GPUs.

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --num-gpus 3 --max-gpu-memory 6GiB

USER: what can you do?
ASSISTANT: пеamaire favor Arab Dra benep) Rev1 Storage78ITE마zigate point fail4check therefore ':-erserserszeroispecies Theirulleselfoproreu'> definitionSsm

Traceback (most recent call last):
  File "/FastChat/fastchat/serve/inference.py", line 174, in generate_stream
    indices = torch.multinomial(probs, num_samples=2)
RuntimeError: probability tensor contains either inf, nan or element < 0
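For context on the error itself: `torch.multinomial` rejects any weight vector containing `inf`, `nan`, or a negative entry, so the message means the model's output logits became numerically invalid (often a sign of corrupted weights or broken multi-GPU tensor splitting). The pure-Python sketch below (illustrative only, not torch's real implementation) mirrors that validation step:

```python
import math
import random

def multinomial_sample(probs, num_samples=1):
    """Sample indices from a weight vector, mirroring the validity check
    torch.multinomial performs before sampling (illustrative sketch)."""
    # Reject inf, nan, or negative weights, as torch does.
    if any(math.isinf(p) or math.isnan(p) or p < 0 for p in probs):
        raise RuntimeError(
            "probability tensor contains either inf, nan or element < 0")
    # Weighted sampling over the valid distribution.
    return [random.choices(range(len(probs)), weights=probs, k=1)[0]
            for _ in range(num_samples)]

# A valid distribution samples normally; a nan weight raises the same error.
print(multinomial_sample([0.2, 0.8], num_samples=2))
```

If the logits coming out of the model already contain `nan`, the softmax output will too, which is why the failure shows up at the sampling step rather than earlier.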

surak commented 11 months ago

Can you check whether the files are somehow damaged? What GPUs are these?

erickFBG commented 11 months ago

The files are OK. I have 3 NVIDIA A2000 GPUs with 6 GiB each. I have already tested another model (fastchat-t5-3b-v1.0) on a single GPU and it works, but when I try to use all 3 GPUs the same output is returned.

surak commented 11 months ago

I see. And can you run three instances of the smaller model, one on each GPU, just to make sure they all work?
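For reference, one way to pin each CLI instance to a single GPU is to set `CUDA_VISIBLE_DEVICES` per process. The sketch below builds the environment and command line for one such instance (the model path and the use of `subprocess` are illustrative; `--load-8bit` is FastChat's flag for 8-bit loading):

```python
import os
import shlex

def build_single_gpu_launch(gpu_id, model_path="lmsys/fastchat-t5-3b-v1.0"):
    """Return (env, argv) for one FastChat CLI instance pinned to one GPU.

    Illustrative sketch: pass both to subprocess.Popen(argv, env=env)
    to actually launch the process.
    """
    # CUDA_VISIBLE_DEVICES restricts the process to a single physical GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    cmd = (f"python3 -m fastchat.serve.cli "
           f"--model-path {model_path} --load-8bit")
    return env, shlex.split(cmd)
```

Launching three of these (gpu_id 0, 1, 2) gives one independent instance per GPU, which isolates whether each card can run the model on its own.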

erickFBG commented 11 months ago

I tested a small model here and it works: I loaded fastchat-t5-3b-v1.0 with 8-bit quantization on each GPU.

erickFBG commented 11 months ago

I ran some new tests and noticed something interesting: when I run with just two GPUs the model doesn't break, but when I run with three it still breaks. (screenshot attached)