predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

AssertionError when using model "google/gemma-2b" with multi-gpus #500

Open tritct opened 3 weeks ago

tritct commented 3 weeks ago

System Info

(Screenshot of system info, dated 2024-06-06)

Information

Tasks

Reproduction

I'm trying to run the Docker container on 2 A16 GPUs with model_id "google/gemma-2b", but after the model download step I run into an AssertionError, shown in the screenshots below.
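For reference, a minimal sketch of the multi-GPU launch I'm describing (exact image tag, volume path, and port are placeholders; I'm assuming the launcher accepts TGI-style `--sharded`/`--num-shard` flags, as in the LoRAX docs):

```shell
# Launch LoRAX across 2 GPUs with tensor-parallel sharding.
# Single-GPU runs (omit --sharded/--num-shard) initialize fine;
# the AssertionError only appears with the sharded configuration below.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/predibase/lorax:main \
  --model-id google/gemma-2b \
  --sharded true \
  --num-shard 2
```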

(Screenshots of the AssertionError traceback)

Expected behavior

When I run with only 1 GPU, the server initializes just fine. This issue only happens when I try to use multiple GPUs.