vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Non-deterministic image processor file download error for Phi-3 Vision #7848

Open · gyin94 opened this issue 2 months ago

gyin94 commented 2 months ago

Your current environment

vllm == 0.5.5

🐛 Describe the bug

When we deploy microsoft/Phi-3.5-vision-instruct, it randomly hits this error:

(VllmWorkerProcess pid=2195) ERROR 08-25 08:03:14 multiproc_worker_utils.py:226] FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/c68f85286eac3fb376a17068e820e738a89c194a/processing_phi3_v.py'

The problem might be caused by this line: https://github.com/vllm-project/vllm/blob/80162c44b1d1e59a2c10f65b6adb9b0407439b1f/vllm/multimodal/image.py#L16. In a multi-GPU environment, the worker processes can reach it before the head process has finished downloading the file. Would it be better to put it where AutoTokenizer is run?
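(Editorial note: a rough sketch, not vLLM's actual code, of the usual "rank 0 downloads, everyone else waits" pattern that avoids this kind of race. It assumes a torch.distributed process group is already initialized and that all ranks share the same Hugging Face cache directory.)

import torch.distributed as dist
from transformers import AutoProcessor

def load_processor_once(model_dir: str):
    # Only one process (rank 0) populates the shared Hugging Face cache,
    # including the trust_remote_code files such as processing_phi3_v.py.
    if dist.get_rank() == 0:
        AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
    # All other ranks wait until the cache is fully written.
    dist.barrier()
    # Now every rank can load purely from the local cache.
    return AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)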


gyin94 commented 2 months ago

cc @youkaichao

youkaichao commented 2 months ago

cc @ywang96 @DarkLight1337

Can we download this code when we download the model?

In https://docs.vllm.ai/en/latest/getting_started/debugging.html, I recommend that users use huggingface-cli to download models first. If possible, we can also recommend that users download these scripts too (see the sketch below).

Downloading and loading code at runtime is quite complicated, and can easily break distributed inference on multi-GPU or multi-node setups.
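(Editorial note: as a concrete sketch of that recommendation, pre-fetching the full repo snapshot in Python has the same effect as the huggingface-cli download; it also pulls in the remote-code files such as processing_phi3_v.py.)

from huggingface_hub import snapshot_download

# Populate the local Hugging Face cache up front: weights, configs, and the
# trust_remote_code scripts (e.g. processing_phi3_v.py) all come down together,
# so nothing has to be fetched at vLLM runtime.
snapshot_download(repo_id="microsoft/Phi-3.5-vision-instruct")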

DarkLight1337 commented 2 months ago

Would it be ok if we download this at the model loading stage? Currently, it's done at the profiling stage after the model runners are initialized, which may be causing the problem.

youkaichao commented 2 months ago

@gyin94 Does it occur only in multi-GPU inference, or also in multi-node inference?

gyin94 commented 2 months ago

It happened for single-node multi-GPU as well. I used what you suggested and preloaded the processor before running vllm serve, and the error is gone. Still, it would be better to solve it the way tokenizer and model loading are handled in vllm serve.

youkaichao commented 2 months ago

"preload the processor before running the vllm serve"

Can you give more details on how to do this? We can add it to the docs.

gyin94 commented 2 months ago

My solution requires running a separate Python script before starting vllm serve:

from transformers import AutoConfig, AutoProcessor

model_dir = "microsoft/Phi-3.5-vision-instruct"
config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)

if config.model_type == "phi3_v":
    # A temporary fix for Phi-3-vision: load the processor once so that
    # processing_phi3_v.py is downloaded and cached before vLLM starts.
    AutoProcessor.from_pretrained(model_dir, trust_remote_code=True, num_crops=4)
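(Editorial note: running this script once before vllm serve populates the Hugging Face cache, including processing_phi3_v.py, so the worker processes only read from local disk and never race on the download.)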