JerryLu991223 opened 1 month ago
Which HF repo did you originally get the model from?
microsoft/Phi-3-vision-128k-instruct, https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main. I also run into a similar situation with llava-v1.6, so I suppose the error is not specific to this model.
Can you try it on a smaller model like LLaVA-1.5 and see if you get the same problem?
But in the LLaVA-v1.5 collection, the smallest model is 7B, while this one is 4B. Do you mean https://huggingface.co/llava-hf/llava-interleave-qwen-0.5b-hf?
Sorry, I meant lighter weight, not necessarily in terms of the number of parameters. I suggested LLaVA-1.5 because it doesn't crop the image into patches before feeding it to the model, so it should take fewer resources to run.
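For reference, a minimal sketch of how one might try LLaVA-1.5 for this comparison, assuming vLLM's multimodal `generate` API; the model id and prompt template follow vLLM's LLaVA examples, and the image path is a placeholder, not anything from the original report:

```python
# Sketch: swap in LLaVA-1.5, which feeds the whole image to the model
# without cropping it into patches, so it should be lighter to run.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf")

# LLaVA-1.5 prompt template with a single <image> placeholder.
prompt = "USER: <image>\nWhat is shown in this image?\nASSISTANT:"
image = Image.open("example.jpg")  # placeholder image path (assumption)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```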
It seems I hit the same problem. Also, for Phi-3 (4B), loading the model takes about 450 seconds in the distributed setup, while it takes only about 10 seconds on a single node. So I suspect something is going wrong that greatly slows down both loading and inference.
I have updated the distributed broadcasting logic in #6836. See if it helps.
Regarding slow model loading, please follow this guide.
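Not the guide itself, but a minimal sketch of one common remedy for slow loading: pre-downloading the weights to local disk so vLLM reads from a fast filesystem instead of streaming from the Hub on every worker. The repo id matches the one above; everything else is an assumption:

```python
# Sketch: pre-download weights, then point vLLM at the local path.
# Assumes huggingface_hub and vllm are installed; this is not necessarily
# the exact procedure from the linked guide.
from huggingface_hub import snapshot_download
from vllm import LLM

# Download (or reuse a cached copy of) the model on local disk first.
local_path = snapshot_download("microsoft/Phi-3-vision-128k-instruct")

# Loading from a local path avoids each worker re-fetching from the Hub.
llm = LLM(model=local_path, trust_remote_code=True)
```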
Your current environment
🐛 Describe the bug
When I run the script, the logs look normal: they indicate that the model has loaded and inference has started, with GPU utilization at 100%. But after waiting a long time, I eventually get the error below. Inference with plain language models works fine; it is only with vision language models that I hit this error. Any help or suggestions would be appreciated.
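For reference, a minimal sketch of the kind of script involved, assuming vLLM's multimodal `generate` API and the Phi-3-vision chat format. The image path, prompt, and `tensor_parallel_size` are placeholders standing in for the reporter's actual setup, not taken from the original report:

```python
# Hypothetical minimal repro, not the reporter's actual script.
from PIL import Image
from vllm import LLM, SamplingParams

# trust_remote_code is needed for Phi-3-vision; tensor_parallel_size=2 is an
# assumed stand-in for whatever distributed setup triggers the hang.
llm = LLM(
    model="microsoft/Phi-3-vision-128k-instruct",
    trust_remote_code=True,
    max_model_len=4096,
    tensor_parallel_size=2,
)

# Phi-3-vision chat format with a single image placeholder.
prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")  # placeholder image path (assumption)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```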