microsoft / Phi-3CookBook

This is a Phi-3 book for getting started with Phi-3, a family of open-source AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.

Phi-3 cannot be used in VLLM inference #217

Closed by yananchen1989 2 weeks ago

yananchen1989 commented 3 weeks ago

hi,

vllm 0.6.3.post1

here is the testing script:

from vllm import LLM

llm_name = "microsoft/Phi-3.5-mini-instruct"  # or any of the other Phi-3 checkpoints listed below

llm = LLM(model=llm_name, dtype="float16",
          tensor_parallel_size=4, gpu_memory_utilization=0.96,
          trust_remote_code=True,
          quantization="bitsandbytes", load_format="bitsandbytes",  # in-flight bitsandbytes quantization
          enforce_eager=True,
          enable_lora=False,
          cpu_offload_gb=48)

When llm_name is microsoft/Phi-3.5-mini-instruct, microsoft/Phi-3-mini-128k-instruct, or another model in the same series, inference fails with: [rank0]: KeyError: 'layers.21.mlp.gate_up_proj.weight'

leestott commented 2 weeks ago

Hi @yananchen1989

The KeyError you're encountering suggests a mismatch between the weight names vLLM expects for this model and the names actually loaded from the checkpoint. This can happen for several reasons, such as differences in model architecture or issues with the quantization path.

Here are a few steps you can take to troubleshoot and resolve this issue:

Verify Model Architecture: Ensure that the architecture vLLM resolves for the model matches the architecture declared by the Phi-3.5-mini-instruct checkpoint. Check the model's documentation or source code for any discrepancies.
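For example, a minimal sketch (using the Phi-3.5-mini-instruct name from the report above) that prints what the checkpoint's own config declares, so it can be compared against the Phi-3 implementation registered in vLLM:

from transformers import AutoConfig

# Print the architecture and layer count declared by the checkpoint itself.
config = AutoConfig.from_pretrained("microsoft/Phi-3.5-mini-instruct",
                                    trust_remote_code=True)
print(config.architectures)      # expected to be something like ['Phi3ForCausalLM']
print(config.num_hidden_layers)  # number of decoder layers in the checkpoint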

Check Quantization Settings: Since you're using bitsandbytes quantization, make sure that the quantization settings are correctly applied and compatible with the model. You might need to adjust the quantization parameters or try a different quantization method.
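As a hedged cross-check outside vLLM entirely, you could confirm that the checkpoint itself loads under bitsandbytes 4-bit quantization via transformers; if this works while vLLM's in-flight bitsandbytes loading fails, the problem is more likely in vLLM's weight mapping for Phi-3 than in the checkpoint or in bitsandbytes itself:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sanity-check sketch: load the same checkpoint 4-bit quantized via transformers.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
print(type(model).__name__)  # e.g. Phi3ForCausalLM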

Update Dependencies: Ensure that all your dependencies, including PyTorch, transformers, and any other libraries, are up to date. Sometimes, compatibility issues can arise from outdated packages.
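A small sketch to capture the versions in play, for comparison against vLLM's pinned requirements or for a bug report:

from importlib.metadata import PackageNotFoundError, version

# Print the installed versions of the packages involved in this load path.
for pkg in ("vllm", "torch", "transformers", "bitsandbytes"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")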

Load Model Weights Manually: If the automatic loading process is causing issues, you can try loading the model weights manually. This involves specifying the exact layers and weights to load from the checkpoint.
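Before going that far, a lighter-weight step is to list the weight names the checkpoint actually ships and search for the key in the traceback; a sketch, assuming the repo publishes a sharded model.safetensors.index.json:

import json
from huggingface_hub import hf_hub_download

# List the checkpoint weight names that match the key from the traceback.
index_path = hf_hub_download("microsoft/Phi-3.5-mini-instruct",
                             "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Checkpoint keys typically carry a "model." prefix
# (e.g. "model.layers.21.mlp.gate_up_proj.weight"), whereas the KeyError
# above is raised for the unprefixed "layers.21.mlp.gate_up_proj.weight".
print([k for k in weight_map if "layers.21.mlp.gate_up_proj" in k])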

Consult GitHub Issues: Check if others have encountered similar issues and if there are any solutions or workarounds suggested in the GitHub repository or related forums. https://github.com/vllm-project/vllm/issues

yananchen1989 commented 2 weeks ago

thanks. i guess it is the quantization="bitsandbytes", load_format="bitsandbytes" settings that cause the error.
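One way to confirm that guess is to retry the same call with the two bitsandbytes arguments removed and everything else unchanged; if the plain float16 path loads cleanly, the failure is isolated to vLLM's in-flight bitsandbytes loading for Phi-3 (a diagnostic sketch, not a confirmed fix):

from vllm import LLM

# Same settings as the original script, minus quantization/load_format.
llm = LLM(model="microsoft/Phi-3.5-mini-instruct", dtype="float16",
          tensor_parallel_size=4, gpu_memory_utilization=0.96,
          trust_remote_code=True,
          enforce_eager=True,
          enable_lora=False,
          cpu_offload_gb=48)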