Closed: yananchen1989 closed this issue 2 weeks ago
Hi @yananchen1989
The KeyError you're encountering suggests a mismatch between the model layers expected by the script and the layers actually loaded from the checkpoint. This can happen for several reasons, such as differences in model architecture or issues with the quantization process.
Here are a few steps you can take to troubleshoot and resolve this issue:
1. Verify Model Architecture: ensure that the model architecture expected by the script matches the architecture of the Phi-3.5-mini-instruct checkpoint. Check the model's documentation or source code for any discrepancies.
2. Check Quantization Settings: since you're using bitsandbytes quantization, make sure the quantization settings are correctly applied and compatible with the model. You might need to adjust the quantization parameters or try a different quantization method.
3. Update Dependencies: ensure that all your dependencies, including PyTorch, transformers, bitsandbytes, and any other libraries, are up to date. Compatibility issues can arise from outdated packages.
4. Load Model Weights Manually: if the automatic loading process is causing issues, you can inspect the checkpoint yourself and confirm which layer names and weights it actually contains (a quick way to do this is sketched after this list).
5. Consult GitHub Issues: check whether others have encountered similar issues and whether any solutions or workarounds are suggested in the GitHub repository or related forums: https://github.com/vllm-project/vllm/issues
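For points 1 and 4, a lightweight way to see which parameter names the checkpoint actually contains is to read its safetensors index instead of downloading the weights. A minimal sketch, assuming the repo ships a sharded-safetensors index file (model.safetensors.index.json); if it does not, the same names can be listed from the weight file itself with safetensors:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only the safetensors index (a small JSON file, no weight shards)
# and list the parameter names stored in the checkpoint.
index_path = hf_hub_download(
    repo_id="microsoft/Phi-3.5-mini-instruct",
    filename="model.safetensors.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Checkpoint keys usually carry a "model." prefix, while the traceback
# shows the name without it, so check both spellings.
for key in ("layers.21.mlp.gate_up_proj.weight",
            "model.layers.21.mlp.gate_up_proj.weight"):
    print(key, "->", "present" if key in weight_map else "missing")

# Dump every layer-21 MLP entry to compare naming conventions.
for name in sorted(weight_map):
    if ".layers.21.mlp." in name:
        print(name)
```

If the gate_up_proj weight is present on disk under its usual name, the KeyError likely points at the loading/quantization path rather than the checkpoint itself.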
thanks. I guess it is the quantization="bitsandbytes", load_format="bitsandbytes" settings that cause the error.
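One quick way to confirm that guess is to load the same model through vLLM's default (non-quantized) path; if that works while the bitsandbytes configuration fails, the quantized load path is the culprit. A minimal sketch (model id, context length, and prompt are illustrative):

```python
from vllm import LLM, SamplingParams

# Baseline: same model, default (non-quantized) load path.
llm = LLM(model="microsoft/Phi-3.5-mini-instruct", max_model_len=4096)

out = llm.generate(["Hello"], SamplingParams(temperature=0.0, max_tokens=16))
print(out[0].outputs[0].text)
```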
hi,
vllm 0.6.3.post1
here is the testing script:
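A minimal sketch of such a test, assuming vLLM's offline LLM entrypoint with the quantization="bitsandbytes", load_format="bitsandbytes" settings mentioned above; the model id, context length, and prompt are illustrative rather than the exact original script:

```python
from vllm import LLM, SamplingParams

llm_name = "microsoft/Phi-3.5-mini-instruct"  # Phi-3-mini-128k-instruct fails the same way

# bitsandbytes in-flight quantization settings from the report;
# this is the configuration that triggers the KeyError during weight loading.
llm = LLM(
    model=llm_name,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    max_model_len=4096,  # illustrative, keeps the KV cache small
)

sampling = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Write a short greeting."], sampling)
print(outputs[0].outputs[0].text)
```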
when llm_name is microsoft/Phi-3.5-mini-instruct or microsoft/Phi-3-mini-128k-instruct, or other models in the same series, inference fails with:
[rank0]: KeyError: 'layers.21.mlp.gate_up_proj.weight'