ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Unable to load Qwen2-VL-1.5B-Instruct model using mlx_lm #1110

Closed. dolphingarlic closed this issue 1 day ago

dolphingarlic commented 6 days ago

I tried running the following command to convert Qwen2-VL-1.5B-Instruct to an MLX-compatible model:

python -m mlx_lm.convert --hf-path mit-han-lab/Qwen2-VL-1.5B-Instruct -q

But it fails with a very long error message saying that the received parameters are not in the model:

ValueError: Received parameters not in model: visual.blocks.8.mlp.fc1.bias visual.blocks.30.attn.qkv.weight visual.blocks.2.mlp.fc1.weight visual.blocks.21.attn.qkv.weight visual.blocks.14.norm1.bias visual.blocks.18.norm1.weight visual.blocks.29.attn.proj.bias visual.blocks.15.norm1.weight visual.blocks.31.norm2.weight visual.blocks.8.mlp.fc2.bias visual.blocks.12.mlp.fc1.weight visual.blocks.17.attn.qkv.weight visual.blocks.22.attn.qkv.bias visual.blocks.27.attn.proj.weight visual.blocks.17.mlp.fc2.weight visual.blocks.21.attn.proj.bias visual.blocks.13.attn.proj.weight visual.blocks.2.norm2.weight visual.blocks.5.attn.proj.weight visual.blocks.17.mlp.fc1.bias visual.blocks.8.mlp.fc1.weight visual.blocks.21.mlp.fc1.weight visual.blocks.0.mlp.fc1.bias visual.blocks.25.mlp.fc1.bias visual.blocks.3.norm2.bias visual.blocks.4.attn.proj.weight visual.blocks.9.norm1.weight visual.blocks.9.attn.proj.bias visual.blocks.6.mlp.fc1.weight visual.blocks.3.mlp.fc1.weight visual.blocks.24.mlp.fc2.bias visual.blocks.26.norm1.weight visual.blocks.30.mlp.fc2.weight visual.blocks.24.norm1.weight visual.blocks.21.norm2.bias visual.blocks.14.mlp.fc2.bias visual.blocks.13.attn.qkv.weight visual.blocks.10.mlp.fc2.weight visual.blocks.28.norm1.bias visual.blocks.26.norm2.weight visual.blocks.29.mlp.fc2.bias visual.blocks.10.norm2.weight visual.blocks.3.attn.proj.weight visual.blocks.19.norm2.weight visual.blocks.23.norm2.bias visual.blocks.3.attn.qkv.weight visual.blocks.28.attn.proj.bias visual.blocks.20.norm2.bias visual.blocks.25.attn.qkv.bias visual.blocks.4.attn.proj.bias visual.blocks.21.norm1.bias visual.blocks.27.mlp.fc1.bias visual.blocks.12.norm1.bias visual.blocks.28.attn.qkv.weight visual.blocks.29.attn.qkv.weight visual.blocks.31.norm1.weight visual.blocks.31.mlp.fc1.bias visual.blocks.6.mlp.fc1.bias visual.blocks.4.norm1.bias visual.blocks.19.norm1.bias visual.blocks.25.norm1.bias visual.blocks.1.attn.proj.weight visual.blocks.10.attn.proj.bias visual.blocks.10.norm1.bias 
visual.blocks.28.attn.qkv.bias visual.blocks.23.attn.proj.weight visual.blocks.2.mlp.fc2.weight visual.blocks.3.norm2.weight visual.blocks.24.attn.proj.bias visual.blocks.5.norm2.bias visual.blocks.18.mlp.fc1.weight visual.blocks.14.mlp.fc2.weight visual.blocks.12.norm2.bias visual.blocks.26.attn.qkv.weight visual.blocks.15.attn.qkv.bias visual.blocks.8.norm1.bias visual.blocks.0.norm1.weight visual.blocks.31.attn.qkv.weight visual.blocks.3.mlp.fc2.bias visual.blocks.22.norm2.weight visual.blocks.12.norm1.weight visual.blocks.10.mlp.fc1.bias visual.blocks.22.mlp.fc2.weight visual.blocks.0.attn.qkv.weight visual.blocks.2.attn.proj.weight visual.blocks.4.mlp.fc1.bias visual.blocks.24.attn.qkv.weight visual.blocks.30.norm1.bias visual.blocks.27.mlp.fc2.bias visual.blocks.7.mlp.fc2.bias visual.blocks.0.attn.qkv.bias visual.blocks.18.norm2.bias visual.blocks.18.mlp.fc2.bias visual.blocks.15.mlp.fc1.bias visual.blocks.7.attn.proj.bias visual.blocks.26.attn.qkv.bias visual.blocks.11.mlp.fc1.bias visual.blocks.25.attn.proj.bias visual.blocks.18.attn.qkv.bias visual.blocks.15.attn.qkv.weight visual.blocks.19.mlp.fc1.bias visual.blocks.30.attn.qkv.bias visual.blocks.18.mlp.fc2.weight visual.blocks.29.norm1.weight visual.blocks.18.norm1.bias visual.blocks.26.mlp.fc2.weight visual.blocks.6.attn.proj.bias visual.blocks.12.mlp.fc2.weight visual.blocks.23.mlp.fc2.weight visual.blocks.6.norm2.weight visual.blocks.2.norm1.bias visual.blocks.7.mlp.fc1.bias visual.blocks.9.attn.qkv.weight visual.blocks.26.norm1.bias visual.blocks.14.mlp.fc1.bias visual.blocks.13.attn.qkv.bias visual.blocks.31.norm1.bias visual.blocks.20.mlp.fc1.weight visual.blocks.1.norm2.weight visual.blocks.17.norm1.bias visual.blocks.17.mlp.fc1.weight visual.blocks.24.norm2.bias visual.blocks.28.mlp.fc2.bias visual.blocks.5.mlp.fc1.weight visual.blocks.6.attn.proj.weight visual.blocks.3.mlp.fc1.bias visual.blocks.1.mlp.fc1.weight visual.blocks.16.norm2.weight visual.blocks.1.attn.proj.bias 
visual.blocks.12.attn.proj.weight visual.blocks.19.mlp.fc1.weight visual.blocks.1.norm1.bias visual.blocks.3.attn.qkv.bias visual.blocks.26.norm2.bias visual.blocks.13.norm2.bias visual.blocks.29.mlp.fc1.bias visual.blocks.15.norm2.weight visual.blocks.28.attn.proj.weight visual.blocks.19.attn.proj.weight visual.blocks.16.attn.qkv.weight visual.blocks.15.norm1.bias visual.blocks.6.norm2.bias visual.blocks.30.norm2.weight visual.blocks.9.mlp.fc1.bias visual.blocks.22.mlp.fc1.weight visual.blocks.30.norm2.bias visual.blocks.14.norm2.bias visual.blocks.7.mlp.fc1.weight visual.blocks.20.attn.qkv.weight visual.blocks.10.mlp.fc1.weight visual.blocks.30.attn.proj.weight visual.blocks.12.attn.qkv.bias visual.blocks.30.attn.proj.bias visual.blocks.25.norm2.weight visual.blocks.0.norm1.bias visual.blocks.28.mlp.fc2.weight visual.blocks.5.norm2.weight visual.blocks.0.norm2.bias visual.blocks.23.attn.proj.bias visual.merger.mlp.0.weight visual.blocks.10.mlp.fc2.bias visual.blocks.17.attn.proj.weight visual.blocks.2.attn.qkv.weight visual.blocks.0.mlp.fc2.bias visual.blocks.9.norm1.bias visual.blocks.6.mlp.fc2.bias visual.blocks.1.mlp.fc2.weight visual.blocks.11.norm2.weight visual.blocks.19.norm2.bias visual.blocks.29.norm2.weight visual.blocks.23.norm2.weight visual.blocks.5.mlp.fc1.bias visual.blocks.6.norm1.weight visual.blocks.1.mlp.fc2.bias visual.blocks.30.mlp.fc2.bias visual.blocks.1.mlp.fc1.bias visual.blocks.19.attn.proj.bias visual.blocks.21.norm1.weight visual.blocks.26.mlp.fc2.bias visual.blocks.13.mlp.fc1.weight visual.blocks.28.mlp.fc1.weight visual.blocks.1.attn.qkv.weight visual.blocks.10.norm1.weight visual.blocks.4.mlp.fc1.weight visual.blocks.29.attn.proj.weight visual.blocks.16.norm2.bias visual.blocks.21.mlp.fc2.weight visual.blocks.23.mlp.fc1.bias visual.blocks.14.norm2.weight visual.blocks.5.norm1.bias visual.blocks.23.norm1.bias visual.blocks.9.norm2.bias visual.blocks.24.attn.proj.weight visual.blocks.23.mlp.fc2.bias visual.blocks.2.mlp.fc1.bias 
visual.blocks.16.attn.proj.weight visual.blocks.27.attn.proj.bias visual.blocks.8.norm2.weight visual.blocks.4.mlp.fc2.bias visual.blocks.28.norm2.weight visual.blocks.6.mlp.fc2.weight visual.blocks.21.attn.qkv.bias visual.blocks.28.norm1.weight visual.blocks.15.mlp.fc1.weight visual.blocks.17.norm1.weight visual.blocks.11.mlp.fc2.bias visual.blocks.1.norm2.bias visual.blocks.27.attn.qkv.bias visual.blocks.25.norm2.bias visual.blocks.30.mlp.fc1.bias visual.blocks.10.norm2.bias visual.blocks.16.mlp.fc1.weight visual.blocks.7.attn.qkv.bias visual.blocks.13.norm1.weight visual.blocks.6.norm1.bias visual.blocks.16.attn.proj.bias visual.blocks.27.mlp.fc1.weight visual.blocks.4.norm2.bias visual.blocks.19.attn.qkv.bias visual.blocks.26.attn.proj.weight visual.blocks.25.attn.qkv.weight visual.blocks.11.norm1.bias visual.blocks.31.attn.proj.bias visual.blocks.7.attn.proj.weight visual.blocks.8.attn.proj.weight visual.blocks.5.mlp.fc2.weight visual.blocks.29.mlp.fc2.weight visual.blocks.3.norm1.weight visual.blocks.20.attn.proj.weight visual.blocks.22.norm1.bias visual.blocks.6.attn.qkv.weight visual.blocks.18.mlp.fc1.bias visual.blocks.22.norm1.weight visual.blocks.22.attn.qkv.weight visual.blocks.27.mlp.fc2.weight visual.blocks.11.attn.proj.weight visual.blocks.27.attn.qkv.weight visual.blocks.24.norm2.weight visual.blocks.2.attn.qkv.bias visual.blocks.30.mlp.fc1.weight visual.blocks.3.norm1.bias visual.blocks.28.norm2.bias visual.blocks.21.attn.proj.weight visual.blocks.6.attn.qkv.bias visual.blocks.14.attn.qkv.weight visual.blocks.7.mlp.fc2.weight visual.blocks.5.norm1.weight visual.blocks.0.norm2.weight visual.blocks.13.norm2.weight visual.blocks.10.attn.proj.weight visual.blocks.25.norm1.weight visual.blocks.16.mlp.fc2.bias visual.blocks.31.mlp.fc2.bias visual.blocks.20.norm2.weight visual.blocks.2.mlp.fc2.bias visual.blocks.17.norm2.weight visual.blocks.29.mlp.fc1.weight visual.blocks.5.mlp.fc2.bias visual.blocks.20.attn.qkv.bias visual.blocks.25.mlp.fc1.weight 
visual.blocks.17.norm2.bias visual.blocks.20.attn.proj.bias visual.blocks.11.mlp.fc1.weight visual.blocks.10.attn.qkv.weight visual.blocks.8.attn.qkv.bias visual.blocks.20.mlp.fc2.weight visual.blocks.24.attn.qkv.bias visual.blocks.26.mlp.fc1.bias visual.blocks.10.attn.qkv.bias visual.blocks.2.norm1.weight visual.blocks.11.norm2.bias visual.blocks.25.mlp.fc2.bias visual.blocks.4.norm2.weight visual.blocks.18.attn.qkv.weight visual.blocks.27.norm1.weight visual.blocks.4.attn.qkv.weight visual.blocks.5.attn.proj.bias visual.merger.ln_q.weight visual.blocks.22.attn.proj.weight visual.blocks.16.mlp.fc2.weight visual.merger.ln_q.bias visual.blocks.31.norm2.bias visual.blocks.27.norm2.weight visual.blocks.24.norm1.bias visual.blocks.15.attn.proj.weight visual.blocks.20.mlp.fc1.bias visual.blocks.14.norm1.weight visual.blocks.7.norm1.weight visual.blocks.20.norm1.bias visual.blocks.21.mlp.fc2.bias visual.blocks.13.mlp.fc1.bias visual.merger.mlp.0.bias visual.blocks.11.mlp.fc2.weight visual.blocks.14.attn.proj.weight visual.blocks.3.mlp.fc2.weight visual.blocks.13.norm1.bias visual.blocks.12.attn.qkv.weight visual.blocks.8.norm2.bias visual.blocks.24.mlp.fc1.bias visual.blocks.22.mlp.fc1.bias visual.blocks.24.mlp.fc1.weight visual.blocks.15.norm2.bias visual.blocks.16.mlp.fc1.bias visual.blocks.12.mlp.fc2.bias visual.blocks.3.attn.proj.bias visual.blocks.9.attn.qkv.bias visual.blocks.9.mlp.fc2.bias visual.blocks.23.attn.qkv.weight visual.blocks.1.attn.qkv.bias visual.blocks.11.attn.qkv.weight visual.blocks.16.norm1.bias visual.blocks.5.attn.qkv.weight visual.blocks.9.attn.proj.weight visual.blocks.11.attn.qkv.bias visual.blocks.30.norm1.weight visual.blocks.2.norm2.bias visual.blocks.8.attn.proj.bias visual.blocks.13.mlp.fc2.bias visual.blocks.17.attn.qkv.bias visual.blocks.29.norm1.bias visual.blocks.27.norm1.bias visual.blocks.15.attn.proj.bias visual.blocks.16.attn.qkv.bias visual.blocks.13.mlp.fc2.weight visual.blocks.31.attn.proj.weight visual.blocks.1.norm1.weight 
visual.blocks.24.mlp.fc2.weight visual.merger.mlp.2.weight visual.blocks.5.attn.qkv.bias visual.blocks.26.attn.proj.bias visual.blocks.12.mlp.fc1.bias visual.blocks.8.norm1.weight visual.blocks.20.norm1.weight visual.blocks.18.attn.proj.weight visual.blocks.25.mlp.fc2.weight visual.blocks.11.norm1.weight visual.blocks.14.attn.qkv.bias visual.blocks.19.norm1.weight visual.blocks.19.mlp.fc2.bias visual.blocks.29.attn.qkv.bias visual.patch_embed.proj.weight visual.blocks.14.attn.proj.bias visual.blocks.9.mlp.fc1.weight visual.blocks.26.mlp.fc1.weight visual.blocks.2.attn.proj.bias visual.merger.mlp.2.bias visual.blocks.31.attn.qkv.bias visual.blocks.0.mlp.fc1.weight visual.blocks.9.norm2.weight visual.blocks.15.mlp.fc2.weight visual.blocks.22.norm2.bias visual.blocks.8.attn.qkv.weight visual.blocks.12.attn.proj.bias visual.blocks.0.attn.proj.bias visual.blocks.19.mlp.fc2.weight visual.blocks.15.mlp.fc2.bias visual.blocks.23.norm1.weight visual.blocks.28.mlp.fc1.bias visual.blocks.4.attn.qkv.bias visual.blocks.0.mlp.fc2.weight visual.blocks.31.mlp.fc1.weight visual.blocks.7.attn.qkv.weight visual.blocks.12.norm2.weight visual.blocks.21.mlp.fc1.bias visual.blocks.9.mlp.fc2.weight visual.blocks.4.mlp.fc2.weight visual.blocks.7.norm2.bias visual.blocks.25.attn.proj.weight visual.blocks.23.attn.qkv.bias visual.blocks.23.mlp.fc1.weight visual.blocks.27.norm2.bias visual.blocks.14.mlp.fc1.weight visual.blocks.18.attn.proj.bias visual.blocks.17.attn.proj.bias visual.blocks.22.mlp.fc2.bias visual.blocks.4.norm1.weight visual.blocks.29.norm2.bias visual.blocks.16.norm1.weight visual.blocks.17.mlp.fc2.bias visual.blocks.20.mlp.fc2.bias visual.blocks.7.norm1.bias visual.blocks.18.norm2.weight visual.blocks.31.mlp.fc2.weight visual.blocks.19.attn.qkv.weight visual.blocks.11.attn.proj.bias visual.blocks.0.attn.proj.weight visual.blocks.8.mlp.fc2.weight visual.blocks.7.norm2.weight visual.blocks.22.attn.proj.bias visual.blocks.13.attn.proj.bias visual.blocks.21.norm2.weight.

How can I fix this error? The model loads just fine with Hugging Face transformers, using the AutoModel.from_pretrained and AutoTokenizer.from_pretrained functions.

angeloskath commented 6 days ago

Qwen2-VL is a vision-language model, which mlx_lm does not currently support. Have you tried mlx-vlm (https://github.com/Blaizzy/mlx-vlm)?
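
The error message itself points the same way: every rejected parameter name starts with visual., so the unmatched weights all belong to the vision tower rather than the language model. Below is a minimal sketch for checking that from a pasted error string. It is plain Python and not part of mlx_lm or mlx-vlm; summarize_unmatched is a hypothetical helper name, and the excerpt is a shortened copy of the error above.

```python
from collections import Counter


def summarize_unmatched(error_message: str, depth: int = 2) -> Counter:
    """Count rejected parameter names by their first `depth` dotted components.

    Hypothetical helper for reading the error text; not an mlx_lm API.
    """
    # Everything after this marker is a whitespace-separated list of names.
    marker = "not in model:"
    names = error_message.split(marker, 1)[-1]

    counts = Counter()
    for name in names.split():
        name = name.rstrip(".")  # the last name in the message ends with a period
        prefix = ".".join(name.split(".")[:depth])
        counts[prefix] += 1
    return counts


# Shortened excerpt of the error reported above.
excerpt = (
    "ValueError: Received parameters not in model: "
    "visual.blocks.8.mlp.fc1.bias visual.blocks.30.attn.qkv.weight "
    "visual.merger.mlp.0.weight visual.patch_embed.proj.weight "
    "visual.blocks.21.norm2.weight."
)

print(summarize_unmatched(excerpt, depth=1))  # Counter({'visual': 5})
```

With depth=2 the same helper splits the count into visual.blocks, visual.merger, and visual.patch_embed, which are the parts of the vision encoder that a text-only loader has nowhere to put.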