Closed Blaizzy closed 3 weeks ago
This doesn’t look like the correct fix to me. We need to merge the scales if they are present. Could you point to the model that didn’t work so we can debug?
I could be wrong, but from what I can see, both Qwen MoE models don’t use scales.
Precisely, the Qwen2-57B-A14B model failed to convert.
Those are scales for the quantization, so indeed they won't be in the fp16 models.
I do not have a problem converting Qwen/Qwen2-57B-A14B-Instruct on main. Can you share the error message you are getting?
But they will be present in quantized models. I see it now.
I will add a condition to skip scales for f16 or f32 models.
You don't need to add that, it will not try to use scales if they are not present already. See Line 228.
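To illustrate the behavior being described — merging quantization scales only when they actually exist in the checkpoint — here is a minimal, hypothetical sketch. The function and key names are illustrative, not MLX LM's actual API, and NumPy stands in for `mlx.core` to keep the snippet self-contained:

```python
import numpy as np  # stand-in for mlx.core in this sketch

def merge_experts(experts):
    """Stack per-expert weight dicts into combined arrays.

    `experts` is a list of dicts like {"weight": ..., "scales": ...}.
    Quantization scales exist only in quantized checkpoints, so we
    check the first expert before trying to merge them -- fp16/fp32
    models simply never enter the scales branch.
    """
    merged = {"weight": np.stack([e["weight"] for e in experts])}
    if "scales" in experts[0]:  # present only for quantized models
        merged["scales"] = np.stack([e["scales"] for e in experts])
    return merged
```

With this guard, no explicit "skip scales for f16/f32" condition is needed: an fp16 checkpoint has no `scales` entries, so the branch is never taken.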
I think maybe the problem is you were trying to convert using an old MLX LM. The current version (on main) already works for Qwen2 MOEs.
> I do not have a problem converting Qwen/Qwen2-57B-A14B-Instruct on main. Can you share the error message you are getting?
Alright, give me 15 minutes to redownload the model.
> You don't need to add that, it will not try to use scales if they are not present already. See Line 228.
I noticed, and checked the git blame to trace the changes. From what I remember, that's the line causing the error. But let me double-check.
That's weird, today the same code is working fine. Yesterday both the PyPI and GitHub installations gave an error, pointing exactly at line 228 and saying something about the first expert not having scales, but I didn't save the message anywhere. When I made the change proposed here, it worked.
This PR fixes Qwen2 MoE conversion and inference. Twitter: @Prince_Canuma