Closed Paramstr closed 5 months ago
I think it's just because we now quantize the embeddings, and they are quite large for Gemma: https://huggingface.co/mlx-community/quantized-gemma-2b/blob/main/model.safetensors.index.json#L6
2048 × 256000 × 2 bytes ≈ 1 GB (fp16), versus 2048 × 256000 × 0.5 bytes ≈ 250 MB (4-bit).
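The arithmetic above as a quick sanity check (hidden size 2048 and vocab size 256000 are the Gemma 2B values; note this ignores the small per-group scale/bias overhead that grouped 4-bit quantization adds):

```python
# Back-of-the-envelope size of Gemma's embedding table:
# vocab 256000 x hidden 2048, fp16 vs 4-bit weights.
vocab, hidden = 256_000, 2048

fp16_bytes = vocab * hidden * 2    # 2 bytes per fp16 weight
q4_bytes = vocab * hidden * 0.5    # 4 bits = 0.5 bytes per weight

print(f"fp16 embeddings:  {fp16_bytes / 1e9:.2f} GB")  # ~1.05 GB
print(f"4-bit embeddings: {q4_bytes / 1e9:.2f} GB")    # ~0.26 GB
```

So quantizing just the embedding table saves on the order of 750 MB, which is the bulk of the size difference discussed below.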
Have you tried running your converted model in Python with `mlx_lm`? Does it work there? If so, this is probably a Swift issue, not a conversion issue.
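One way to do that check is the `mlx-lm` command-line interface (a sketch; the local model path `./mlx_model` is a placeholder for wherever your converted weights live):

```shell
pip install mlx-lm

# Generate a few tokens from the converted checkpoint; if this produces
# sensible text, the conversion is fine and the problem is on the Swift side.
python -m mlx_lm.generate \
  --model ./mlx_model \
  --prompt "Hello" \
  --max-tokens 20
```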
It works now; I was using an older version of the repo, and I think that was messing with it.
Hi, I've been trying to convert the Gemma model to MLX and can't understand why the model size decreases more than expected (which I believe is the source of the error below when running in Xcode).
Conversion Code
Produces (see Paramstr/MLX_google_gemma-2b-it_testing if you want to see the files): model.safetensors – 1.41 GB, with `"quantization": { "group_size": 64, "bits": 4 }`
Whereas a community-uploaded Gemma MLX model, mlx-community/quantized-gemma-2b-it, has model.safetensors – 2.16 GB, with the same `"quantization": { "group_size": 64, "bits": 4 }`
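For what it's worth, the ~0.75 GB gap between the two checkpoints is roughly what you'd expect if one stores the embedding table in fp16 and the other in 4-bit (a rough check using decimal GB and ignoring quantization group overhead):

```python
# Gemma 2B embedding table: vocab 256000 x hidden 2048.
vocab, hidden = 256_000, 2048

embed_fp16_gb = vocab * hidden * 2 / 1e9    # fp16: 2 bytes/weight, ~1.05 GB
embed_q4_gb = vocab * hidden * 0.5 / 1e9    # 4-bit: 0.5 bytes/weight, ~0.26 GB

print(f"expected gap: {embed_fp16_gb - embed_q4_gb:.2f} GB")  # ~0.79 GB
print(f"observed gap: {2.16 - 1.41:.2f} GB")                  # 0.75 GB
```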
Any clue why this is the case?