Closed Paramstr closed 5 months ago
I think it's just because we now quantize the embeddings, and they are quite large for Gemma: https://huggingface.co/mlx-community/quantized-gemma-2b/blob/main/model.safetensors.index.json#L6
2048 × 256000 × 2 bytes ≈ 1 GB (fp16), versus 2048 × 256000 × 0.5 bytes ≈ 250 MB (4-bit).
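The arithmetic above as a quick sanity check (hidden size 2048 and vocab size 256000 are the Gemma 2B values; note this ignores the small per-group scale/bias overhead that grouped 4-bit quantization adds):

```python
# Back-of-the-envelope size of Gemma's embedding table:
# vocab 256000 x hidden 2048, fp16 vs 4-bit weights.
vocab, hidden = 256_000, 2048

fp16_bytes = vocab * hidden * 2    # 2 bytes per fp16 weight
q4_bytes = vocab * hidden * 0.5    # 4 bits = 0.5 bytes per weight

print(f"fp16 embeddings:  {fp16_bytes / 1e9:.2f} GB")  # ~1.05 GB
print(f"4-bit embeddings: {q4_bytes / 1e9:.2f} GB")    # ~0.26 GB
```

So quantizing just the embedding table saves on the order of 750 MB, which is the bulk of the size difference discussed below.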
Have you tried running your converted model in Python with `mlx_lm`? Does it work there? If so, this is probably a Swift issue, not a conversion issue.
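One way to do that check is the `mlx-lm` command-line interface (a sketch; the local model path `./mlx_model` is a placeholder for wherever your converted weights live):

```shell
pip install mlx-lm

# Generate a few tokens from the converted checkpoint; if this produces
# sensible text, the conversion is fine and the problem is on the Swift side.
python -m mlx_lm.generate \
  --model ./mlx_model \
  --prompt "Hello" \
  --max-tokens 20
```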
It works now; I was using an older version of the repo, and I think that was messing with it.
Hi, I've been trying to convert the Gemma model to MLX and can't understand why the model size decreases more than expected (which I believe is the source of the error below when running in Xcode).
Conversion Code
Produces (see Paramstr/MLX_google_gemma-2b-it_testing if you want to see the files): model.safetensors – 1.41 GB, with `"quantization": { "group_size": 64, "bits": 4 }`
Whereas a community-uploaded Gemma MLX model, mlx-community/quantized-gemma-2b-it, has model.safetensors – 2.16 GB, with the same `"quantization": { "group_size": 64, "bits": 4 }`
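For what it's worth, the ~0.75 GB gap between the two checkpoints is roughly what you'd expect if one stores the embedding table in fp16 and the other in 4-bit (a rough check using decimal GB and ignoring quantization group overhead):

```python
# Gemma 2B embedding table: vocab 256000 x hidden 2048.
vocab, hidden = 256_000, 2048

embed_fp16_gb = vocab * hidden * 2 / 1e9    # fp16: 2 bytes/weight, ~1.05 GB
embed_q4_gb = vocab * hidden * 0.5 / 1e9    # 4-bit: 0.5 bytes/weight, ~0.26 GB

print(f"expected gap: {embed_fp16_gb - embed_q4_gb:.2f} GB")  # ~0.79 GB
print(f"observed gap: {2.16 - 1.41:.2f} GB")                  # 0.75 GB
```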
Any clue why this is the case?