ml-explore / mlx-swift-examples

Examples using MLX Swift

Huggingface mlx-community/Phi-3-mini-128k-instruct-4bit model cannot be loaded #92

Open neilmehta24 opened 1 month ago

neilmehta24 commented 1 month ago

I am interested in running the mlx-community/Phi-3-mini-128k-instruct-4bit model with Swift, but it cannot be loaded. Here is the output I am seeing:

➜  mlx-swift-examples git:(main) ./mlx-run llm-tool eval --model mlx-community/Phi-3-mini-128k-instruct-4bit -m 100000 -p "<|system|>
You are a helpful assistant.<|end|>
<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|> "
Error: typeMismatch(Swift.String, Swift.DecodingError.Context(codingPath: [CodingKeys(stringValue: "rope_scaling", intValue: nil), _JSONKey(stringValue: "long_factor", intValue: nil)], debugDescription: "Expected to decode String but found an array instead.", underlyingError: nil))

Note that the mlx-community/Phi-3-mini-4k-instruct-4bit model can be loaded. The 128k model includes a non-null rope_scaling field in its config.json, while the 4k model has no rope_scaling field at all. The 128k model fails to load because the loader cannot decode that rope_scaling field.
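For context, the relevant part of the 128k model's config.json looks roughly like this (abbreviated, and from memory of the upstream config, so treat the type string and values as illustrative rather than exact):

    "rope_scaling": {
        "type": "su",
        "short_factor": [ ... ],
        "long_factor": [1.0700000524520874, 1.1200000047683716, ... ]
    },
    "original_max_position_embeddings": 4096,
    "max_position_embeddings": 131072

The decoder currently expects every value inside rope_scaling to be a string or a number, so the long_factor / short_factor arrays trigger the typeMismatch above.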

awni commented 1 month ago

Interestingly we have support for that model in Python, but it looks like it just ignores the rope scaling.

Somehow it still gives pretty good results for short generations 🤔 but presumably it would not work too well beyond 4k

davidkoski commented 1 month ago

It looks like ropeScaling is a dictionary of heterogeneous types. We already have:

    var ropeScaling: [String: StringOrNumber]? = nil

I could change this to StringOrNumberOrArray (something like that anyway!)

The code uses ropeScaling["type"] to figure out what to do, and if the type isn't one it recognizes, it just uses a rope scaling of 1.

Specifically it needs to handle:

        "long_factor": [
            1.0700000524520874,
            1.1200000047683716,

I am not sure what is intended with the array -- I guess they are supposed to correspond to the hidden layer counts? There are also two sets of values here.

Anyway, we could at least enable it to load if we changed this.
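To sketch what that might look like (a rough draft only; StringOrNumberOrArray and its cases are placeholders from the suggestion above, not the actual implementation):

    // Hypothetical replacement for StringOrNumber that also accepts arrays of numbers,
    // so a value like "long_factor": [1.07, 1.12, ...] can be decoded.
    enum StringOrNumberOrArray: Codable {
        case string(String)
        case number(Float)
        case array([Float])

        init(from decoder: Decoder) throws {
            let container = try decoder.singleValueContainer()
            if let value = try? container.decode(String.self) {
                self = .string(value)
            } else if let value = try? container.decode(Float.self) {
                self = .number(value)
            } else {
                // Last resort: an array of numbers; rethrows a type mismatch if this also fails.
                self = .array(try container.decode([Float].self))
            }
        }

        func encode(to encoder: Encoder) throws {
            var container = encoder.singleValueContainer()
            switch self {
            case .string(let value): try container.encode(value)
            case .number(let value): try container.encode(value)
            case .array(let value): try container.encode(value)
            }
        }
    }

    // The config property would then become:
    // var ropeScaling: [String: StringOrNumberOrArray]? = nil

The existing check on ropeScaling["type"] could then match on the .string case and keep falling back to a scale of 1 for anything it doesn't recognize.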

awni commented 1 month ago

> I am not sure what is intended with the array -- I guess they are supposed to correspond to the hidden layer counts? There are also two sets of values here.

Those are values per dimension, so you scale each dimension with a unique value. The Python implementation is here.

There are two of them because one gets used for short sequences and the other gets used for positions after a certain step in longer sequences.
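To make that concrete, here is a minimal sketch of how the two arrays might be consumed, assuming the same logic as the Python implementation (all names are illustrative, not the actual MLX Swift API): each factor divides the base inverse frequency for one rotary dimension, and long_factor only applies once the sequence grows past original_max_position_embeddings.

    import Foundation

    // Illustrative only: per-dimension Su-scaled RoPE inverse frequencies.
    // shortFactor / longFactor come from rope_scaling and have headDim / 2 entries each.
    func ropeInverseFrequencies(
        headDim: Int,
        base: Float,
        sequenceLength: Int,
        originalMaxPositionEmbeddings: Int,
        shortFactor: [Float],
        longFactor: [Float]
    ) -> [Float] {
        // Long sequences switch to long_factor; everything else uses short_factor.
        let factors = sequenceLength > originalMaxPositionEmbeddings ? longFactor : shortFactor
        return (0 ..< headDim / 2).map { i in
            // Standard RoPE frequency for dimension pair i, scaled by that dimension's factor.
            let freq = pow(base, Float(2 * i) / Float(headDim))
            return 1.0 / (factors[i] * freq)
        }
    }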