ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Contribute Hugging Face models to the MLX Community #155

Open awni opened 11 months ago

awni commented 11 months ago

We encourage you to join the MLX Community on Hugging Face 🤗 and upload new MLX converted models and versions of existing models.

Vaibhavs10 commented 11 months ago

Love it! Here's some more information about uploading to the 🤗 Hub.

Once you've converted the checkpoints via the convert.py file in the respective model directory, you can upload large files to the Hub as follows:

First, make sure to install huggingface_hub via pip install --upgrade huggingface_hub.

Second, run huggingface-cli login and paste a write token from https://huggingface.co/settings/tokens.

Next, run the below code:


from huggingface_hub import HfApi
from huggingface_hub import logging

# Show upload progress/info logs
logging.set_verbosity_info()

api = HfApi()

# Upload the converted model folder to the mlx-community org.
# multi_commits splits large uploads into several commits, making them resumable.
api.upload_folder(folder_path="<LOCAL FOLDER PATH>",
                  repo_id="mlx-community/<MODEL_NAME>",
                  repo_type="model",
                  multi_commits=True,
                  multi_commits_verbose=True)

Read more about file uploads here: https://huggingface.co/docs/huggingface_hub/guides/upload

maxtheman commented 11 months ago

Is there any more information about what's needed to author a convert.py for a given model? I'm seeing a lot of similarities between them in terms of loading the weights and then grouping them into a .npz file, but I'm not clear on what's driving all of the differences.

Bert needs lots of keys replaced: https://github.com/ml-explore/mlx-examples/blob/main/bert/convert.py
Phi needs some: https://github.com/ml-explore/mlx-examples/blob/main/llms/phi2/convert.py
Mistral doesn't need any and is very straightforward: https://github.com/ml-explore/mlx-examples/blob/main/llms/mistral/convert.py
Mixtral has a bunch of stuff going on, but I think that's mostly for handling the MoE architecture.

In my case, I was just looking at converting Mamba to MLX. I think I have a good reference implementation for running the model, but I'm having trouble understanding how I would convert the weights appropriately to test it. Looking at the repo on HF, it's not obvious to me what I should be keying in on to start making decisions: https://huggingface.co/state-spaces/mamba-370m/tree/main

Any suggestions are appreciated.

awni commented 11 months ago

It's a bit more of an art at this point. We've been pushing for some standardization and consistency, which is why things look similar, but there are still some subtle differences.

The short answer to your question is:

The output of convert.py should be two things:

  1. weights.npz (or multiple weights files for large models)
  2. config.json (to store the model metadata). Add a model_type field to this (for now it's unused, but will be important to work with Hugging Face in the future).

Beyond that it's really up to you how to structure the weights.npz and the config.json. You just need to be able to load them and construct the model in the file which actually runs the model. For the examples in mlx-examples, the weights.npz has keys which match the nn.Module hierarchy of the corresponding model. So you can basically do:

model.load_weights("weights.npz")

So to make that work cleanly, in the conversion code remap the keys in the PyTorch model state_dicts so they match the keys expected by the MLX model.
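
As a rough illustration of that convention (not taken from any specific example; the key mapping, config fields, and "my-model" type are placeholders), a minimal convert.py could look like:

import json
import numpy as np
import torch

# Hypothetical sketch: remap PyTorch keys to the MLX nn.Module hierarchy,
# then save weights.npz and config.json. Key names are illustrative only.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Remap PyTorch key names to the names the MLX model expects (placeholder mapping).
key_map = {"transformer.wte.weight": "embedding.weight"}
weights = {
    key_map.get(k, k): v.to(torch.float16).numpy()
    for k, v in state_dict.items()
}
np.savez("weights.npz", **weights)

# Store the model metadata, including the model_type field mentioned above.
config = {"model_type": "my-model", "hidden_size": 4096, "num_layers": 32}
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)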

mzbac commented 11 months ago

@awni Thanks for this great initiative. I'm just curious why we can't provide a conversion script for people to convert HF format models to MLX format? For example, the current conversion script for tiny_llama is pretty much compatible with the Llama model in HF format. What's the reason behind naming it tiny_llama instead of something like hf_llama? https://github.com/ml-explore/mlx-examples/blob/main/llms/llama/convert.py#L54

awni commented 11 months ago

@mzbac that's a good question. Since different models have slight differences, for now we are following a convention where each model gets its own converter. The underlying converters can use the HF format (as in Tiny Llama) or the original distributed format (as in Llama).

I know it's a bit convoluted at the moment. As we work with the community, we'll move towards simplifying and standardizing. Any suggestions and/or pain points on that front are appreciated.

P.S. The short-term future I would like to move towards is avoiding a pre-conversion step entirely:

  1. Read the repo from HF in the safetensors format directly
  2. Do any necessary remapping of keys in the weights and config on a model specific basis to make it all work

And someday MLX in Transformers directly, so no model.py file necessary either :)
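
As a rough sketch of what that no-pre-conversion flow could look like (the repo name is just an example, mx.load is used to read the safetensors shards, and the key remapping line is a model-specific placeholder):

from pathlib import Path

import mlx.core as mx
from huggingface_hub import snapshot_download

# Download the repo from the Hub and load its safetensors shards directly with MLX.
repo_path = Path(snapshot_download("mistralai/Mistral-7B-v0.1"))

weights = {}
for shard in sorted(repo_path.glob("*.safetensors")):
    weights.update(mx.load(str(shard)))

# Any model-specific remapping of keys/config would happen here, e.g.
# weights = {k.replace("model.", ""): v for k, v in weights.items()}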

mzbac commented 11 months ago

It would be better if we provided a script for downloading the models. That way, people wouldn't have to install git-lfs and pull the model via git. Here is an example script I used in the past to pull a public model from Hugging Face.

import argparse
import requests
import os
from tqdm import tqdm

def download_file(url, path):
    response = requests.get(url, stream=True)
    total_size_in_bytes = int(response.headers.get('content-length', 0))
    block_size = 1024 #1 Kbyte
    progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)

    with open(path, 'wb') as file:
        for data in response.iter_content(block_size):
            progress_bar.update(len(data))
            file.write(data)

    progress_bar.close()

def download_model(model_name, destination_folder="models"):
    # Define the base URL and headers for the Hugging Face API
    base_url = f"https://huggingface.co/{model_name}/resolve/main"
    headers = {"User-Agent": "Hugging Face Python"}

    # Send a GET request to the Hugging Face API to get a list of all files
    response = requests.get(f"https://huggingface.co/api/models/{model_name}", headers=headers)
    response.raise_for_status()

    # Extract the list of files from the response JSON
    files_to_download = [file["rfilename"] for file in response.json()["siblings"]]

    # Ensure the directory exists
    os.makedirs(f"{destination_folder}/{model_name}", exist_ok=True)

    # Download each file
    for file in files_to_download:
        print(f"Downloading {file}...")
        download_file(f"{base_url}/{file}", f"{destination_folder}/{model_name}/{file}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("model_name", type=str, help="Name of the model to download.")
    args = parser.parse_args()

    download_model(args.model_name)

Just run

   python download.py <huggingface_model_name>, e.g. mlx-community/Llama-2-7b-chat-mlx

rupurt commented 11 months ago

@maxtheman I would also love to try and run mamba models with MLX. Did you end up finding any good resources?

Vaibhavs10 commented 10 months ago

Hi all, I hope you are doing well. With recently merged support for hf_llms, you should be able to use models directly from the Hugging Face Hub (Mistral and Llama architectures are currently supported). 🔥

In addition, we've shipped an updated conversion script which lets you provide an HF Hub model name; it automagically quantises the weights and uploads them to the mlx-community org.

You would need to do the following:

  1. git clone https://github.com/ml-explore/mlx-examples

  2. cd mlx-examples/llms/hf_llms

  3. pip install -r requirements.txt

  4. python convert.py --hf-path "codellama/CodeLlama-13b-Python-hf" -q --upload-name CodeLlama-13b-Python-hf-4bit-mlx

That's it!

Help us increase the coverage of MLX quantised checkpoints by quantising llama & mistral checkpoints from the hub.

Note: Currently, there is a minor bug in the conversion script, which errors out on upload. You can check out this PR (https://github.com/ml-explore/mlx-examples/pull/221) for conversion.

Edit: The PR is now merged, so no need to worry about the Note above.

P.S. Feel free to tag @awni, @pcuenca or me (@Vaibhavs10) if you face any issues!

sukkritsharmaofficial commented 10 months ago

Hey @Vaibhavs10, I'm getting this error while quantizing HF models to 4-bit MLX:

self.weight, self.scales, self.biases = mx.quantize(weight, group_size, bits)
ValueError: [quantize] All dimensions should be divisible by 32 for now

Conversion command:

python convert.py --hf-path "teknium/OpenHermes-2.5-Mistral-7B" -q --upload-name OpenHermes-2.5-Mistral-7B-hf-4bit-mlx

How do I fix this?

awni commented 10 months ago

@sukkritsharmaofficial check the issue here for the reason.

Looking at that model, I think it's possible the issue is that the vocab size is not divisible by 32 (I see in the config "vocab_size": 32002). That's a tricky one. It may need to wait until dimensions that aren't multiples of 32 are supported.
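
A quick way to check a model before converting is to look at the relevant dimensions in its config, e.g. something like this sketch (the field names assume a Llama/Mistral-style Hugging Face config.json, and 32 is the group size from the error message):

import json

def check_divisibility(config_path, group_size=32):
    # Flag config dimensions that aren't divisible by the quantization group size.
    with open(config_path) as f:
        config = json.load(f)
    for key in ("vocab_size", "hidden_size", "intermediate_size"):
        value = config.get(key)
        if value is None:
            continue
        status = "OK" if value % group_size == 0 else f"NOT divisible by {group_size}"
        print(f"{key} = {value}: {status}")

check_divisibility("config.json")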

CC @angeloskath

tingbaozhao commented 10 months ago

The link doesn't exist? When I click hf_llms, I get a 404 error!

mzbac commented 10 months ago

It has been moved to https://github.com/ml-explore/mlx-examples/tree/main/llms :)

Anirud-Mohan commented 9 months ago

Hey @awni, I'm trying to fuse my finetuned model into a GGUF file, but when I execute the fuse command I get a list index out of range error. The fuse command I used is as follows: python fuse.py --model mistralai/Mistral-7B-Instruct-v0.2 --save-path ./lora-fused-model --adapter-file adapters.npz --de-quantize

The error thrown is as follows:

File "/Users/tis-ai-server/workspace/mlx-examples/.venv/lib/python3.11/site-packages/mlx/utils.py", line 123, in tree_unflatten
    int(tree[0][0].split(".", maxsplit=1)[0])
IndexError: list index out of range

awni commented 9 months ago

That is strange. Can you share the full stack trace so we can see more precisely where the issue arises?

Anirud-Mohan commented 9 months ago

@awni Here's the full stack trace, hope this helps :

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 109071.74it/s]
Traceback (most recent call last):
  File "/Users/tis-ai-server/workspace/mlx-examples/lora/fuse.py", line 98, in <module>
    model.update_modules(tree_unflatten(de_quantize_layers))
  File "/Users/tis-ai-server/workspace/mlx-examples/.venv/lib/python3.11/site-packages/mlx/utils.py", line 123, in tree_unflatten
    int(tree[0][0].split(".", maxsplit=1)[0])
IndexError: list index out of range

sukhvir1313 commented 8 months ago

Any updates on this error? I am also receiving this kind of error.

awni commented 8 months ago

Can you share steps to reproduce?

sukhvir1313 commented 8 months ago

Below is the command and result that I got:

conda activate AppleMLXExample
(AppleMLXExample) usermlx@MacBook-Pro lora % python fuse.py --model mistralai/Mistral-7B-Instruct-v0.2 --adapter-file adapters.npz --save-path Models/My-Mistral-7B-fine-tuned --upload-name My-Mistral-7B-fine-tuned --de-quantize
Loading pretrained model
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 111983.84it/s]
Traceback (most recent call last):
  File "/Users/usermlx/Documents/Me/programming/ai/mlx-examples/lora/fuse.py", line 98, in <module>
    model.update_modules(tree_unflatten(de_quantize_layers))
  File "/Users/usermlx/miniconda3/envs/AppleMLXExample/lib/python3.11/site-packages/mlx/utils.py", line 122, in tree_unflatten
    int(tree[0][0].split(".", maxsplit=1)[0])
IndexError: list index out of range

mzbac commented 8 months ago

It seems like this was caused by trying to de-quantize a non-quantized model. Would you try running the fuse without the --de-quantize flag?
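
For context, a tiny illustration (my own, not code from fuse.py) of why that produces this particular traceback: with nothing to de-quantize, the list passed to tree_unflatten is empty, and the first access tree[0] fails:

from mlx.utils import tree_unflatten

# tree_unflatten rebuilds a nested structure from ("dotted.key", value) pairs,
# e.g. [("hello.world", 42)] -> {"hello": {"world": 42}}.
print(tree_unflatten([("hello.world", 42)]))

# With an empty list there is no tree[0], so we get the reported error:
tree_unflatten([])  # IndexError: list index out of range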

sukhvir1313 commented 8 months ago

Yes, it works that way.
