So a few things here:
- If you are fine-tuning a text dataset then you should not use the chat template with it. (Alternatively, use a completions dataset for fine-tuning.) To disable the chat template during generation you can pass --ignore-chat-template.
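For example, something along these lines (the model path and prompt below are just placeholders):
mlx_lm.generate --model ./lora_fused_model --prompt "table: 1-1000181-1\nQ: Tell me what the notes are for South Australia" --ignore-chat-template --max-tokens 100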
Yes, I am fine-tuning a text dataset (the wikisql example). And when you say you shouldn't use the chat template with it - this is when prompting the fine-tuned model. Is that correct?
Regarding the completions dataset for fine-tuning (which is what I have for my toy example: Q/A pairs) - each family of models requires the training dataset to be passed in a particular format.
For example, Mistral wants a Q/A pair to look something like the following:
{"text": "<s>[INST] You are an expert at writing SQL statements. \n You will be given a question and you will come with a SQL expression. table: 1-1000181-1\ncolumns: State/territory, Text/background colour, Format, Current slogan, Current series, Notes\nQ: Tell me what the notes are for South Australia \n[/INST]A: SELECT Notes FROM 1-1000181-1 WHERE Current slogan = 'SOUTH AUSTRALIA'"}</s>
Does this format matter when fine-tuning using MLX?
- Does the fine-tuned model work in MLX? Is the behavior different if you use a fused model vs keeping the adapters separate? Sometimes there is precision loss from fusing the adapters.
Yes, it does work. But the fused model's generations have gone awry. If I keep the adapters separate, I get good results. My model wasn't quantized. I think the default is fp16. Is there a way to tune a model at fp32?
- If you are seeing issues with terminating the output maybe try upgrading your MLX LM. We recently started appending the EOS token during fine-tuning so that should fix / help with that.
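For example, upgrading via pip:
pip install -U mlx-lm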
I am on the latest mlx-lm version (0.14.3) :) Is there a simple Jupyter notebook that shows the interactions with mlx-lm for fine-tuning?
And when you say you shouldn't use the chat template with it - this is when prompting the fine-tuned model. Is that correct?
Exactly.
Does this format matter when fine-tuning using MLX?
The format can matter. You can either give raw text or use a completion or chat format which will add in the model's chat template. You can read more about the supported dataset formats in the docs.
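Roughly, the three layouts look like this, one JSON object per line in the training file (illustrative values; see the docs for the exact field names):
{"text": "Some raw training text that is used as-is."}
{"prompt": "Tell me what the notes are for South Australia", "completion": "SELECT Notes FROM 1-1000181-1 WHERE Current slogan = 'SOUTH AUSTRALIA'"}
{"messages": [{"role": "user", "content": "Tell me what the notes are for South Australia"}, {"role": "assistant", "content": "SELECT Notes FROM 1-1000181-1 WHERE Current slogan = 'SOUTH AUSTRALIA'"}]}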
I think the default is fp16. Is there a way to tune a model at fp32?
Convert it to fp32 using mlx_lm.convert, then fine-tune it. Pass --dtype float32 to the conversion script.
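Concretely, that would be something along these lines (the local paths and data directory are placeholders):
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.2 --mlx-path ./mistral-7b-fp32 --dtype float32
mlx_lm.lora --model ./mistral-7b-fp32 --train --data ./data --batch-size 1 --lora-layers 4 --iters 1000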
Is there a simple Jupyter notebook that shows the interactions with mlx-lm for fine-tuning?
Something like this?
Awesome, thanks a lot @awni, I appreciate you taking the time to answer these questions for me :) I will try fp32 and also inspect the magnitude of the weights and adapters (like you mentioned on the other issue). I don't really want to go up to fp32 since it would be expensive to run inference on moving forward, but I will give it a shot to see how it works.
And yes, the notebook you shared is what I was looking for :) Cheers
I'll close this for now and will reopen if I run into issues after attempting the suggested techniques :)
Hey MLX team,
I took the simple wikisql example from the mlx-examples repo and fine-tuned the Mistral-7B-Instruct-v0.2 model. The motivation behind trying to reproduce this with the wikisql example is: I fine-tuned the same Mistral model with my own data (around 9000 records), and after fusing it and converting it into GGUF, I ran into a similar problem. Ideally, I would like to run the fine-tune on Mistral using MLX, fuse it, convert it into GGUF, and then serve it so that I can run a bunch of questions against the fine-tuned model and see how it performs.
Steps followed for the wikisql example (a rough sketch of the fine-tune and fuse commands is below):
Fine-tuning parameters: batch-size=1, lora-layers=4, iters=1000
python convert-hf-to-gguf.py ~/lora_fused_model
ollama create mlx-sql-ft-mistral -f Modelfile
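Roughly, the fine-tune and fuse steps before the GGUF conversion and ollama create above were along these lines (exact flag names may differ slightly; paths follow the mlx-examples lora example):
python lora.py --model mistralai/Mistral-7B-Instruct-v0.2 --train --data data/ --batch-size 1 --lora-layers 4 --iters 1000
python fuse.py --model mistralai/Mistral-7B-Instruct-v0.2 --adapter-file adapters.npz --save-path ~/lora_fused_model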
Generation using the fine-tuned model:
Result looks as follows:
Running the gguf model on ollama does the following:
I had to Ctrl-D to stop the generation.
Is this a known issue? I did try prompting it differently and still ran into the same issue. Maybe there is a very specific way to prompt it? Any ideas or suggestions?
Is there a way to run the GGUF model using MLX? Or directly the fused model using MLX?
I am hoping to get some advice. I would really appreciate it.
cc @awni