Closed: l0d0v1c closed this issue 4 months ago.
May I ask whether you fine-tuned by writing your own code or used mlx_lm directly? I've recently been writing a model file for Phi-3, so perhaps I can help you solve this issue.
Thanks AlexC. I used `python -m mlx_lm.lora -m ...`. The generate module works fine, but lora does not.
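For reference, a typical mlx_lm LoRA invocation looks roughly like the following; the model repo id and data path here are illustrative, and flag names can differ between mlx-lm releases:

```
python -m mlx_lm.lora \
    --model microsoft/Phi-3-mini-4k-instruct \
    --train \
    --data ./data \
    --iters 100
```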
May I ask about your specific config file or settings when fine-tuning with LoRA? I tried LoRA fine-tuning with Phi-3-mini-4k-instruct today, and the loss was normal.
Hi, this should be fixed in the latest MLX: https://github.com/ml-explore/mlx/pull/1028. There was an issue with quantizing all-zero weights, which produced NaNs and was exposed in Phi-3.
Note that to get it to work we will need to re-quantize Phi-3 from the original weights using the latest MLX (so either build from source or wait until we release).
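To illustrate why an all-zero block breaks quantization, here is a simplified affine-quantization sketch (not MLX's actual implementation): when the max and min of a block are both 0, the scale is 0 and the normalization step divides 0 by 0, producing NaNs.

```python
# Toy affine quantization, simplified; not MLX's actual code.
import numpy as np

def quantize(x, bits=4):
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2**bits - 1)  # scale == 0 for an all-zero block
    q = np.round((x - lo) / scale)     # 0 / 0 -> NaN
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

q, scale, lo = quantize(np.zeros(8, dtype=np.float32))
print(scale)                      # 0.0
print(dequantize(q, scale, lo))   # [nan nan ... nan]
```

The fix in the PR above handles the zero-scale case, but checkpoints quantized with an older MLX already contain the bad values, which is why re-quantizing from the original weights is needed.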
Could you share your config and loss metrics? LoRA > lora.py errors out.
@kishoretvk The last update solved the issue.
When I try to fine-tune Phi-3 (Phi-3-mini-128k-instruct-8bit) I get the same issue I previously had with Mixtral: the loss is NaN.
```
Trainable parameters: 0.042% (1.573M/3750.282M)
Loading datasets
Training
Starting training..., iters: 100
Iter 1: Val loss nan, Val took 61.907s
Iter 10: Train loss nan, Learning Rate 1.000e-05, It/sec 0.264, Tokens/sec 530.838, Trained Tokens 20105, Peak mem 16.146 GB
```
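A quick way to check whether the NaNs come from the quantized checkpoint itself (rather than from the training step) is to scan the loaded parameters. A minimal sketch; the repo id is an assumption based on the model name in the report above, and API details may vary between mlx-lm versions:

```python
# Hedged sketch: scan a loaded model's floating-point parameters for NaNs.
import mlx.core as mx
from mlx.utils import tree_flatten
from mlx_lm import load

# Repo id is an assumption based on the model name reported above.
model, tokenizer = load("mlx-community/Phi-3-mini-128k-instruct-8bit")

for name, param in tree_flatten(model.parameters()):
    # Quantized weights are packed integers; only float params can hold NaN.
    if param.dtype in (mx.float16, mx.bfloat16, mx.float32):
        if mx.any(mx.isnan(param)).item():
            print("NaN in", name)
```

If this prints any parameter names (typically quantization scales or biases), the checkpoint needs to be re-quantized with a fixed MLX, as described above.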