mark-lord / MLX-text-completion-notebook

A simple Jupyter Notebook for learning MLX text-completion fine-tuning!
Apache License 2.0

Unrecognized Argument Error for --adapter-file #2

Open RajaRuling opened 2 months ago

RajaRuling commented 2 months ago

Description

When running the mlx-usft.ipynb notebook on an M1 Mac with the --adapter-file argument, it fails with an "unrecognized arguments" error. It seems the argument is either not implemented or incorrectly handled.
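For context, the failing cell presumably shells out to mlx_lm.lora along these lines (a hypothetical reconstruction, not the notebook's exact cell; the model name is a placeholder, and the iteration count is taken from the training log further down):

!python -m mlx_lm.lora --model <model> --train --iters 500 --adapter-file trial1.npz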

Steps to Reproduce

  1. Run all cells of the notebook on an M1 Mac.
  2. Observe the error in the 10th code cell.

Expected Behavior

The script should recognize the --adapter-file argument and use the specified adapter file for training or testing as intended.

Actual Behavior

The script exits with lora.py: error: unrecognized arguments: --adapter-file trial1.npz and prints the following usage message:

usage: lora.py [-h] [--model MODEL] [--train] [--data DATA]
               [--lora-layers LORA_LAYERS] [--batch-size BATCH_SIZE]
               [--iters ITERS] [--val-batches VAL_BATCHES]
               [--learning-rate LEARNING_RATE]
               [--steps-per-report STEPS_PER_REPORT]
               [--steps-per-eval STEPS_PER_EVAL]
               [--resume-adapter-file RESUME_ADAPTER_FILE]
               [--adapter-path ADAPTER_PATH] [--save-every SAVE_EVERY]
               [--test] [--test-batches TEST_BATCHES]
               [--max-seq-length MAX_SEQ_LENGTH] [-c CONFIG]
               [--grad-checkpoint] [--seed SEED]

Additional Information

Please let me know if there's a different way to specify the adapter file or if there's an update needed to handle this argument correctly. Thanks!

couillonnade commented 1 month ago

The documentation here states that the argument should be "--adapter-path" instead of "--adapter-file", which also matches the usage message you printed above.
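If so, the fix in the notebook is presumably just a flag swap, along these lines (the model name is a placeholder, and note that per the usage message, recent mlx-lm versions take a directory path for the adapter rather than a single .npz file):

!python -m mlx_lm.lora --model <model> --train --iters 500 --adapter-path adapters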

Now the problem is further along, with dequantization:

Loading pretrained model
Trainable parameters: 0.108% (1.126M/1044.752M)
Loading datasets
Training
Starting training..., iters: 500
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/lora.py", line 271, in <module>
    main()
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/lora.py", line 267, in main
    run(types.SimpleNamespace(**args))
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/lora.py", line 221, in run
    train(
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/tuner/trainer.py", line 214, in train
    lvalue, toks = step(batch)
                   ^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/tuner/trainer.py", line 190, in step
    (lvalue, toks), grad = loss_value_and_grad(model, batch)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx/nn/utils.py", line 34, in wrapped_value_grad_fn
    value, grad = value_grad_fn(model.trainable_parameters(), *args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx/nn/utils.py", line 28, in inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/tuner/trainer.py", line 67, in default_loss
    logits = model(inputs)
             ^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/models/llama.py", line 188, in __call__
    out = self.model(inputs, cache)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx_lm/models/llama.py", line 157, in __call__
    h = self.embed_tokens(inputs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/mlx/lib/python3.12/site-packages/mlx/nn/layers/quantized.py", line 98, in __call__
    out = mx.dequantize(
          ^^^^^^^^^^^^^^
ValueError: [dequantize] The matrix should be given as a uint32
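For reference, mx.dequantize expects the packed uint32 matrix that mx.quantize produces, so the error suggests the embedding layer's quantized weights weren't loaded in that packed form. A minimal sketch of the intended round-trip, with illustrative shapes, group size, and bit width (not the model's actual settings):

import mlx.core as mx

w = mx.random.normal((128, 128))

# mx.quantize packs groups of weights into uint32 words and returns
# per-group scales and biases alongside the packed matrix
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
print(w_q.dtype)  # uint32

# mx.dequantize requires that packed uint32 matrix; anything else
# raises "ValueError: [dequantize] The matrix should be given as a uint32"
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)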

The cell still prints "Model training complete.", but the notebook's own caveat applies: if you're getting a strange error, the training isn't actually happening (obvious, as it ends instantly):

CPU times: user 17 ms, sys: 10.1 ms, total: 27.1 ms
Wall time: 2.1 s

Not sure if related to this issue: https://github.com/ml-explore/mlx/issues/814