nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

Issue: Fine-tuning Falcon model is giving "KeyError: 'weight_decay'" #1357

Closed: svaidyans closed this issue 4 months ago

svaidyans commented 1 year ago

Issue you'd like to raise.

Hi,

I am trying to fine-tune the Falcon model. Here are my parameters:

model_name: "nomic-ai/gpt4all-falcon" # add model here
tokenizer_name: "nomic-ai/gpt4all-falcon" # add model here
gradient_checkpointing: true
save_name: "/Users/vs/Library/Application Support/nomic.ai/GPT4All/ggml-model-gpt4all-falcon-q4_0_trained.bin" # CHANGE 

# dataset
streaming: false
num_proc: 64
dataset_path: "/Users/vs/Sites/hrBOT/hrboard.jsonl" # update
max_length: 1024
batch_size: 32

# train dynamics
lr: 5.0e-5
eval_every: 800
eval_steps: 100
save_every: 800
output_dir: "/Users/vs/Library/Application Support/nomic.ai/GPT4All/" # CHANGE
checkpoint: null
lora: false
warmup_steps: 100
num_epochs: 2

# logging
wandb: false
wandb_entity: # update
wandb_project_name: # update
seed: 42

When running the suggested command for fine-tuning:

accelerate launch --num_processes=8 --num_machines=1 --machine_rank=0 train.py --config configs/train/finetune.yaml

I am getting a KeyError on 'weight_decay'; the traceback is below:

Traceback (most recent call last):
  File "/Users/vs/Sites/hrBOT/gpt4all/gpt4all-training/train.py", line 240, in <module>
    train(accelerator, config=config)
  File "/Users/vs/Sites/hrBOT/gpt4all/gpt4all-training/train.py", line 80, in train
    optimizer = optimizer_cls(model.parameters(), lr=config["lr"], weight_decay=config["weight_decay"])
KeyError: 'weight_decay'
Traceback (most recent call last):
  File "/Users/vs/Library/Python/3.9/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/vs/Library/Python/3.9/lib/python/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/vs/Library/Python/3.9/lib/python/site-packages/accelerate/commands/launch.py", line 941, in launch_command
    simple_launcher(args)
  File "/Users/vs/Library/Python/3.9/lib/python/site-packages/accelerate/commands/launch.py", line 603, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Library/Developer/CommandLineTools/usr/bin/python3', 'train.py', '--config', 'configs/train/finetune.yaml']' returned non-zero exit status 1.
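
For reference, the config above has no weight_decay entry, while the line reported in the traceback reads that key from the config directly when building the optimizer. Below is a small self-contained sketch of the failing pattern and a possible workaround; the stand-in config, placeholder model, and the .get() fallback with a 0.0 default are my own illustration, not the repo's actual code:

import torch

config = {"lr": 5.0e-5}  # minimal stand-in for the parsed finetune.yaml (no weight_decay key)
model = torch.nn.Linear(4, 4)  # placeholder model, just to have parameters
optimizer_cls = torch.optim.AdamW

# config["weight_decay"] would raise KeyError here, matching the traceback;
# a .get() fallback (illustrative only) avoids the crash:
weight_decay = config.get("weight_decay", 0.0)  # assumed default, tune as needed
optimizer = optimizer_cls(model.parameters(), lr=config["lr"], weight_decay=weight_decay)

Equivalently, adding a weight_decay entry (for example weight_decay: 0.0) under the "# train dynamics" section of finetune.yaml should satisfy the existing lookup without touching train.py.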

Suggestion:

Could you please advise how to fix this? Thanks in advance.

cheesgno commented 1 year ago

Is there any news regarding this problem? I am having the same issue.

imtiyaz-shaikh commented 11 months ago

@svaidyans, @cheesgno: I am facing a slightly different problem, related to 'model_name'. Could you please tell me whether you were able to configure the model name properly? I have model_name: "openlm-research/open_llama_7b" and tokenizer_name: "openlm-research/open_llama_7b" in the finetune.yaml file. Do I need to download the open_llama_7b binary and place it under the /gpt4all-training directory? And what about the tokenizer?
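
For context, my understanding (assuming train.py loads models through the standard Hugging Face transformers API, which is not confirmed here) is that a hub ID like "openlm-research/open_llama_7b" is resolved and downloaded to the local cache automatically, so there should be no need to place a binary under gpt4all-training by hand. A minimal sketch of that loading behaviour:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub IDs are fetched and cached on first use; the tokenizer files are
# handled the same way, so no manual download should be required.
model_name = "openlm-research/open_llama_7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)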

cebtenzzre commented 4 months ago

Closing this issue as stale. A lot has changed since Nomic last trained a text completion model.