huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Fine tuning stack_llama #290

Closed imrankh46 closed 1 year ago

imrankh46 commented 1 year ago

Hi, everyone. My question is about the steps mentioned for fine-tuning the stack_llama model: do I need to run all the steps at once?

There were three main steps to the training process:

Supervised fine-tuning of the base llama-7b model to create llama-7b-se:

torchrun --nnodes 1  --nproc_per_node 8 examples/stack_llama/scripts/supervised_finetuning.py --model_path=<LLAMA_MODEL_PATH> --streaming --no_gradient_checkpointing --learning_rate 1e-5 --max_steps 5000 --output_dir ./llama-se

Reward modeling using dialog pairs from the SE dataset with llama-7b-se to create llama-7b-se-rm:

torchrun --nnodes 1  --nproc_per_node 8 examples/stack_llama/scripts/reward_modeling.py --model_name=<LLAMA_SE_MODEL>

RL fine-tuning of llama-7b-se with the llama-7b-se-rm reward model:

accelerate launch --multi_gpu --num_machines 1  --num_processes 8 examples/stack_llama/scripts/rl_training.py --log_with=wandb --model_name=<LLAMA_SE_MODEL> --reward_model_name=<LLAMA_SE_RM_MODEL> --adafactor=False --tokenizer_name=<LLAMA_TOKENIZER> --save_freq=100 --output_max_length=128 --batch_size=8 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam

LoRA layers were used at all stages to reduce memory requirements. At each stage the peft adapter layers were merged back into the base model.
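
The merge command itself is not quoted above. As a minimal sketch of what that per-stage merge does, assuming the peft merge_and_unload API and placeholder model paths:

# Hedged sketch: fold the trained LoRA/peft adapter weights into the base model
# so the merged checkpoint can be used as the starting point for the next stage.
# Paths below are placeholders, not the exact names used in the example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "<LLAMA_MODEL_PATH>"       # full base llama-7b weights
adapter_name = "<ADAPTER_PATH_OR_HUB_ID>"    # adapter produced by the previous stage
output_dir = "./llama-se-merged"

base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_name)
model = model.merge_and_unload()             # merge adapter weights into the base weights

model.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(output_dir)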

lvwerra commented 1 year ago

If you want to reproduce the full example, then yes. However, we also provide the adapter weights for each step on the Hub: https://huggingface.co/trl-lib so you could reuse them e.g. if you just want to run the last step.

imrankh46 commented 1 year ago

> If you want to reproduce the full example, then yes. However, we also provide the adapter weights for each step on the Hub: https://huggingface.co/trl-lib so you could reuse them e.g. if you just want to run the last step.

My question is, do I need to run each step individually or not? I have my own dataset, similar to the stack_exchange one. 🤗

lvwerra commented 1 year ago

Yes, in that case you would need to run each step individually!

imrankh46 commented 1 year ago

> Yes, in that case you would need to run each step individually!

Thank you for the response. Second question: do I need an instruction base model, or can I also train a small model?

lvwerra commented 1 year ago

I don't think I understand your question. Fine-tuning on StackExchange data should be ok in this example. If you already have a strong instruction model, the fine-tuning step might not be necessary.

imrankh46 commented 1 year ago

> If you want to reproduce the full example, then yes. However, we also provide the adapter weights for each step on the Hub: https://huggingface.co/trl-lib so you could reuse them e.g. if you just want to run the last step.

Here do you mean that if I have my own dataset, I don't need to follow step one and step two, and can just run the last step?

Am I correct?

lvwerra commented 1 year ago

No, I meant that if you want to reproduce the results on our dataset you can use the checkpoints. If you have your own dataset then you need to run all the steps.

imrankh46 commented 1 year ago

> No, I meant that if you want to reproduce the results on our dataset you can use the checkpoints. If you have your own dataset then you need to run all the steps.

Thank you so much.

imrankh46 commented 1 year ago

> No, I meant that if you want to reproduce the results on our dataset you can use the checkpoints. If you have your own dataset then you need to run all the steps.

I tried to run the following command, but it is not working. I think you need to refresh the model page.

!torchrun --nnodes 1 --nproc_per_node 1 /content/trl/examples/stack_llama/scripts/supervised_finetuning.py --model_path=trl-lib/llama-7b-se-peft --streaming --no_gradient_checkpointing --learning_rate 1e-5 --max_steps 5000 --output_dir ./llama-se

OSError: trl-lib/llama-7b-se-peft does not appear to have a file named 
config.json. Checkout 'https://huggingface.co/trl-lib/llama-7b-se-peft/main' for
available files.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4607) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/content/trl/examples/stack_llama/scripts/supervised_finetuning.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-04-15_12:30:39
  host      : 5b5507893892
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4607)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

lvwerra commented 1 year ago

I think for the supervised finetuning you need the full model, not just the adapters. You could adapt the code to load the pretrained model and load the adapters on top of it.
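
As a hedged sketch of that adaptation (the base model path is a placeholder, and trl-lib/llama-7b-se-peft is the adapter-only repo from the error above), loading the adapters on top of the full base model could look like this:

# Sketch: load the full pretrained base model first, then apply the adapter weights on top.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_path = "<LLAMA_MODEL_PATH>"     # full llama-7b weights (placeholder)
adapter_id = "trl-lib/llama-7b-se-peft"    # adapter-only repo, so it has no config.json of its own

model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(model, adapter_id)   # wraps the base model with the adapters

Alternatively, the adapters could be merged into the base weights and saved locally (as in the merge sketch earlier in this thread), and the resulting directory passed to supervised_finetuning.py via --model_path.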

imrankh46 commented 1 year ago

> I think for the supervised finetuning you need the full model, not just the adapters. You could adapt the code to load the pretrained model and load the adapters on top of it.

How can I do this?

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.