lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Train with system prompt #3054

Open christobill opened 4 months ago

christobill commented 4 months ago

When using:

torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train.py \
    --model_name_or_path lmsys/vicuna-7b-v1.5 \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir output_vicuna \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
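For reference, the batch-related flags in this command combine into the number of sequences per optimizer step as shown below. This is plain arithmetic over the values above. It assumes a single node, so the data-parallel world size equals `--nproc_per_node`. FSDP shards parameters but still gives each rank its own batch.

```python
# Values taken from the torchrun command above.
nproc_per_node = 2                 # --nproc_per_node (assumed single node)
per_device_train_batch_size = 2    # --per_device_train_batch_size
gradient_accumulation_steps = 16   # --gradient_accumulation_steps

# Sequences consumed per optimizer step across all GPUs.
global_batch_size = (
    nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps
)
print(global_batch_size)  # 64
```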

Fine-tuning with this command always uses the following fixed system prompt (system message): "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."
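The sketch below shows how that fixed prompt ends up in each training string. It is a minimal standalone example that assembles a Vicuna v1.1-style string from the system message and the human/gpt turns. It does not call FastChat. The role names ("USER"/"ASSISTANT") and the separators (a space after user turns, "</s>" after assistant turns) are assumptions about the vicuna_v1.1 conversation template. `build_prompt` is a hypothetical helper.

```python
# Hypothetical standalone sketch of a Vicuna v1.1-style prompt layout.
# Role names and separators are assumptions, not imported from FastChat.
DEFAULT_SYSTEM_MESSAGE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

ROLE_NAMES = {"human": "USER", "gpt": "ASSISTANT"}  # dataset "from" -> prompt role
SEP = " "       # assumed separator after user turns
SEP2 = "</s>"   # assumed end-of-sequence marker after assistant turns


def build_prompt(turns, system_message=DEFAULT_SYSTEM_MESSAGE):
    """Concatenate the system message and all turns into one training string."""
    prompt = system_message + SEP
    for turn in turns:
        role = ROLE_NAMES[turn["from"]]
        # User turns end with SEP, assistant turns with SEP2.
        prompt += f"{role}: {turn['value']}" + (SEP if role == "USER" else SEP2)
    return prompt


example = [
    {"from": "human", "value": "Who are you?"},
    {"from": "gpt", "value": "I am Vicuna."},
]
print(build_prompt(example))
```

Because `system_message` defaults to the fixed prompt, every conversation gets the same prefix unless a caller passes something else. That is the limitation this issue is about.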

It would be useful to be able to specify the system prompt for each conversation in the training data. Here is what that could look like, based on https://github.com/lm-sys/FastChat/blob/main/data/dummy_conversation.json. A conversation without a "system_message" field would fall back to the default prompt:

[
  {
    "id": "identity_0",
    "system_message": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant's name is Vick.",
    "conversations": [
      {
        "from": "human",
        "value": "Who are you?"
      },
      {
        "from": "gpt",
        "value": "I am Vick, a language model trained by researchers from Large Model Systems Organization (LMSYS)."
      },
      {
        "from": "human",
        "value": "Have a nice day!"
      },
      {
        "from": "gpt",
        "value": "You too!"
      }
    ]
  },
  {
    "id": "identity_1",
    "conversations": [
      {
        "from": "human",
        "value": "Who are you?"
      },
      {
        "from": "gpt",
        "value": "My name is Vicuna, and I'm a language model developed by Large Model Systems Organization (LMSYS)."
      }
    ]
  }
]
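The fallback rule in the proposed format can be sketched as follows. This is a minimal example, assuming the data file is a JSON list of records in which "system_message" is optional. The embedded JSON is an abbreviated version of the example above. `resolve_system_messages` is a hypothetical helper, not an existing FastChat function.

```python
import json

DEFAULT_SYSTEM_MESSAGE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

# Abbreviated version of the proposed data format (illustrative only).
RAW = """
[
  {"id": "identity_0",
   "system_message": "The assistant's name is Vick.",
   "conversations": [{"from": "human", "value": "Who are you?"}]},
  {"id": "identity_1",
   "conversations": [{"from": "human", "value": "Who are you?"}]}
]
"""


def resolve_system_messages(records):
    """Return (id, system_message) pairs.

    Records without a "system_message" field fall back to the default.
    Using `or` also makes an empty string fall back (a design choice).
    """
    return [
        (rec["id"], rec.get("system_message") or DEFAULT_SYSTEM_MESSAGE)
        for rec in records
    ]


records = json.loads(RAW)
for record_id, system_message in resolve_system_messages(records):
    print(record_id, "->", system_message)
```

Treating an empty "system_message" like a missing one keeps a single fallback path. A stricter variant could instead reject empty strings as a data error.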

I can open a pull request for this if it is considered a useful feature.

christobill commented 4 months ago

It seems this is exactly what train_with_template.py already does here: https://github.com/lm-sys/FastChat/blob/3bef934b9da68e3e9a8decee472e484cca4df1ad/fastchat/train/train_with_template.py#L238

This file was added recently in PR https://github.com/lm-sys/FastChat/pull/2951 by @congchan :muscle:, but it is not documented yet. I'm happy to help with that if needed :smiley:

congchan commented 3 months ago


Hi, it would be great if you could help with the documentation! 😄