lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Parameter setting for training Mistral #2818

Open xiaocaijiayou opened 10 months ago

xiaocaijiayou commented 10 months ago

Do I need to make any changes to the following training parameters if I am training Mistral?

torchrun --nproc_per_node=1 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path /local/Mistral-7B-v0.1 \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir result \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True

surak commented 10 months ago

I haven't managed to get it running yet. Please report your success or failure; let's try this together!

BeastyZ commented 9 months ago

It works for me!

torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --data_path fine_tuning/data/train/medi_fastchat.json \
    --bf16 True \
    --output_dir fine_tuning/model/mistral-7b-0103 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'MistralDecoderLayer' \
    --tf32 True \
    --model_max_length 1024 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to none
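
The key change compared to the Llama command above is --fsdp_transformer_layer_cls_to_wrap 'MistralDecoderLayer'. If you are unsure which class name to pass for another model, you can inspect the loaded model directly; a minimal sketch (assuming transformers is installed and using the model name from the command above):

    from transformers import AutoModelForCausalLM

    # Load the model (this downloads the weights) and collect the names of its
    # decoder-layer classes; --fsdp_transformer_layer_cls_to_wrap must match one of them.
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
    layer_names = {type(m).__name__ for m in model.modules() if "DecoderLayer" in type(m).__name__}
    print(layer_names)  # expected: {'MistralDecoderLayer'}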

surak commented 9 months ago

@BeastyZ Would you mind sharing your medi_fastchat.json file?

BeastyZ commented 9 months ago

@surak medi_fastchat.json is my custom dataset; you can use your own data. I'm sorry that I can't share it with you.

surak commented 9 months ago

I understand that. Would you share a couple of examples from it, so one could make their own? It's mostly about the format FastChat expects, more than anything else. No need for your full private data :-)

BeastyZ commented 9 months ago

My data format follows this
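
For anyone building their own file: a minimal sketch of the ShareGPT-style layout used by FastChat's data/dummy_conversation.json (the file name and example values here are placeholders, not taken from the dataset above):

    import json

    # Each record is one conversation; "from" alternates between "human" and "gpt".
    records = [
        {
            "id": "identity_0",  # any unique string id
            "conversations": [
                {"from": "human", "value": "What are the symptoms of the flu?"},
                {"from": "gpt", "value": "Common symptoms include fever, cough, sore throat, and fatigue."},
            ],
        }
    ]

    # --data_path expects a single JSON file containing a list of such records.
    with open("my_dataset.json", "w") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)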