xiaocaijiayou opened 10 months ago
I didn't manage to get it running yet. Please report your success or failure, let's try this together!
It works for me!
torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
--data_path fine_tuning/data/train/medi_fastchat.json \
--bf16 True \
--output_dir fine_tuning/model/mistral-7b-0103 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'MistralDecoderLayer' \
--tf32 True \
--model_max_length 1024 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to none
@BeastyZ Would you mind sharing your medi_fastchat.json file?
@surak medi_fastchat.json is my custom dataset; you can use your own data. I'm sorry, but I can't share it with you.
I understand that. Would you share a couple of examples from it, so one could make their own? It's mostly about the format FastChat expects, rather than the content itself. No need for your full private data :-)
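For reference, the structure FastChat's training scripts expect is the same one used by data/dummy_conversation.json in the repo: a JSON list of records, each with an "id" and a "conversations" array of alternating human/gpt turns. The values below are placeholders, not BeastyZ's actual medi data; a medi_fastchat.json built like this should load with --lazy_preprocess:

[
  {
    "id": "identity_0",
    "conversations": [
      {"from": "human", "value": "<instruction or question text>"},
      {"from": "gpt", "value": "<target response text>"}
    ]
  },
  {
    "id": "identity_1",
    "conversations": [
      {"from": "human", "value": "<another instruction>"},
      {"from": "gpt", "value": "<another response>"}
    ]
  }
]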
Do I need to make any changes to the following training parameters if I am training Mistral?
torchrun --nproc_per_node=1 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path /local/Mistral-7B-v0.1 \
--data_path data/dummy_conversation.json \
--bf16 True \
--output_dir result \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 10 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
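Judging from the working command earlier in this thread, the main change for Mistral appears to be the FSDP wrap class, since Mistral's decoder layers are MistralDecoderLayer rather than LlamaDecoderLayer, plus pointing --model_name_or_path at the Mistral checkpoint. A minimal sketch of the flags to swap, assuming the rest of the command stays the same:

--model_name_or_path /local/Mistral-7B-v0.1 \
--fsdp_transformer_layer_cls_to_wrap 'MistralDecoderLayer' \

The other hyperparameters (learning rate, batch size, scheduler, etc.) shouldn't need to change just because the base model is Mistral; the working run above used essentially the same values.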