Closed: xaiocaibi closed this issue 2 days ago
https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.logging_dir
Reminder
System Info
llamafactory version: 0.8.3.dev0
Platform: Linux-5.4.210-4-velinux1-amd64-x86_64-with-glibc2.31
Python version: 3.10.12
PyTorch version: 2.1.0+cu118 (GPU)
Transformers version: 4.42.3
Datasets version: 2.20.0
Accelerate version: 0.30.1
PEFT version: 0.11.1
TRL version: 0.8.6
GPU type: NVIDIA A800-SXM4-80GB
DeepSpeed version: 0.13.1
Bitsandbytes version: 0.42.0
Reproduction
### model
model_name_or_path: /ML-A100/team/align/public/models/Yi-34B-Chat-0205

### method
stage: rm
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: breaking_chat_zh_en
template: yi
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 16
use_fast_tokenizer: False

### output
output_dir: saves/Yi-34b/full/reward
logging_steps: 10
save_steps: 3000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-7
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000

### log
report_to: tensorboard
logging_dir: ${TENSORBOARD_LOG_PATH}
logging_dir: <%= ENV["TENSORBOARD_LOG_PATH"] %>
Expected behavior
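I need to configure the TensorBoard log path. The path is read from an environment variable when training starts, so it cannot be written into the config as an absolute value.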
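One possible workaround, sketched here rather than taken from LLaMA-Factory itself, is to render the config before launching training: read the YAML as text, substitute `${NAME}` placeholders from the environment, and write the result to a new file. This assumes the YAML is otherwise read literally, with no placeholder expansion. The helper names `expand_env` and `render_config` are hypothetical, and only the `${NAME}` form is handled, not the `<%= ENV[...] %>` form. `os.path.expandvars` would also work, but it silently leaves unset variables unchanged, so this sketch raises an error instead.

```python
import os
import re
import tempfile

# Matches ${NAME} placeholders only (a simplifying assumption of this sketch).
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str) -> str:
    """Replace each ${NAME} with os.environ[NAME]; raise KeyError if NAME is unset."""
    def _substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]

    return _PLACEHOLDER.sub(_substitute, text)


def render_config(src_path: str) -> str:
    """Write an expanded copy of the config file and return the new file's path."""
    with open(src_path, encoding="utf-8") as f:
        rendered = expand_env(f.read())
    fd, dst_path = tempfile.mkstemp(suffix=".yaml")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(rendered)
    return dst_path
```

With `TENSORBOARD_LOG_PATH` exported by the job launcher, a wrapper script could call `render_config` on the training YAML and pass the returned path to `llamafactory-cli train` in place of the original file.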
Others
No response