zhaoxjmail commented 3 days ago

Reminder

[X] I have read the README and searched the existing issues.

System Info

llamafactory version: 0.9.1.dev0
Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.31
Python version: 3.11.9
PyTorch version: 2.3.0+cu121 (GPU)
Transformers version: 4.42.3
Datasets version: 2.20.0
Accelerate version: 0.31.0
PEFT version: 0.11.1
TRL version: 0.9.4
GPU type: NVIDIA A800 80GB PCIe
DeepSpeed version: 0.15.1
Bitsandbytes version: 0.43.1
vLLM version: 0.5.0

Reproduction

manager

CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml

worker

FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml

Expected behavior

No response

Others

qwen2_lora_dpo.yaml

### model
model_name_or_path: /data/models/Qwen2.5-14B-Instruct
#quantization_bit: 8

### method
stage: dpo
do_train: true
finetuning_type: full
lora_target: all
pref_beta: 0.1
pref_loss: orpo  # choices: [sigmoid (dpo), orpo, simpo]
deepspeed: examples/deepspeed/ds_z3_config.json
lora_rank: 256 
lora_dropout: 0.1

### dataset
dataset: qwen_dpo_augmentation
template: qwen
cutoff_len: 2048
max_samples: 1000000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /data/models/qwen2.5/dpo-14b_augmentation_fsdp
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
flash_attn: fa2
#enable_liger_kernel: True

### ddp_backend
ddp_backend: nccl
ddp_find_unused_parameters: false  
### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

manager log:

[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,752 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,753 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,753] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,762 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,762 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "use_cache": false
}

[WARNING|logging.py:328] 2024-11-26 11:15:33,763 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`

worker log:

[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,573 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,573 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,574] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,582 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,582 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "use_cache": false
}

[WARNING|logging.py:328] 2024-11-26 11:15:33,583 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`

hiyouga commented 3 days ago

看一下最新的说明，可能有些环境变量要换 https://github.com/hiyouga/LLaMA-Factory/tree/main/examples#supervised-fine-tuning-on-multiple-nodes

zhaoxjmail commented 1 day ago