manager log:
[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,752 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,753 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,753] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,762 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,762 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645,
"use_cache": false
}
[WARNING|logging.py:328] 2024-11-26 11:15:33,763 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
worker log:
[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,573 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,573 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,574] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,582 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,582 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645,
"use_cache": false
}
[WARNING|logging.py:328] 2024-11-26 11:15:33,583 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
manager
CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml
worker
FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml
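For reference, the manager command pins three GPUs via CUDA_VISIBLE_DEVICES while the worker command leaves it unset, which is consistent with the world_size = 7 reported in both logs (3 manager ranks plus, presumably, 4 worker ranks). A minimal sketch of a worker launch that also pins its GPUs explicitly, assuming the worker exposes four usable devices (the device list is illustrative, not taken from the report):
CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml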
Expected behavior
No response
Others
qwen2_lora_dpo.yaml
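The yaml itself is not attached. A minimal sketch of the settings the logged warnings point at, assuming the stock DPO LoRA example as a base (key names follow the LLaMA-Factory example configs; the concrete values are illustrative, not the reporter's actual file):
model_name_or_path: /data/models/Qwen2.5-14B-Instruct
stage: dpo
do_train: true
finetuning_type: lora
template: qwen
flash_attn: fa2                                    # the logs show Flash Attention 2.0 being requested
bf16: true                                         # gives the model a bf16 compute dtype, which the FA2 dtype warning asks for
deepspeed: examples/deepspeed/ds_z3_config.json    # ZeRO-3, matching the zero.init() line in the logs
output_dir: saves/qwen2.5-14b/lora/dpo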