Closed · bird-9 closed this issue 4 months ago
Reproduction

Configuration

A100 [40G] × 8
Training command

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /opt/models/Qwen-72B-Chat-Int4 \
    --dataset sgpt \
    --dataset_dir data \
    --template qwen \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir saves/Qwen-72B-Chat-Int4/lora/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --ddp_timeout 180000000 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16 \
    --ddp_find_unused_parameters false \
    --upcast_layernorm true
```
Config file (examples/accelerate/fsdp_config.yaml)

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1 # the number of nodes
num_processes: 8 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
gpu_ids: all
```
Log output

```
[INFO|configuration_utils.py:724] 2024-04-19 13:59:20,332 >> loading configuration file /opt/models/Qwen-72B-Chat-Int4/config.json
[INFO|configuration_utils.py:724] 2024-04-19 13:59:20,337 >> loading configuration file /opt/models/Qwen-72B-Chat-Int4/config.json
[INFO|configuration_utils.py:789] 2024-04-19 13:59:20,340 >> Model config QWenConfig {
  "_name_or_path": "/opt/models/Qwen-72B-Chat-Int4",
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": true,
  "fp32": false,
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "intermediate_size": 49152,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 32768,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 64,
  "num_hidden_layers": 80,
  "onnx_safe": null,
  "quantization_config": {
    "bits": 4,
    "damp_percent": 0.01,
    "desc_act": false,
    "group_size": 128,
    "model_file_base_name": "model",
    "model_name_or_path": null,
    "quant_method": "gptq",
    "static_groups": false,
    "sym": true,
    "true_sequential": true
  },
  "rope_theta": 1000000,
  "rotary_emb_base": 1000000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 32768,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.39.3",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": false,
  "use_flash_attn": "auto",
  "use_logn_attn": false,
  "vocab_size": 152064
}

04/19/2024 13:59:20 - INFO - llmtuner.model.patcher - Loading 4-bit GPTQ-quantized model.
CUDA extension not installed.
[INFO|modeling_utils.py:3280] 2024-04-19 13:59:20,507 >> loading weights file /opt/models/Qwen-72B-Chat-Int4/model.safetensors.index.json
[INFO|modeling_utils.py:1417] 2024-04-19 13:59:20,508 >> Instantiating QWenLMHeadModel model under default dtype torch.float16.
[INFO|configuration_utils.py:928] 2024-04-19 13:59:20,509 >> Generate config GenerationConfig {}
/root/miniconda3/envs/LLaMaFactory/lib/python3.10/site-packages/transformers/modeling_utils.py:4225: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
Running tokenizer on dataset (num_proc=16): 100%|██████████| 80/80 [00:14<00:00, 5.36 examples/s]
[... identical "Loading 4-bit GPTQ-quantized model.", "CUDA extension not installed.", tokenizer-progress, and FutureWarning lines repeated for the other ranks ...]
Loading checkpoint shards:  95%|█████████▎| 20/21 [00:11<00:00, 1.80it/s]
Traceback (most recent call last):
  File "/opt/code/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/opt/code/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/opt/code/LLaMA-Factory/src/llmtuner/train/tuner.py", line 33, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/opt/code/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 33, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/opt/code/LLaMA-Factory/src/llmtuner/model/loader.py", line 101, in load_model
    model: "PreTrainedModel" = AutoModelForCausalLM.from_pretrained(**init_kwargs)
  File "/root/miniconda3/envs/LLaMaFactory/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/LLaMaFactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/LLaMaFactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/LLaMaFactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/LLaMaFactory/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 387, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB. GPU 0 has a total capacity of 39.39 GiB of which 1.71 GiB is free. Including non-PyTorch memory, this process has 37.66 GiB memory in use. Of the allocated memory 36.12 GiB is allocated by PyTorch, and 148.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.ht
```
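As a side note on diagnosing this: the OOM message reports ~36 GiB allocated by PyTorch on GPU 0, which is roughly the size of the entire 4-bit 72B model (72e9 parameters × 0.5 bytes ≈ 36 GB), suggesting the whole quantized model was being materialized on a single device rather than sharded. A generic way to confirm that only one GPU is filling up while the shards load (not part of the original report) is to poll per-GPU memory:

```bash
# Poll per-GPU memory once per second while the job starts up.
# If FSDP sharding were working, usage should grow on all 8 GPUs,
# not only on GPU 0.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```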
Expected behavior

No response

System Info

Training with FSDP + LoRA: during "Loading checkpoint shards", only the first GPU's memory usage keeps growing, until it runs out of memory.

Others

No response
Resolved: I saw in another issue that training quantized models is not supported!

Only unquantized models combined with the `quantization_bit` argument are supported.
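For reference, a minimal sketch of the suggested fix: point `--model_name_or_path` at the unquantized Qwen-72B-Chat checkpoint (the path below is an assumption) and keep `--quantization_bit 4` so the model is quantized on the fly; the remaining flags follow the original command.

```bash
# Sketch of the suggested fix, assuming an unquantized checkpoint
# at /opt/models/Qwen-72B-Chat (hypothetical path): drop the GPTQ Int4
# export and rely on --quantization_bit 4 instead.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /opt/models/Qwen-72B-Chat \
    --dataset sgpt \
    --dataset_dir data \
    --template qwen \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir saves/Qwen-72B-Chat/lora/sft \
    --quantization_bit 4 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --fp16
```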