01/13/2024 15:19:09 - INFO - llmtuner.model.utils - Failed to load model.safetensors: /LORA_CHECKPOINT_PATH does not appear to have a file named model.safetensors. Checkout 'https://huggingface.co//LORA_CHECKPOINT_PATH/None' for available files.
01/13/2024 15:19:09 - INFO - llmtuner.model.utils - Failed to load pytorch_model.bin: /LORA_CHECKPOINT_PATH does not appear to have a file named pytorch_model.bin. Checkout 'https://huggingface.co//LORA_CHECKPOINT_PATH' for available files.
01/13/2024 15:19:09 - WARNING - llmtuner.model.utils - Provided path (LORA_CHECKPOINT_PATH) does not contain valuehead weights.
01/13/2024 15:19:09 - INFO - llmtuner.model.loader - trainable params: 6558721 || all params: 13903226881 || trainable%: 0.0472
input_ids:
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:05<00:05, 5.44s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00, 3.98s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00, 4.20s/it]
01/13/2024 15:19:18 - INFO - llmtuner.model.adapter - Adapter is not found at evaluation, load the base model.
01/13/2024 15:19:18 - INFO - llmtuner.model.utils - Failed to load model.safetensors: RM_PATH_LORA_EXPORTED does not appear to have a file named model.safetensors. Checkout 'https://huggingface.co//RM_PATH_LORA_EXPORTED/None' for available files.
01/13/2024 15:19:18 - INFO - llmtuner.model.utils - Failed to load pytorch_model.bin: RM_PATH_LORA_EXPORTED does not appear to have a file named pytorch_model.bin. Checkout 'https://huggingface.co//RM_PATH_LORA_EXPORTED/None' for available files.
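As a sanity check, the trainable-parameter percentage reported by `llmtuner.model.loader` above is consistent with the raw counts on that line:

```python
# Numbers copied from the llmtuner.model.loader log line above.
trainable_params = 6_558_721
all_params = 13_903_226_881

# trainable% as printed by the loader (percentage, 4 decimal places)
pct = 100 * trainable_params / all_params
print(f"trainable%: {pct:.4f}")  # trainable%: 0.0472
```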
Is this because the parameter .bin file, being too large, was split into two shards and therefore can no longer be read?
I also have another question:
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
How should I handle this message? Do I need to modify some file of the model? Where is the `_set_gradient_checkpointing` method defined? I tried retraining SFT, RM, etc., but the message still appears during new training runs with the updated code. I would appreciate your further guidance, thank you very much.
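For reference, this deprecation warning is emitted when a custom modeling file (for a Baichuan2 checkpoint, typically the bundled `modeling_baichuan.py`) still defines `_set_gradient_checkpointing`; newer transformers versions handle gradient checkpointing generically and ask for that method to be deleted. A minimal sketch (the file path in the usage comment is an assumption, not taken from the issue) to check whether a given modeling file still carries the old hook:

```python
from pathlib import Path

def has_deprecated_gc_hook(modeling_file: str) -> bool:
    """Return True if the modeling file still defines the deprecated
    _set_gradient_checkpointing method that triggers the warning."""
    text = Path(modeling_file).read_text(encoding="utf-8")
    return "def _set_gradient_checkpointing" in text

# Hypothetical usage -- point it at the modeling file inside your
# exported checkpoint directory:
# has_deprecated_gc_hook("RM_PATH_LORA_EXPORTED/modeling_baichuan.py")
```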
Reminder
Reproduction
OUTPUT=OUTPUT_PATH
LR=1e-6
mkdir -p $OUTPUT

CUDA_VISIBLE_DEVICES='3' python src/train_bash.py \
    --stage ppo \
    --do_train \
    --model_name_or_path BASE_MODEL_PATH \
    --adapter_name_or_path LORA_CHECKPOINT_PATH \
    --create_new_adapter \
    --dataset step3_train \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --reward_model_type full \
    --reward_model RM_PATH_LORA_EXPORTED \
    --output_dir $OUTPUT \
    --overwrite_output_dir True \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --top_k 0 \
    --top_p 0.9 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --num_train_epochs 1.0 \
    --plot_loss \
    --bf16 \
    2>&1 | tee $OUTPUT/training.log
Expected behavior
I want to run PPO training, but during the process I noticed some abnormal INFO messages. I saw similar problems in the issues, where the suggested fix was to update to the latest code; after updating to LLaMA-Factory 0.4.0, the problems above still occur.
System Info
transformers version: 4.36.2

File information:

LORA_CHECKPOINT_PATH
├── adapter_config.json
├── adapter_model.bin
├── optimizer.pt
├── README.md
├── rng_state.pth
├── scheduler.pt
├── special_tokens_map.json
├── tokenization_baichuan.py
├── tokenizer_config.json
├── tokenizer.model
├── trainer_state.json
└── training_args.bin

RM_PATH_LORA_EXPORTED
├── config.json
├── configuration_baichuan.py
├── generation_config.json
├── generation_utils.py
├── modeling_baichuan.py
├── pytorch_model-00001-of-00002.bin
├── pytorch_model-00002-of-00002.bin
├── pytorch_model.bin.index.json
├── quantizer.py
├── special_tokens_map.json
├── tokenization_baichuan.py
├── tokenizer_config.json
└── tokenizer.model
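Regarding the sharded `pytorch_model-0000x-of-00002.bin` files listed above: splitting a checkpoint does not by itself make it unreadable, because loaders resolve `pytorch_model.bin.index.json` first, and its `weight_map` maps each parameter name to the shard that contains it. A minimal sketch of how the shard list can be recovered from such an index (the parameter names and sizes below are illustrative, not taken from the actual file):

```python
# Illustrative index content; a real pytorch_model.bin.index.json has the
# same shape, with the checkpoint's actual parameter names and total size.
index = {
    "metadata": {"total_size": 0},  # placeholder, set by the exporter
    "weight_map": {
        "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
        "lm_head.weight": "pytorch_model-00002-of-00002.bin",
    },
}

def shard_files(index: dict) -> list:
    """Unique shard filenames referenced by the index, in sorted order."""
    return sorted(set(index["weight_map"].values()))

print(shard_files(index))
# ['pytorch_model-00001-of-00002.bin', 'pytorch_model-00002-of-00002.bin']
```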
Others