Closed cfanbo closed 5 months ago
The error information is incomplete.
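This is all the DOS command line shows.
We cannot help with only this much information.
Apart from the console, where else can I find log output? Also, I found this line in the log above: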
05/04/2024 22:19:55 - INFO - llmtuner.data.template - Cannot add this chat template to tokenizer.
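Could this be what causes the failure?
Try training from the command line.
The result is the same. Only the alpaca_gpt4_zh dataset is used here. Could it be related to the cache?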
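On the question of where else log output can be found: when a process exits without printing an error, one option is to launch it from a small wrapper that saves everything it prints and reports its exit code. The following is a minimal sketch, not part of LLaMA-Factory. The helper name `run_and_log` and the log file name are hypothetical, and it assumes the training command can be passed as a list of arguments.

```python
import subprocess
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, save its combined stdout/stderr to log_path, return the exit code.

    Hypothetical helper for illustration; cmd is a list such as
    ["llamafactory-cli", "train", ...].
    """
    log_path = Path(log_path)
    # Binary mode: the child process writes its raw bytes straight to the file.
    with log_path.open("wb") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

Calling `run_and_log(["llamafactory-cli", "train", ...], "train_output.log")` with the real arguments would leave the full console output in the file. A nonzero return code suggests the process terminated abnormally even though nothing was printed.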
(llama_factory) C:\Users\Administrator\sxf_workspace\LLaMA-Factory>llamafactory-cli train --stage sft --do_train True --model_name_or_path THUDM/chatglm3-6b --finetuning_type lora --template chatglm3 --flash_attn auto --dataset_dir data --dataset alpaca_gpt4_zh --cutoff_len 1024 --learning_rate 5e-05 --num_train_epochs 3.0 --max_samples 100000 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 100 --warmup_steps 0 --optim adamw_torch --packing False --report_to none --output_dir savesLM3-6B-Chat_2024-05-04-22-51-59 --fp16 True --lora_rank 8 --lora_alpha 16 --lora_dropout 0 --use_dora True --lora_target all --plot_loss True
bin C:\ProgramData\Anaconda3\envs\llama_factory\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
05/04/2024 22:52:02 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
C:\ProgramData\Anaconda3\envs\llama_factory\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[INFO|tokenization_utils_base.py:2087] 2024-05-04 22:52:22,488 >> loading file tokenizer.model from cache at C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm3-6b\snapshots\103caa40027ebfd8450289ca2f278eac4ff26405\tokenizer.model
[INFO|tokenization_utils_base.py:2087] 2024-05-04 22:52:22,488 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2087] 2024-05-04 22:52:22,488 >> loading file special_tokens_map.json from cache at C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm3-6b\snapshots\103caa40027ebfd8450289ca2f278eac4ff26405\special_tokens_map.json
[INFO|tokenization_utils_base.py:2087] 2024-05-04 22:52:22,488 >> loading file tokenizer_config.json from cache at C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm3-6b\snapshots\103caa40027ebfd8450289ca2f278eac4ff26405\tokenizer_config.json
[INFO|tokenization_utils_base.py:2087] 2024-05-04 22:52:22,488 >> loading file tokenizer.json from cache at None
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
05/04/2024 22:52:22 - INFO - llmtuner.data.template - Add <|user|>,<|observation|> to stop words.
05/04/2024 22:52:22 - INFO - llmtuner.data.template - Cannot add this chat template to tokenizer.
05/04/2024 22:52:22 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
input_ids:
[64790, 64792, 64795, 30910, 13, 30910, 31983, 35959, 32474, 34128, 31155, 64796, 30910, 13, 30910, 49141, 31983, 35959, 32474, 34128, 31211, 13, 13, 30939, 30930, 30910, 31983, 31902, 31651, 31155, 32096, 54725, 40215, 31902, 31903, 31123, 54627, 40657, 31201, 38187, 54746, 35384, 31123, 54558, 32079, 38771, 31740, 31123, 32316, 34779, 31996, 31123, 54724, 35434, 32382, 36490, 31155, 13, 13, 30943, 30930, 30910, 37167, 33296, 31155, 32096, 33777, 47049, 33908, 31201, 34396, 31201, 54580, 55801, 54679, 54542, 34166, 34446, 41635, 35471, 32445, 31123, 32317, 54589, 55611, 31201, 54589, 34166, 54542, 33185, 32357, 31123, 54548, 31983, 35959, 49339, 31155, 13, 13, 30966, 30930, 30910, 34192, 35285, 31155, 34192, 48191, 31740, 44323, 31123, 35315, 32096, 54720, 32444, 30910, 30981, 30941, 30973, 30910, 44442, 34192, 31155, 32775, 34192, 35434, 35763, 32507, 31123, 32079, 31902, 32683, 31123, 54724, 31803, 31937, 34757, 49510, 31155, 2]
inputs:
[gMASK] sop <|user|>
保持健康的三个提示。 <|assistant|>
以下是保持健康的三个提示:
1. 保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。
2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。
3. 睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 30910, 13, 30910, 49141, 31983, 35959, 32474, 34128, 31211, 13, 13, 30939, 30930, 30910, 31983, 31902, 31651, 31155, 32096, 54725, 40215, 31902, 31903, 31123, 54627, 40657, 31201, 38187, 54746, 35384, 31123, 54558, 32079, 38771, 31740, 31123, 32316, 34779, 31996, 31123, 54724, 35434, 32382, 36490, 31155, 13, 13, 30943, 30930, 30910, 37167, 33296, 31155, 32096, 33777, 47049, 33908, 31201, 34396, 31201, 54580, 55801, 54679, 54542, 34166, 34446, 41635, 35471, 32445, 31123, 32317, 54589, 55611, 31201, 54589, 34166, 54542, 33185, 32357, 31123, 54548, 31983, 35959, 49339, 31155, 13, 13, 30966, 30930, 30910, 34192, 35285, 31155, 34192, 48191, 31740, 44323, 31123, 35315, 32096, 54720, 32444, 30910, 30981, 30941, 30973, 30910, 44442, 34192, 31155, 32775, 34192, 35434, 35763, 32507, 31123, 32079, 31902, 32683, 31123, 54724, 31803, 31937, 34757, 49510, 31155, 2]
labels:
以下是保持健康的三个提示:
1. 保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。
2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。
3. 睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。
[INFO|configuration_utils.py:726] 2024-05-04 22:52:33,744 >> loading configuration file config.json from cache at C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm3-6b\snapshots\103caa40027ebfd8450289ca2f278eac4ff26405\config.json
[INFO|configuration_utils.py:726] 2024-05-04 22:52:53,764 >> loading configuration file config.json from cache at C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm3-6b\snapshots\103caa40027ebfd8450289ca2f278eac4ff26405\config.json
[INFO|configuration_utils.py:789] 2024-05-04 22:52:53,765 >> Model config ChatGLMConfig {
"_name_or_path": "THUDM/chatglm3-6b",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "THUDM/chatglm3-6b--configuration_chatglm.ChatGLMConfig",
"AutoModel": "THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": 2,
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1e-05,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 28,
"original_rope": true,
"pad_token_id": 0,
"padded_vocab_size": 65024,
"post_layer_norm": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"rmsnorm": true,
"seq_length": 8192,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.40.1",
"use_cache": true,
"vocab_size": 65024
}
[INFO|modeling_utils.py:3429] 2024-05-04 22:53:03,848 >> loading weights file model.safetensors from cache at C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm3-6b\snapshots\103caa40027ebfd8450289ca2f278eac4ff26405\model.safetensors.index.json
[INFO|modeling_utils.py:1494] 2024-05-04 22:53:03,852 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.float16.
[INFO|configuration_utils.py:928] 2024-05-04 22:53:03,853 >> Generate config GenerationConfig {
"eos_token_id": 2,
"pad_token_id": 0
}
(llama_factory) C:\Users\Administrator\sxf_workspace\LLaMA-Factory>
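I suspect the cause was insufficient GPU memory, although no error message was printed at all. I later switched to a machine with 22G of GPU memory and training ran normally.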
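As a rough sanity check on the GPU memory suspicion, the memory needed just to hold the model weights can be estimated from the parameter count and the dtype. The log shows the model being instantiated as torch.float16, which is 2 bytes per parameter. The sketch below is only back-of-the-envelope arithmetic. The figure of about 6.2e9 parameters is an assumption, not something stated in this thread, and the estimate ignores activations, LoRA adapter weights, optimizer state, and CUDA overhead.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory needed to hold the weights alone, in GiB (fp16 = 2 bytes/param)."""
    return num_params * bytes_per_param / GIB


def headroom_gib(vram_gib, num_params, bytes_per_param=2):
    """VRAM left for everything else after the weights are loaded."""
    return vram_gib - weight_memory_gib(num_params, bytes_per_param)


# Assumption: roughly 6.2e9 parameters for a "6B" model (not from the thread).
ASSUMED_PARAMS = 6.2e9

for vram in (16, 22):
    print(
        f"{vram} GiB card: weights ~{weight_memory_gib(ASSUMED_PARAMS):.1f} GiB, "
        f"headroom ~{headroom_gib(vram, ASSUMED_PARAMS):.1f} GiB"
    )
```

Under these assumptions the larger card leaves noticeably more room for training state. That is consistent with the observation above, but it does not prove that memory was the cause.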
Reminder
Reproduction
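This is my first time working in this area.
This is a PC with an RTX 3070 GPU (16G). Training ends with a “Failed.” result, so "Loss" shows no data.
Below is the final command generated by the web page:
Terminal output: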
Expected behavior
The Loss chart should render data.
System Info