alexhmyang opened this issue 1 year ago
peft==0.3.0.dev0
pip install git+https://github.com/huggingface/peft
fatal: unable to access 'https://github.com/huggingface/peft/': Could not resolve host: github.com
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/huggingface/peft 'C:\Users\86187\AppData\Local\Temp\pip-req-build-n5rknw77' did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/huggingface/peft 'C:\Users\86187\AppData\Local\Temp\pip-req-build-n5rknw77' did not run successfully.
│ exit code: 128
╰─> See above for output.
Could you just give a command that actually works?
pip install git+https://github.com/huggingface/peft
After installing 0.3.0.dev0, running the chatglm-6b-belle-zh-lora example reports the following error:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([8192, 8, 1]) from checkpoint, the shape in current model is torch.Size([12288, 8]).
... (the same lora_A/lora_B size mismatch is reported for every layer from 1 through 27)
I haven't updated the chatglm-6b-belle-zh-lora weights; you can train your own, or use shibing624/chatglm-6b-csc-zh-lora.
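For reference, the size mismatch above means the saved adapter was produced with a different LoRA rank and target layout (lora_A of rank 16 in the checkpoint vs. 8 in the freshly created model), so it cannot be loaded onto the current config. A minimal sketch of loading the recommended adapter with the standard PEFT API, assuming transformers and a recent peft are installed (the hub IDs come from this thread; the dtype/device handling is illustrative):

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load the base ChatGLM-6B model (custom modeling code, hence trust_remote_code=True).
base = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Attach the updated LoRA adapter recommended above.
model = PeftModel.from_pretrained(base, "shibing624/chatglm-6b-csc-zh-lora")
model.eval()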
Training with the latest code also throws an error:
/DaTa/.local/home/hai.li/mambaforge/lib/python3.10/site-packages/torch/_dynamo/variables/builder │
│ .py:812 in wrap_fx_proxy_cls │
│ │
│ 809 │ │ │ │ "ignore_subclass": ignore_subclass, │
│ 810 │ │ │ │ "is_tensor": target_cls is TensorVariable, │
│ 811 │ │ │ } │
│ ❱ 812 │ │ │ assert "source" in options and options["source"] is not None │
│ 813 │ │ │ kwargs["source"] = options["source"] │
│ 814 │ │ │ example_value = wrap_to_fake_tensor_and_record( │
│ 815 │ │ │ │ example_value, tx=tx, **kwargs │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError:
from user code:
File "/DaTa/.local/home/hai.li/mambaforge/lib/python3.10/site-packages/torch/random.py", line 23, in get_rng_state
return default_generator.get_state()
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
The full log:
$ python training_chatglm_demo.py --do_train
2023-04-19 19:41:46.611 | INFO | __main__:main:43 - Namespace(train_file='../data/zh_csc_train.tsv', test_file='../data/zh_csc_test.tsv', model_type='chatglm', model_name='THUDM/chatglm-6b', do_train=True, do_predict=False, output_dir='./outputs/', max_seq_length=128, max_length=128, num_epochs=0.2, batch_size=2)
2023-04-19 19:41:46.611 | INFO | __main__:main:47 - Loading data...
2023-04-19 19:41:46.612 | DEBUG | textgen.chatglm.chatglm_model:__init__:91 - Device: cuda
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00, 1.29s/it]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
2023-04-19 19:42:09.099 | DEBUG | __main__:main:62 - train_data: [['对下面中文拼写纠错:', '对台湾的大学制度和社会血管而言,学生要工作的话很难,要辨读大学边工作的话,这会逼迫学生工作和学习上分心,让学生陷于力不从心精神分散的恶境。', '对台湾的大学制度和社会血管而言,学生要工作的话很难,要边读大学边工作的话,这会逼迫学生工作和学习上分心,让学生陷于力不从心精神分散的恶境。'], ['对下面中文拼写纠错:', '而大众对于其好坏方面的比例又不同的判断,所以对其的态度也完全不一至。', '而大众对于其好坏方面的比例有不同的判断,所以对其的态度也完全不一致。'], ['对下面中文拼写纠错:', '怎么办!我的房子里大学很远!时间不够了!', '怎么办!我的房子离大学很远!时间不够了!'], ['对下面中文拼写纠错:', '所以老师们应该这导最好交孩子的方法就是让他们玩儿而发展。', '所以老师们应该知道最好教孩子的方法就是让他们玩儿而发展。'], ['对下面中文拼写纠错:', '搭进第二十一世纪,顺著社会、科学的进步,网路科学也不断地发展同时电脑领域也速步地更新。', '踏进第二十一世纪,顺著社会、科学的进步,网路科学也不断地发展同时电脑领域也速步地更新。'], ['对下面中文拼写纠错:', '因为现在,我们再得这一时代,就是不能相信别人家的很冷淡的时代嘛!', '因为现在,我们在的这一时代,就是不能相信别人家的很冷淡的时代嘛!'], ['对下面中文拼写纠错:', '好可惜我下个礼拜要回国,我已经买过飞机票所以没办法那天跟你们一起庆祝你们的寰麟。', '好可惜我下个礼拜要回国,我已经买过飞机票所以没办法那天跟你们一起庆祝你们的婚礼。'], ['对下面中文拼写纠错:', '请你先不要放弃!你可以利用在家理的时间想一想你未来最想要做的是什么?', '请你先不要放弃!你可以利用在家里的时间想一想你未来最想要做的是什么?'], ['对下面中文拼写纠错:', '「宠物出租」我看在都市区会受欢迎。老实说我想各各人应该要考虑之后养动物才对。', '「宠物出租」我看在都市区会受欢迎。老实说我想各个人应该要考虑之后养动物才对。'], ['对下面中文拼写纠错:', '到了学校一后,他跟他的同学一起上数学科。', '到了学校以后,他跟他的同学一起上数学课。']]
2023-04-19 19:42:30.857 | WARNING | textgen.chatglm.chatglm_model:train_model:241 - Checkpoint ./outputs/adapter_model.bin not found
trainable params: 3670016 || all params: 6176956416 || trainable%: 0.05941463324063059
2023-04-19 19:42:30.860 | INFO | textgen.chatglm.chatglm_utils:__init__:93 - Creating features from dataset file at cache_dir/
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 2338/2338 [00:01<00:00, 1777.50it/s]
2023-04-19 19:42:32.181 | INFO | textgen.chatglm.chatglm_utils:__init__:121 - Saving features into cached file cache_dir/THUDM_chatglm-6b_cached_1282338
2023-04-19 19:42:32.184 | DEBUG | textgen.chatglm.chatglm_model:train_model:251 - train_dataset len: 2338, train_dataset[0]: [5, 64286, 12, 63836, 65845, 68088, 66642, 64339, 89435, 12, 4, 63836, 91601, 64236, 64802, 72925, 66817, 65049, 6, 64050, 63858, 63889, 64112, 65539, 6, 63858, 71808, 105293, 64436, 63889, 64112, 6, 86045, 83875, 64050, 123629, 63839, 109352, 6, 70230, 109951, 107027, 64428, 69353, 63825, 65561, 66612, 63823, 4, 67342, 12, 130001, 130004, 5, 63836, 91601, 64236, 64802, 72925, 66817, 65049, 6, 64050, 63858, 63889, 64112, 65539, 6, 63858, 64436, 105293, 64436, 63889, 64112, 6, 86045, 83875, 64050, 123629, 63839, 109352, 6, 70230, 109951, 107027, 64428, 69353, 63825, 65561, 66612, 63823, 130005]
2023-04-19 19:42:32.185 | WARNING | textgen.chatglm.chatglm_model:train_model:284 - Process rank: -1, device: cuda:0, n_gpu: 4, distributed training: False, 16-bits training: True
2023-04-19 19:42:32.186 | INFO | textgen.chatglm.chatglm_model:train_model:288 - Training/evaluation parameters TrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0002,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./outputs//logs,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=0.2,
optim=adamw_torch,
optim_args=None,
output_dir=./outputs/,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=2,
per_device_train_batch_size=2,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./outputs/,
save_on_each_node=False,
save_safetensors=False,
save_steps=400,
save_strategy=steps,
save_total_limit=3,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
2023-04-19 19:42:32.191 | INFO | textgen.chatglm.chatglm_model:train_model:302 - *** Train ***
0%| | 0/234 [00:00<?, ?it/s]/home/myuser/mambaforge/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/myuser/dl/textgen_lora_train/textgen/examples/chatglm/training_chatglm_demo.py │
│ :100 in <module> │
│ │
│ 97 │
│ 98 │
│ 99 if __name__ == '__main__': │
│ ❱ 100 │ main() │
│ 101 │
│ │
│ /home/myuser/dl/textgen_lora_train/textgen/examples/chatglm/training_chatglm_demo.py │
│ :64 in main │
│ │
│ 61 │ │ train_data = load_data(args.train_file) │
│ 62 │ │ logger.debug('train_data: {}'.format(train_data[:10])) │
│ 63 │ │ train_df = pd.DataFrame(train_data, columns=["instruction", "input", "output"]) │
│ ❱ 64 │ │ model.train_model(train_df) │
│ 65 │ if args.do_predict: │
│ 66 │ │ if model is None: │
│ 67 │ │ │ model = ChatGlmModel( │
│ │
│ /home/myuser/dl/textgen_lora_train/textgen/examples/chatglm/../../textgen/chatglm/ch │
│ atglm_model.py:303 in train_model │
│ │
│ 300 │ │ │ self.model = torch.compile(self.model) │
│ 301 │ │ │
│ 302 │ │ logger.info("*** Train ***") │
│ ❱ 303 │ │ (global_step, training_loss, metrics) = trainer.train(resume_from_checkpoint=res │
│ 304 │ │ self.handle_metrics("train", metrics, self.args.output_dir) │
│ 305 │ │ self.results.update(metrics) │
│ 306 │ │ self.save_model(model=self.model) │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/transformers/trainer.py:1662 in │
│ train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/transformers/trainer.py:1929 in │
│ _inner_training_loop │
│ │
│ 1926 │ │ │ │ │ with model.no_sync(): │
│ 1927 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1928 │ │ │ │ else: │
│ ❱ 1929 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1930 │ │ │ │ │
│ 1931 │ │ │ │ if ( │
│ 1932 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/transformers/trainer.py:2699 in │
│ training_step │
│ │
│ 2696 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2697 │ │ │
│ 2698 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2699 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2700 │ │ │
│ 2701 │ │ if self.args.n_gpu > 1: │
│ 2702 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /home/myuser/dl/textgen_lora_train/textgen/examples/chatglm/../../textgen/chatglm/ch │
│ atglm_model.py:506 in compute_loss │
│ │
│ 503 │
│ 504 class FinetuneTrainer(Trainer): │
│ 505 │ def compute_loss(self, model, inputs, return_outputs=False): │
│ ❱ 506 │ │ return model( │
│ 507 │ │ │ input_ids=inputs["input_ids"], │
│ 508 │ │ │ labels=inputs["labels"], │
│ 509 │ │ ).loss │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 │
│ in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/peft/peft_model.py:663 in │
│ forward │
│ │
│ 660 │ ): │
│ 661 │ │ peft_config = self.active_peft_config │
│ 662 │ │ if not isinstance(peft_config, PromptLearningConfig): │
│ ❱ 663 │ │ │ return self.base_model( │
│ 664 │ │ │ │ input_ids=input_ids, │
│ 665 │ │ │ │ attention_mask=attention_mask, │
│ 666 │ │ │ │ inputs_embeds=inputs_embeds, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 │
│ in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:82 │
│ in forward │
│ │
│ 79 │ │ return getattr(self._orig_mod, name) │
│ 80 │ │
│ 81 │ def forward(self, *args, **kwargs): │
│ ❱ 82 │ │ return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs) │
│ 83 │
│ 84 │
│ 85 def remove_from_cache(f): │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:209 │
│ in _fn │
│ │
│ 206 │ │ │ dynamic_ctx = enable_dynamic(self.dynamic) │
│ 207 │ │ │ dynamic_ctx.__enter__() │
│ 208 │ │ │ try: │
│ ❱ 209 │ │ │ │ return fn(*args, **kwargs) │
│ 210 │ │ │ finally: │
│ 211 │ │ │ │ set_eval_frame(prior) │
│ 212 │ │ │ │ dynamic_ctx.__exit__(None, None, None) │
│ │
│ /home/myuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/35ca52 │
│ 301fbedee885b0838da5d15b7b47faa37c/modeling_chatglm.py:1190 in forward │
│ │
│ 1187 │ │ use_cache = use_cache if use_cache is not None else self.config.use_cache │
│ 1188 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │
│ 1189 │ │ │
│ ❱ 1190 │ │ transformer_outputs = self.transformer( │
│ 1191 │ │ │ input_ids=input_ids, │
│ 1192 │ │ │ position_ids=position_ids, │
│ 1193 │ │ │ attention_mask=attention_mask, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 │
│ in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /home/myuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/35ca52 │
│ 301fbedee885b0838da5d15b7b47faa37c/modeling_chatglm.py:936 in forward │
│ │
│ 933 │ │ │ │ past_key_values = tuple([None] * len(self.layers)) │
│ 934 │ │ │ │
│ 935 │ │ │ if attention_mask is None: │
│ ❱ 936 │ │ │ │ attention_mask = self.get_masks( │
│ 937 │ │ │ │ │ input_ids, │
│ 938 │ │ │ │ │ device=input_ids.device │
│ 939 │ │ │ │ ) │
│ │
│ /home/myuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/35ca52 │
│ 301fbedee885b0838da5d15b7b47faa37c/modeling_chatglm.py:944 in <graph break in forward> │
│ │
│ 941 │ │ │ │
│ 942 │ │ │ if position_ids is None: │
│ 943 │ │ │ │ MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id │
│ ❱ 944 │ │ │ │ seqs = input_ids.tolist() │
│ 945 │ │ │ │ │
│ 946 │ │ │ │ mask_positions, use_gmasks = [], [] │
│ 947 │ │ │ │ for seq in seqs: │
│ │
│ /home/myuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/35ca52 │
│ 301fbedee885b0838da5d15b7b47faa37c/modeling_chatglm.py:985 in <graph break in forward> │
│ │
│ 982 │ │ │ layer_past = past_key_values[i] │
│ 983 │ │ │ │
│ 984 │ │ │ if self.gradient_checkpointing and self.training: │
│ ❱ 985 │ │ │ │ layer_ret = torch.utils.checkpoint.checkpoint( │
│ 986 │ │ │ │ │ layer, │
│ 987 │ │ │ │ │ hidden_states, │
│ 988 │ │ │ │ │ position_ids, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/utils/checkpoint.py:249 │
│ in checkpoint │
│ │
│ 246 │ │ raise ValueError("Unexpected keyword arguments: " + ",".join(arg for arg in kwar │
│ 247 │ │
│ 248 │ if use_reentrant: │
│ ❱ 249 │ │ return CheckpointFunction.apply(function, preserve, *args) │
│ 250 │ else: │
│ 251 │ │ return _checkpoint_without_reentrant( │
│ 252 │ │ │ function, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/autograd/function.py:506 │
│ in apply │
│ │
│ 503 │ │ if not torch._C._are_functorch_transforms_active(): │
│ 504 │ │ │ # See NOTE: [functorch vjp and autograd interaction] │
│ 505 │ │ │ args = _functorch.utils.unwrap_dead_wrappers(args) │
│ ❱ 506 │ │ │ return super().apply(*args, **kwargs) # type: ignore[misc] │
│ 507 │ │ │
│ 508 │ │ if cls.setup_context == _SingleLevelFunction.setup_context: │
│ 509 │ │ │ raise RuntimeError( │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/utils/checkpoint.py:81 in │
│ forward │
│ │
│ 78 │ │ # Accommodates the (remote) possibility that autocast is enabled for cpu AND gpu │
│ 79 │ │ ctx.gpu_autocast_kwargs, ctx.cpu_autocast_kwargs = _get_autocast_kwargs() │
│ 80 │ │ if preserve_rng_state: │
│ ❱ 81 │ │ │ ctx.fwd_cpu_state = torch.get_rng_state() │
│ 82 │ │ │ # Don't eagerly initialize the cuda context by accident. │
│ 83 │ │ │ # (If the user intends that the context is initialized later, within their │
│ 84 │ │ │ # run_function, we SHOULD actually stash the cuda state here. Unfortunately │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:337 │
│ in catch_errors │
│ │
│ 334 │ │ │ │ │ return hijacked_callback(frame, cache_size, hooks) │
│ 335 │ │ │
│ 336 │ │ with compile_lock: │
│ ❱ 337 │ │ │ return callback(frame, cache_size, hooks) │
│ 338 │ │
│ 339 │ catch_errors._torchdynamo_orig_callable = callback # type: ignore[attr-defined] │
│ 340 │ return catch_errors │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py: │
│ 404 in _convert_frame │
│ │
│ 401 │ def _convert_frame(frame: types.FrameType, cache_size: int, hooks: Hooks): │
│ 402 │ │ counters["frames"]["total"] += 1 │
│ 403 │ │ try: │
│ ❱ 404 │ │ │ result = inner_convert(frame, cache_size, hooks) │
│ 405 │ │ │ counters["frames"]["ok"] += 1 │
│ 406 │ │ │ return result │
│ 407 │ │ except (NotImplementedError, Unsupported): │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py: │
│ 104 in _fn │
│ │
│ 101 │ │ prior_fwd_from_src = torch.fx.graph_module._forward_from_src │
│ 102 │ │ torch.fx.graph_module._forward_from_src = fx_forward_from_src_skip_result │
│ 103 │ │ try: │
│ ❱ 104 │ │ │ return fn(*args, **kwargs) │
│ 105 │ │ finally: │
│ 106 │ │ │ torch._C._set_grad_enabled(prior_grad_mode) │
│ 107 │ │ │ torch.random.set_rng_state(rng_state) │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py: │
│ 262 in _convert_frame_assert │
│ │
│ 259 │ │ global initial_grad_state │
│ 260 │ │ initial_grad_state = torch.is_grad_enabled() │
│ 261 │ │ │
│ ❱ 262 │ │ return _compile( │
│ 263 │ │ │ frame.f_code, │
│ 264 │ │ │ frame.f_globals, │
│ 265 │ │ │ frame.f_locals, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/utils.py:163 in │
│ time_wrapper │
│ │
│ 160 │ │ │ if key not in compilation_metrics: │
│ 161 │ │ │ │ compilation_metrics[key] = [] │
│ 162 │ │ │ t0 = time.time() │
│ ❱ 163 │ │ │ r = func(*args, **kwargs) │
│ 164 │ │ │ time_spent = time.time() - t0 │
│ 165 │ │ │ # print(f"Dynamo timer: key={key}, latency={latency:.2f} sec") │
│ 166 │ │ │ compilation_metrics[key].append(time_spent) │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py: │
│ 324 in _compile │
│ │
│ 321 │ try: │
│ 322 │ │ for attempt in itertools.count(): │
│ 323 │ │ │ try: │
│ ❱ 324 │ │ │ │ out_code = transform_code_object(code, transform) │
│ 325 │ │ │ │ orig_code_map[out_code] = code │
│ 326 │ │ │ │ break │
│ 327 │ │ │ except exc.RestartAnalysis: │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/bytecode_transfor │
│ mation.py:445 in transform_code_object │
│ │
│ 442 │ instructions = cleaned_instructions(code, safe) │
│ 443 │ propagate_line_nums(instructions) │
│ 444 │ │
│ ❱ 445 │ transformations(instructions, code_options) │
│ 446 │ return clean_and_assemble_instructions(instructions, keys, code_options)[1] │
│ 447 │
│ 448 │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py: │
│ 311 in transform │
│ │
│ 308 │ │ │ export, │
│ 309 │ │ │ mutated_closure_cell_contents, │
│ 310 │ │ ) │
│ ❱ 311 │ │ tracer.run() │
│ 312 │ │ output = tracer.output │
│ 313 │ │ assert output is not None │
│ 314 │ │ assert output.output_instructions │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert. │
│ py:1726 in run │
│ │
│ 1723 │ │
│ 1724 │ def run(self): │
│ 1725 │ │ _step_logger()(logging.INFO, f"torchdynamo start tracing {self.f_code.co_name}") │
│ ❱ 1726 │ │ super().run() │
│ 1727 │ │
│ 1728 │ def match_nested_cell(self, name, cell): │
│ 1729 │ │ """Match a cell in this method to one in a function we are inlining""" │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert. │
│ py:576 in run │
│ │
│ 573 │ │ │ while ( │
│ 574 │ │ │ │ self.instruction_pointer is not None │
│ 575 │ │ │ │ and not self.output.should_exit │
│ ❱ 576 │ │ │ │ and self.step() │
│ 577 │ │ │ ): │
│ 578 │ │ │ │ pass │
│ 579 │ │ except BackendCompilerFailed: │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert. │
│ py:540 in step │
│ │
│ 537 │ │ try: │
│ 538 │ │ │ if not hasattr(self, inst.opname): │
│ 539 │ │ │ │ unimplemented(f"missing: {inst.opname}") │
│ ❱ 540 │ │ │ getattr(self, inst.opname)(inst) │
│ 541 │ │ │ │
│ 542 │ │ │ return inst.opname != "RETURN_VALUE" │
│ 543 │ │ except BackendCompilerFailed: │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert. │
│ py:342 in wrapper │
│ │
│ 339 │ │ │ state = self.copy_graphstate() │
│ 340 │ │ │ reason = None │
│ 341 │ │ │ try: │
│ ❱ 342 │ │ │ │ return inner_fn(self, inst) │
│ 343 │ │ │ except Unsupported as excp: │
│ 344 │ │ │ │ if self.has_backedge() and self.should_compile_partial_graph(): │
│ 345 │ │ │ │ │ msg = "Skipping frame because there is a graph break in a for/while │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert. │
│ py:965 in CALL_FUNCTION │
│ │
│ 962 │ def CALL_FUNCTION(self, inst): │
│ 963 │ │ args = self.popn(inst.argval) │
│ 964 │ │ fn = self.pop() │
│ ❱ 965 │ │ self.call_function(fn, args, {}) │
│ 966 │ │
│ 967 │ @break_graph_if_unsupported(push=1) │
│ 968 │ def CALL_FUNCTION_EX(self, inst): │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert. │
│ py:474 in call_function │
│ │
│ 471 │ │ │ isinstance(x, VariableTracker) │
│ 472 │ │ │ for x in itertools.chain(args, kwargs.values()) │
│ 473 │ │ ) │
│ ❱ 474 │ │ self.push(fn.call_function(self, args, kwargs)) │
│ 475 │ │
│ 476 │ def update_locals_and_stack(self, oldvar: VariableTracker, newvar: VariableTracker): │
│ 477 │ │ def repl(v: VariableTracker): │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/variables/torch.p │
│ y:368 in call_function │
│ │
│ 365 │ │ │ def get_state_from_generator(): │
│ 366 │ │ │ │ return self.value() │
│ 367 │ │ │ │
│ ❱ 368 │ │ │ return wrap_fx_proxy( │
│ 369 │ │ │ │ tx=tx, │
│ 370 │ │ │ │ proxy=tx.output.create_proxy( │
│ 371 │ │ │ │ │ "call_function", │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/variables/builder │
│ .py:754 in wrap_fx_proxy │
│ │
│ 751 │
│ 752 │
│ 753 def wrap_fx_proxy(tx, proxy, example_value=None, **options): │
│ ❱ 754 │ return wrap_fx_proxy_cls( │
│ 755 │ │ target_cls=TensorVariable, │
│ 756 │ │ tx=tx, │
│ 757 │ │ proxy=proxy, │
│ │
│ /home/myuser/mambaforge/lib/python3.10/site-packages/torch/_dynamo/variables/builder │
│ .py:812 in wrap_fx_proxy_cls │
│ │
│ 809 │ │ │ │ "ignore_subclass": ignore_subclass, │
│ 810 │ │ │ │ "is_tensor": target_cls is TensorVariable, │
│ 811 │ │ │ } │
│ ❱ 812 │ │ │ assert "source" in options and options["source"] is not None │
│ 813 │ │ │ kwargs["source"] = options["source"] │
│ 814 │ │ │ example_value = wrap_to_fake_tensor_and_record( │
│ 815 │ │ │ │ example_value, tx=tx, **kwargs │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError:
from user code:
File "/home/myuser/mambaforge/lib/python3.10/site-packages/torch/random.py", line 23, in get_rng_state
return default_generator.get_state()
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
0%| | 0/234 [00:06<?, ?it/s]
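For reference, the assertion comes out of TorchDynamo tracing torch.get_rng_state() inside gradient checkpointing, triggered by the torch.compile(self.model) call at chatglm_model.py line 300 in the traceback. A hedged workaround, not the project's eventual fix, is to fall back to eager or skip compilation:

import torch._dynamo

# Print full Dynamo diagnostics and fall back to eager execution instead of raising.
torch._dynamo.config.verbose = True
torch._dynamo.config.suppress_errors = True

# Alternatively, skip the torch.compile(self.model) call entirely; LoRA fine-tuning
# still runs in eager mode, just somewhat slower.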
Update the chatglm-6b files.
Is your torch version 2.0 or 1.13.1? I'm on torch 2.0. The earlier error looks related to https://github.com/pytorch/pytorch/issues/97077, but after barely working around it I hit a new error.
Update the code.
It finally works, but continuing training still throws an error. It looks like PEFT fails when loading the previously trained model; does the current setup not support continued training?
textgen/examples/chatglm$ python training_chatglm_demo.py --do_train
2023-04-21 11:15:08.224 | INFO | textgen.chatglm.chatglm_model:train_model:235 - Restarting from ./outputs/adapter_model.bin
...
│ /DaTa/dl/textgen_lora_train/textgen/examples/chatglm/../../textgen/chatglm/chatglm_model.py:241 in train_model
if os.path.exists(checkpoint_name):
logger.info(f"Restarting from {checkpoint_name}")
adapters_weights = torch.load(checkpoint_name)
self.model = set_peft_model_state_dict(self.model, adapters_weights)
│ 238 │ │ │ │ else: │
│ 239 │ │ │ │ │ logger.warning(f"Checkpoint {checkpoint_name} not found") │
│ 240 │ │ │ │
│ ❱ 241 │ │ │ self.model.print_trainable_parameters() # Be more transparent about the % o │
│ 242 │ │ else: │
│ 243 │ │ │ logger.warning("Now full model params fine-tune, which is slow, set `use_lor │
│ 244 │ │ os.makedirs(output_dir, exist_ok=True) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'print_trainable_parameters'
Continuing training from a LoRA checkpoint has not been tested with the current model. I'll add it as a todo.
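For what it's worth, the AttributeError above suggests that in this peft version set_peft_model_state_dict() does not return the model, so the assignment self.model = set_peft_model_state_dict(...) replaces it with None. A hedged sketch of a resume step that loads the adapter weights in place (the default path is taken from the log above; this is not the repository's actual fix):

import os
import torch
from peft import set_peft_model_state_dict


def resume_adapter(model, checkpoint_name="./outputs/adapter_model.bin"):
    """Load saved LoRA adapter weights into an existing PEFT model in place."""
    if os.path.exists(checkpoint_name):
        adapters_weights = torch.load(checkpoint_name, map_location="cpu")
        # Do not reassign the return value: it may be None in this peft version.
        set_peft_model_state_dict(model, adapters_weights)
    model.print_trainable_parameters()
    return model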
Update the code.
fixed. https://github.com/shibing624/textgen/commit/633e3761f369ecbc4daa89ddad60281a7f8614ca
I finished training it myself, and the output for "少先队员因该为老人让坐" is now correct. But looking at the final loss and train_result I am still a bit puzzled: the loss seems to have stopped decreasing quite early. Could a checkpoint from some intermediate step actually perform better? train_result:
epoch = 1.0
train_loss = 0.14049111964047134
train_runtime = 35059.424
train_samples_per_second = 7.183
train_steps_per_second = 3.592
Intermediate output and part of the final output:
{'loss': 0.0864, 'learning_rate': 5.5363014025000404e-05, 'epoch': 0.72}
...
{'loss': 0.078, 'learning_rate': 4.440667736145746e-05, 'epoch': 0.78}
...
{'loss': 0.1016, 'learning_rate': 3.511809272701282e-05, 'epoch': 0.82}
{'loss': 0.0743, 'learning_rate': 3.503867596372242e-05, 'epoch': 0.83}
{'loss': 0.0851, 'learning_rate': 3.495925920043203e-05, 'epoch': 0.83}
{'loss': 0.1033, 'learning_rate': 3.4879842437141635e-05, 'epoch': 0.83}
...
{'loss': 0.1196, 'learning_rate': 1.4237837322702078e-05, 'epoch': 0.93}
{'loss': 0.0837, 'learning_rate': 1.4158420559411681e-05, 'epoch': 0.93}
{'loss': 0.0762, 'learning_rate': 1.4079003796121287e-05, 'epoch': 0.93}
{'loss': 0.1166, 'learning_rate': 1.399958703283089e-05, 'epoch': 0.93}
...
{'loss': 0.0731, 'learning_rate': 1.1618672469384839e-05, 'epoch': 0.94}
...
{'loss': 0.0708, 'learning_rate': 2.4095045982305945e-06, 'epoch': 0.99}
...
{'loss': 0.1309, 'learning_rate': 3.4784542321193156e-07, 'epoch': 1.0}
{'loss': 0.0888, 'learning_rate': 2.6842865992153625e-07, 'epoch': 1.0}
{'loss': 0.0981, 'learning_rate': 1.8901189663114092e-07, 'epoch': 1.0}
{'loss': 0.0906, 'learning_rate': 1.0959513334074557e-07, 'epoch': 1.0}
{'train_runtime': 35059.424, 'train_samples_per_second': 7.183, 'train_steps_per_second': 3.592, 'train_loss': 0.14049111964047134, 'epoch': 1.0}
100%| | 125918/125918 [9:44:19<00:00, 3.59it/s]
2023-04-21 21:12:20.531 | INFO | textgen.chatglm.chatglm_model:handle_metrics:327 - ***** train metrics *****
2023-04-21 21:12:20.532 | INFO | textgen.chatglm.chatglm_model:handle_metrics:329 - epoch = 1.0
2023-04-21 21:12:20.532 | INFO | textgen.chatglm.chatglm_model:handle_metrics:329 - train_loss = 0.14049111964047134
2023-04-21 21:12:20.532 | INFO | textgen.chatglm.chatglm_model:handle_metrics:329 - train_runtime = 35059.424
2023-04-21 21:12:20.532 | INFO | textgen.chatglm.chatglm_model:handle_metrics:329 - train_samples_per_second = 7.183
2023-04-21 21:12:20.532 | INFO | textgen.chatglm.chatglm_model:handle_metrics:329 - train_steps_per_second = 3.592
2023-04-21 21:12:20.563 | DEBUG | textgen.chatglm.chatglm_model:train_model:308 - metrics: {'train_runtime': 35059.424, 'train_samples_per_second': 7.183, 'train_steps_per_second': 3.592, 'train_loss': 0.14049111964047134, 'epoch': 1.0}
2023-04-21 21:12:20.563 | INFO | textgen.chatglm.chatglm_model:train_model:309 - Training of /DaTa/textgen_lora_train/chatglm-6b model complete. Saved to ./outputs-csc/.
Which checkpoint works best depends on task-specific evaluation; the lowest train loss does not necessarily mean the best model. 1) Sample cases and inspect each checkpoint's output; 2) compute ROUGE/BLEU for each checkpoint; 3) for the CSC task, look at F1 on the test set.
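As a rough way to compare checkpoints on the CSC task, here is a dependency-free, sentence-level precision/recall/F1 sketch (not the official pycorrector/textgen metric; a prediction only counts as a correct fix if it matches the reference exactly):

def csc_sentence_metrics(sources, predictions, references):
    """Sentence-level correction metrics: precision, recall, F1."""
    tp = fp = fn = 0
    for src, pred, ref in zip(sources, predictions, references):
        changed, should_change = pred != src, ref != src
        if changed and pred == ref:
            tp += 1   # changed the sentence and got exactly the reference
        elif changed:
            fp += 1   # changed the sentence but got it wrong (or over-corrected)
        elif should_change:
            fn += 1   # should have corrected but left the sentence unchanged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1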
The CSC test set itself doesn't seem to be entirely correct either? For example the following (output from a modified pycorrector/utils/eval.py that I ran):
input : 后来客人非常地生气,然后叫我过来。
truth : 后来客人非常地生气,然后叫我过来。
predict: 后来客人非常地生气,然后叫我过去。 错误字:来
wrong
input : 总而言之,正规教育是需要的,但是必要的是学者学习的过程与现在,如何减化不愉快的课程、如何解放学习的压力,这不是学该单方摸索,而是需要适 当的辅导老师。
truth : 总而言之,正规教育是需要的,但是必要的是学者学习的过程与现在,如何减化不愉快的课程、如何解放学习的压力,这不是学该单方摸索,而是需要适 当的辅导老师。
predict: 总而言之,正规教育是需要的,但是必要的是学者学习的过程与现在,如何减化不愉快的课程、如何解放学习的压力,这不是学生单方摸索,而是需要适 当的辅导老师。 错误字:该
wrong
There are also some cases that look like differences between Taiwan and mainland usage:
input : 可是从妈妈给我十岁生日的礼物一个口琴那时候我就发现我是艺术者。
truth : 可是从妈妈给我十岁生日的礼物一个口琴那时候我就发现我是艺术者。
predict: 可是从妈妈给我十岁生日的礼物一个口琴那时候我就发现我是艺术家。 错误字:者
wrong
Yes, the quality of the SIGHAN dataset is not high enough.
I'll revise it by hand once; it's only 1000 entries, so that should be doable. Also, what temperature or top_p settings would be more appropriate?
With the default settings, I ran this sentence repeatedly; 6 out of 10 attempts were correct.
['这个人很利害。', '错误字:']
['这个人很利害。', '错误字:']
['这个人很厉害。', '错误字:利']
['这个人很利害。', '错误字:']
['这个人很危险。', '错误字:利']
['这个人很厉害。', '错误字:利']
['这个人很厉害。', '错误字:利']
['这个人很厉害。', '错误字:利']
['这个人很厉害。', '错误字:利']
['这个人很利害。', '错误字:']
I suggest tuning top_p, num_beams, repetition_penalty, temperature, and do_sample=True.
If the generated output contains repetition, increase repetition_penalty.
A task like CSC is not complex and has few training samples, so lower the temperature.
These are rules of thumb; the right values depend on the task and are not fixed.
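A hedged illustration of these knobs using the plain transformers generate API rather than textgen's predict() (values are examples, not tuned recommendations; model and tokenizer as in the loading sketch earlier in this thread):

inputs = tokenizer("对下面中文拼写纠错:\n少先队员因该为老人让坐。\n答:", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.8,               # nucleus sampling cutoff
    temperature=0.3,         # lower temperature for a simple, low-variance task like CSC
    repetition_penalty=1.2,  # raise this if the output starts repeating itself
    num_beams=1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))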
If the generated output contains repetition, increase repetition_penalty.
That is not output repetition; I called chat ten times for the same input and found the results unstable.
Do you mean adjusting these parameters at training time?
Running
chatglm$ python predict_demo.py
reports the following error:
RuntimeError: self and mat2 must have the same dtype
fixed. 633e376
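For anyone hitting this before updating: "self and mat2 must have the same dtype" usually means fp16 base weights are being multiplied with fp32 LoRA weights. A hedged workaround (not necessarily what the linked commit does) is to put the whole wrapped model into one dtype before inference, with model as loaded in the earlier sketch:

# Align dtypes across the base model and the LoRA layers before calling predict().
model = model.half().cuda()   # on GPU; use model.float() instead when running on CPU
model.eval()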
I ran into a rather odd phenomenon: with training_chatglm_csc_demo.py, continuing training seems to have no effect and even makes the model forget what it had learned, even after I added the line "resume_from_checkpoint": args.output_dir to the file:
diff --git a/examples/chatglm/training_chatglm_csc_demo.py b/examples/chatglm/training_chatglm_csc_demo.py
index 84e066b..d291130 100644
--- a/examples/chatglm/training_chatglm_csc_demo.py
+++ b/examples/chatglm/training_chatglm_csc_demo.py
@@ -81,6 +81,7 @@ def main():
"per_device_train_batch_size": args.batch_size,
"num_train_epochs": args.num_epochs,
"output_dir": args.output_dir,
+ "resume_from_checkpoint": args.output_dir,
}
model = ChatGlmModel(args.model_type, args.model_name, args=model_args)
I tried it with an outputs-csc directory trained for 1 epoch that worked fine; even continuing training for just 0.01 epoch immediately breaks it. Since the trained model answers in a standard format such as "错误字: 因", the regression is very obvious. Yet the log shows the previous LoRA was loaded:
2023-05-09 13:28:16.469 | INFO | textgen.chatglm.chatglm_model:load_lora:354 - Loaded lora model from outputs-csc/adapter_model.bin
2023-05-09 13:28:16.557 | INFO | textgen.chatglm.chatglm_model:train_model:237 - Restarting from outputs-csc/adapter_model.bin
I also trained on other data with training_chatglm_hfdataset_demo.py and continued training 7 times (1 epoch each); there the results seem normal.
Update the code.
After updating the code, only 'training_chatglm_adgen_demo.py' and 'training_chatglm_demo.py' still have the "resume_from_checkpoint" parameter? Is it no longer needed?
Either way, with or without this parameter, continuing to train a model that already outputs the standard format "错误字: 因" correctly, even for just 0.1 or 0.01 epoch, destroys the ability it had learned. Does the parameter only determine whether the third line below appears?
2023-05-12 01:59:49.638 | INFO | textgen.chatglm.chatglm_model:load_peft_model:439 - Loaded peft model from output-csc/adapter_model.bin
2023-05-12 01:59:49.640 | INFO | textgen.chatglm.chatglm_model:train_model:229 - Using PEFT type: LORA
2023-05-12 01:59:49.690 | INFO | textgen.chatglm.chatglm_model:train_model:310 - Restarting from output-csc/adapter_model.bin
Regarding point 2, "the recommended way to continue training on top of an already trained LoRA is to merge the existing LoRA weights into the base model and then train on that new base model": does this mean that after producing the merged model, I pass it via the --model_name argument to continue training?
General datasets like alpaca-zh? Is that corpus something ChatGLM was pretrained on? How would I know which corpora the model has been trained on?
I had assumed that preserving the original chat and other general abilities meant mixing in corpora that ChatGLM itself was trained on; so the point is that a general dataset such as alpaca-zh also works and can likewise avoid the forgetting problem.
Regarding point 2, "the recommended way to continue training on top of an already trained LoRA is to merge the existing LoRA weights into the base model and then train on that new base model": does this mean that after producing the merged model, I pass it via the --model_name argument to continue training?
Did you get this working? I pointed --model_name at the merged directory, and continuing training fails with OSError: /opt/models/THUDM_chatglm-6b-lora/ does not appear to have a file named configuration_chatglm.py.
Manually copy configuration_chatglm.py and the other .py files from the original official chatglm-6b directory into it.
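On the merge-then-continue-training workflow discussed above, a hedged sketch with the standard PEFT API (directory names are illustrative; requires a peft version that provides merge_and_unload):

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half()
lora = PeftModel.from_pretrained(base, "./outputs-csc/")   # previously trained adapter
merged = lora.merge_and_unload()                           # fold the LoRA weights into the base model

merged.save_pretrained("./chatglm-6b-merged/")
AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).save_pretrained("./chatglm-6b-merged/")
# As noted above, also copy configuration_chatglm.py / modeling_chatglm.py /
# tokenization_chatglm.py from the original THUDM/chatglm-6b files into
# ./chatglm-6b-merged/, then point --model_name at that directory.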
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (The bot closes inactive issues automatically; feel free to open a new issue if you still have questions.)
Running /ChatGLM-6B/textgen/examples/chatglm$ python predict_demo.py reports an error. The GLM-6B model is the original one, and the LoRA fine-tuned model was obtained with git clone https://huggingface.co/shibing624/chatglm-6b-csc-zh-lora
The error:
(pt) ubuntu@youran-gpu21:~/ChatGLM-6B/textgen/examples/chatglm$ python predict_demo2.py
2023-04-14 11:47:33.176 | DEBUG | textgen.chatglm.chatglm_model:init:98 - Device: cuda
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████| 8/8 [00:12<00:00, 1.58s/it]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
2023-04-14 11:48:08.995 | INFO | textgen.chatglm.chatglm_model:load_lora:342 - Loaded lora model from /home/ubuntu/ChatGLM-6B/textgen/chatglm-6b-csc-zh-lora
Traceback (most recent call last):
  File "/home/ubuntu/ChatGLM-6B/textgen/examples/chatglm/predict_demo2.py", line 12, in <module>
    r = model.predict(["对下面中文拼写纠错:\n少先队员因该为老人让坐。\n答:"])
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/textgen-0.1.9-py3.9.egg/textgen/chatglm/chatglm_model.py", line 385, in predict
    self.model.eval()
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1930, in eval
    return self.train(False)
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  [Previous line repeated 4 more times]
  File "/home/ubuntu/anaconda3/envs/pt/lib/python3.9/site-packages/peft-0.2.0-py3.9.egg/peft/tuners/lora.py", line 417, in train
    delta_w = F.conv1d(
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [8192, 8, 1, 1], but got 3-dimensional input of size [1, 16, 4096] instead
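One observation on this last report: the traceback goes through peft-0.2.0-py3.9.egg, while the adapters discussed in this thread were saved with peft 0.3.0.dev0, and the 4-dimensional Conv1d-style LoRA weight in the error is consistent with that older layout. A trivial version check is worth running before debugging shapes:

import peft
import torch
import transformers

# The examples in this thread assume a 0.3.0.dev0-era peft; the 0.2.0 install shown
# in the traceback path stores merged query_key_value LoRA weights differently.
print("peft:", peft.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)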