DeepKE/example/ee/standard: prediction error

ping40 commented 1 year ago

Following the docs at https://github.com/zjunlp/DeepKE/tree/main/example/ee/standard#readme, I first ran python run.py (with the DuEE data).

Then I ran python predict.py and got the following error:

(deepke) [ping55@localhost standard]$ python predict.py
[2023-06-27 20:11:40,481][__main__][INFO] - {'n_gpu: ': 1}
[2023-06-27 20:11:40,482][__main__][WARNING] - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
build CRF...
[2023-06-27 20:11:46,699][__main__][INFO] - Training/evaluation parameters {'data_name': 'DuEE', 'model_name_or_path': '/home/ping55/code/DeepKE-main/example/ee/standard/./exp/DuEE/trigger/bert-base-uncased', 'task_name': 'trigger', 'do_train': False, 'do_eval': True, 'do_predict': False, 'do_pipeline_predict': True, 'overwrite_cache': True, 'dev_trigger_pred_file': '/home/ping55/code/DeepKE-main/example/ee/standard/./exp/DuEE/trigger/bert-base-uncased/eval_pred.json', 'test_trigger_pred_file': '/home/ping55/code/DeepKE-main/example/ee/standard/./exp/DuEE/trigger/bert-base-uncased/test_pred.json', 'model_type': 'bertcrf', 'labels': '', 'config_name': '', 'tokenizer_name': '', 'cache_dir': '', 'evaluate_during_training': True, 'do_lower_case': True, 'weight_decay': 0.0, 'learning_rate': 5e-05, 'adam_epsilon': 1e-08, 'per_gpu_train_batch_size': 16, 'per_gpu_eval_batch_size': 16, 'gradient_accumulation_steps': 1, 'max_seq_length': 256, 'max_grad_norm': 1.0, 'num_train_epochs': 10, 'max_steps': 5000, 'warmup_steps': 0, 'logging_steps': 500, 'save_steps': 500, 'eval_all_checkpoints': False, 'no_cuda': False, 'n_gpu': 1, 'overwrite_output_dir': True, 'seed': 42, 'fp16': False, 'fp16_opt_level': '01', 'local_rank': -1, 'data_dir': '/home/ping55/code/DeepKE-main/example/ee/standard/./data/DuEE/trigger', 'tag_path': '/home/ping55/code/DeepKE-main/example/ee/standard/./data/DuEE/schema', 'output_dir': '', 'cwd': '/home/ping55/code/DeepKE-main/example/ee/standard'}
1498it [00:10, 139.89it/s]
[2023-06-27 20:11:57,429][deepke.event_extraction.standard.bertcrf.processor_ee][INFO] - LOOKING AT /home/ping55/code/DeepKE-main/example/ee/standard/./data/DuEE/trigger/dev_with_pred_trigger.tsv train
[2023-06-27 20:11:57,468][run][INFO] - Creating features from dataset file at /home/ping55/code/DeepKE-main/example/ee/standard/./data/DuEE/trigger
###############
[2023-06-27 20:11:57,470][deepke.event_extraction.standard.bertcrf.processor_ee][INFO] - Writing example 0 of 2233
###############
Traceback (most recent call last):
  File "predict.py", line 140, in <module>
    main()
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "predict.py", line 99, in main
    eval_dataset = load_and_cache_examples(args, eval_examples , tokenizer, labels, pad_token_label_id, mode="dev")
  File "/home/ping55/code/DeepKE-main/example/ee/standard/run.py", line 275, in load_and_cache_examples
    features = convert_examples_to_features(
  File "/home/ping55/anaconda3/envs/deepke/lib/python3.8/site-packages/deepke/event_extraction/standard/bertcrf/processor_ee.py", line 337, in convert_examples_to_features
    label_ids.extend([label_map[label]] + [pad_token_label_id] * (len([word]) - 1))
KeyError: 'B-时间'
(deepke) [ping55@localhost standard]$
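The failing line looks up every label in `label_map`, so `KeyError: 'B-时间'` means the data contains a label that the label set built for the current task does not have. Since 时间 (time) is an argument role in DuEE, this points to role-style labels being read while the trigger label set was loaded. A quick pre-check is to scan the label column before conversion. The sketch below is hypothetical (`find_unknown_labels` is not part of DeepKE), assumes per-character labels are joined by `\002` as described later in this thread, and uses illustrative trigger labels.

```python
from typing import Iterable, List, Set


def find_unknown_labels(label_rows: Iterable[str], known: Set[str], sep: str = "\002") -> List[str]:
    """Return labels that appear in the data but are missing from `known`.

    Hypothetical helper: each row is one sample's label sequence, with
    per-character labels joined by `sep` (assumed to be the \\002 character).
    """
    unknown: Set[str] = set()
    for row in label_rows:
        for label in row.split(sep):
            if label not in known:
                unknown.add(label)
    return sorted(unknown)


# Illustrative trigger-style label set vs. a row containing role-style labels.
trigger_labels = {"O", "B-组织关系-裁员", "I-组织关系-裁员"}
rows = ["\002".join(["O", "B-时间", "I-时间", "O"])]
print(find_unknown_labels(rows, trigger_labels))  # ['B-时间', 'I-时间']
```

An empty result means every label in the file can be mapped; anything else would raise the same `KeyError` during feature conversion.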

shengyumao commented 1 year ago

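Hello. From your previous issue I see you installed hydra-core==1.0.6. Please reinstall with pip install hydra-core==1.3.1 (ignore the version conflict with deepke). Judging from your error, the hydra version probably prevented the settings in predict.yaml from being applied as overrides.

Also, the error suggests you are using the trained model to reproduce the TriggerClassification results on DuEE. If so, set the model path in ./conf/predict.yaml to your trained model and set do_pipeline_predict to False. In fact, python run.py has already predicted the triggers and saved the results to eval_pred.json. If you want to do Event Arguments Extraction, set task_name in train.yaml to role and train a second model (pipeline event extraction needs two models), then run predict.py to get the final pipeline extraction results. We have updated the README, so please try running it again. Thanks for the feedback.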
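Before rerunning, it helps to confirm which hydra-core version the active environment really has. A minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8, which matches the python3.8 paths in the traceback); `parse_version`, `installed_version`, and `hydra_is_recent_enough` are hypothetical helpers, not part of DeepKE or Hydra, and the 1.3.1 threshold comes from the advice above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts of a version, e.g. '1.3.1' -> (1, 3, 1)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def hydra_is_recent_enough(required: str = "1.3.1") -> bool:
    found = installed_version("hydra-core")
    return found is not None and parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    print("hydra-core:", installed_version("hydra-core"))
    print("recent enough:", hydra_is_recent_enough())
```

The parser deliberately ignores suffixes such as `rc1`; for anything more exact, a dedicated version library would be the safer choice.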

ping40 commented 1 year ago

With hydra-core==1.0.6 and do_pipeline_predict = False, the program does indeed work.

ping40 commented 1 year ago

The prediction results are:

f1 = 0.415527950310559
loss = 164.49403154089094
precision = 0.46783216783216786
recall = 0.37374301675977656
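These numbers are consistent with F1 being the standard harmonic mean of precision and recall, so the low score reflects both low precision and low recall rather than a metric mismatch. A quick check (the helper name is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


p = 0.46783216783216786
r = 0.37374301675977656
print(round(f1_score(p, r), 4))  # 0.4155
```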

These results are far too poor to be usable. Did I configure something incorrectly?

The expected results are in the attachment, taken from the DuEE paper.

zxlzr commented 1 year ago

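Hello, we will look into this as soon as possible. We also suggest evaluating on the training set to check whether the model is underfitting, and training for a few more epochs before testing again.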

shengyumao commented 1 year ago

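Hello, you could try increasing max_steps in train.yaml, e.g. to 10000 or more, which should give better results. It may also be that the base model's Chinese ability is not strong enough: bert-base cannot reproduce the baseline results from the DuEE paper, so you could try a base model with stronger Chinese capability.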
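Whether a larger max_steps helps depends on how many passes over the data it buys. In Hugging Face-style training loops, which run.py appears to follow since its config carries both `max_steps` and `num_train_epochs` (an assumption), a positive `max_steps` usually overrides `num_train_epochs`, and the epochs actually run are roughly `max_steps` divided by the optimizer steps per epoch. A small sketch of that arithmetic; the training-set size in the usage line is a placeholder, not the real DuEE count.

```python
import math


def effective_epochs(max_steps: int, num_examples: int, batch_size: int,
                     grad_accum: int = 1, n_gpu: int = 1) -> float:
    """Approximate number of passes over the training set covered by `max_steps`.

    Assumes one optimizer step per `grad_accum` batches and a per-GPU batch
    size scaled by the number of GPUs, as in common HF-style scripts.
    """
    total_batch = batch_size * max(1, n_gpu)
    batches_per_epoch = math.ceil(num_examples / total_batch)
    steps_per_epoch = max(1, batches_per_epoch // grad_accum)
    return max_steps / steps_per_epoch


# per_gpu_train_batch_size=16 and max_steps=5000 come from the config dump;
# 10000 training examples is a placeholder.
print(effective_epochs(5000, 10000, 16))  # 8.0
```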

zxlzr commented 1 year ago

Hello, I suggest trying RoBERTa-wwm-ext-large: https://github.com/ymcui/Chinese-BERT-wwm

ping40 commented 1 year ago

> Hello, I suggest trying RoBERTa-wwm-ext-large: https://github.com/ymcui/Chinese-BERT-wwm

The source code contains:

MODEL_CLASSES = {
    "bertcrf": (BertConfig, BertCRFForTokenClassification, BertTokenizer),
}

Does this mean only the bertcrf model is currently supported?

Or, if I want to use a Chinese-BERT-wwm model, is it enough to change model_name_or_path in train.yaml?

ping40 commented 1 year ago

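> Hello, you could try increasing max_steps in train.yaml, e.g. to 10000 or more, which should give better results. It may also be that the base model's Chinese ability is not strong enough: bert-base cannot reproduce the baseline results from the DuEE paper, so you could try a base model with stronger Chinese capability.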

I changed max_steps to 15000 and saw no improvement. Nor do I see any need to increase the number of epochs.

(base) [huangping@localhost 09-57-18]$ grep "f1" run.log
[2023-07-06 10:02:20,219][__main__][INFO] -   f1 = 0.17694598043385795
[2023-07-06 10:07:31,595][__main__][INFO] -   f1 = 0.30281962147547314
[2023-07-06 10:12:29,239][__main__][INFO] -   f1 = 0.3363669457590098
[2023-07-06 10:17:30,167][__main__][INFO] -   f1 = 0.3519400953029272
[2023-07-06 10:22:26,760][__main__][INFO] -   f1 = 0.3935611038107753
[2023-07-06 10:27:17,421][__main__][INFO] -   f1 = 0.4049697783747481
[2023-07-06 10:32:39,500][__main__][INFO] -   f1 = 0.4117272147864882
[2023-07-06 10:38:04,259][__main__][INFO] -   f1 = 0.3930456380006209
[2023-07-06 10:43:29,647][__main__][INFO] -   f1 = 0.40867992766726946
[2023-07-06 10:48:49,888][__main__][INFO] -   f1 = 0.4135224871717476
[2023-07-06 10:54:12,628][__main__][INFO] -   f1 = 0.40747126436781606
[2023-07-06 10:59:31,818][__main__][INFO] -   f1 = 0.40898617511520746
[2023-07-06 11:04:50,080][__main__][INFO] -   f1 = 0.4193738924985233
[2023-07-06 11:10:08,911][__main__][INFO] -   f1 = 0.40647586007516623
[2023-07-06 11:15:22,817][__main__][INFO] -   f1 = 0.41581342434584756
[2023-07-06 11:20:41,726][__main__][INFO] -   f1 = 0.40134529147982057
[2023-07-06 11:25:56,539][__main__][INFO] -   f1 = 0.4016783216783217
[2023-07-06 11:31:18,098][__main__][INFO] -   f1 = 0.41398919533693485
[2023-07-06 11:36:30,602][__main__][INFO] -   f1 = 0.4116978989210676
[2023-07-06 11:41:25,092][__main__][INFO] -   f1 = 0.4120603015075377
[2023-07-06 11:46:18,326][__main__][INFO] -   f1 = 0.4113871057772816
[2023-07-06 11:51:09,629][__main__][INFO] -   f1 = 0.41914471821013877
[2023-07-06 11:56:01,582][__main__][INFO] -   f1 = 0.4054945054945055
[2023-07-06 12:00:53,145][__main__][INFO] -   f1 = 0.41611479028697573
[2023-07-06 12:05:42,285][__main__][INFO] -   f1 = 0.4188451594369434
[2023-07-06 12:10:32,369][__main__][INFO] -   f1 = 0.41594647337608026
[2023-07-06 12:15:24,148][__main__][INFO] -   f1 = 0.41514726507713884
[2023-07-06 12:20:15,418][__main__][INFO] -   f1 = 0.41143808995002773
[2023-07-06 12:25:05,207][__main__][INFO] -   f1 = 0.41839596186203026
[2023-07-06 12:29:57,443][__main__][INFO] -   f1 = 0.4174838890445503

Attached is the outputs log: run.log
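The grep output shows F1 rising quickly and then staying between roughly 0.39 and 0.42 from about the seventh evaluation onward, which suggests a plateau rather than undertraining. (The 30 entries would match one evaluation every 500 steps over 15000 steps, if logging_steps stayed at the 500 shown in the config dump.) A small sketch for pulling the curve out of run.log and summarizing it; `f1_curve` and `summarize` are hypothetical helpers.

```python
import re
from typing import Dict, List

F1_PATTERN = re.compile(r"\bf1 = ([0-9]*\.?[0-9]+)")


def f1_curve(log_text: str) -> List[float]:
    """Extract every 'f1 = <value>' entry from a log, in order of appearance."""
    return [float(m.group(1)) for m in F1_PATTERN.finditer(log_text)]


def summarize(curve: List[float], tail: int = 5) -> Dict[str, float]:
    """Best score, where it occurred, and the mean of the last `tail` evaluations."""
    best = max(curve)
    return {
        "evals": len(curve),
        "best": best,
        "best_eval_index": curve.index(best),
        "tail_mean": sum(curve[-tail:]) / min(tail, len(curve)),
    }


log = """[2023-07-06 10:02:20,219][__main__][INFO] -   f1 = 0.17694598043385795
[2023-07-06 10:07:31,595][__main__][INFO] -   f1 = 0.30281962147547314
[2023-07-06 11:04:50,080][__main__][INFO] -   f1 = 0.4193738924985233
"""
print(summarize(f1_curve(log)))
```

Run on the full run.log (e.g. `f1_curve(open("run.log", encoding="utf-8").read())`), a `tail_mean` close to `best` is a quick sign that more steps alone are unlikely to help.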

zxlzr commented 1 year ago

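Hello. Event extraction is very hard for small models. The results reported in the DuEE paper are the best results from the competition, and the paper does not describe the details. DeepKE provides basic event extraction functionality. In our experience BERT-base performs only moderately; we suggest trying a large model such as RoBERTa-wwm-ext-large, which should improve results to some extent.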

shengyumao commented 1 year ago

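Hello, are you using bert-base-chinese? I noticed the model path in your original error output was bert-base-uncased. Running locally with bert-base-chinese works fine for me: when training the role model with gold triggers, F1 reaches about 71, and the final prediction F1 is 65+. You can also try other Chinese BERT-family pretrained models from Chinese-BERT-wwm by changing model_name_or_path in train.yaml, e.g. hfl/chinese-bert-wwm-ext or hfl/chinese-roberta-wwm-ext-large.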
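Switching base models comes down to changing `model_name_or_path`. Hydra should also accept a command-line override of an existing key (for example `python run.py model_name_or_path=hfl/chinese-bert-wwm-ext`), which avoids editing the file. If you prefer to script the YAML edit, for instance when toggling predict.yaml between trigger and role runs, here is a minimal sketch. It assumes the keys sit unindented at the top level of the config, as the flat parameter dump in the first log suggests; `set_top_level_key` and `update_config` are hypothetical helpers, and the `conf/` paths in the comments are assumed.

```python
import re
from pathlib import Path


def set_top_level_key(yaml_text: str, key: str, value: str) -> str:
    """Rewrite the value of an unindented `key: value` line in simple YAML text.

    Only flat top-level keys are handled, and any inline comment on that line
    is dropped. Raises KeyError if the key is not present.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{key}: {value}", yaml_text)
    if count == 0:
        raise KeyError(key)
    return new_text


def update_config(path: str, **updates: str) -> None:
    """Apply several top-level key updates to a YAML file in place."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    for key, value in updates.items():
        text = set_top_level_key(text, key, value)
    p.write_text(text, encoding="utf-8")


# Assumed usage (paths are illustrative):
# update_config("conf/train.yaml", model_name_or_path="hfl/chinese-bert-wwm-ext")
# update_config("conf/predict.yaml", task_name="trigger", do_pipeline_predict="False")
```

A regex edit keeps the rest of the file, comments included, byte-for-byte intact, which a full YAML load-and-dump round trip would not.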

ping40 commented 1 year ago

@shengyumao Now that the model is trained, how do I actually use it? Or how should I modify predict.py to get the following input and output?

Input: 消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了

Output similar to:

[
  {
    "event_type": "组织关系-裁员",
    "trigger": "裁员",
    "trigger_start_index": 15,
    "arguments": [
      {"argument_start_index": 17, "role": "裁员人数", "argument": "900余人", "alias": []},
      {"argument_start_index": 10, "role": "时间", "argument": "5月份", "alias": []}
    ],
    "class": "组织关系"
  }
]
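In this output format, `trigger_start_index` and `argument_start_index` behave as 0-based character offsets into the input text (the full-width quotes and comma each count as one character). When producing this format yourself, a quick consistency check catches off-by-one errors. `check_offsets` is a hypothetical helper; the sample below is the input and output from this comment.

```python
import json
from typing import List


def check_offsets(text: str, events: List[dict]) -> List[str]:
    """Describe every trigger/argument whose start index does not point at its
    string inside `text`, treating indices as 0-based character offsets."""
    problems = []
    for ev in events:
        start = ev["trigger_start_index"]
        if text[start:start + len(ev["trigger"])] != ev["trigger"]:
            problems.append(f"trigger {ev['trigger']!r} not at {start}")
        for arg in ev.get("arguments", []):
            a_start = arg["argument_start_index"]
            if text[a_start:a_start + len(arg["argument"])] != arg["argument"]:
                problems.append(f"argument {arg['argument']!r} not at {a_start}")
    return problems


text = "消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了"
events = json.loads("""[{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 15,
  "arguments": [{"argument_start_index": 17, "role": "裁员人数", "argument": "900余人", "alias": []},
                {"argument_start_index": 10, "role": "时间", "argument": "5月份", "alias": []}],
  "class": "组织关系"}]""")
print(check_offsets(text, events))  # []
```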

ping40 commented 1 year ago

@shengyumao

After switching to the chinese-bert-wwm-ext model, the results are:

  f1 = 0.8473282442748091
  loss = 94.22317569813829
  precision = 0.8274760383386581
  recall = 0.8681564245810056

The chinese-roberta-wwm-ext-large model gives:

 f1 = 0.8572993700356066
 loss = 109.40171002327128
 precision = 0.8409457281031704
 recall = 0.8743016759776536

These results are fairly satisfactory.

ping40 commented 1 year ago

@zxlzr Could you tell me how to use the trained model in the end? Or is there a similar example? Thank you.

shengyumao commented 1 year ago

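@ping40 Hello, we do not yet support one-step extraction on your own data; we will improve this in upcoming development. If you want to use the trained models to predict on your own data (with the same event types as DuEE), follow these steps: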

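1. Convert your data into the format of data/DuEE/trigger/dev.tsv with every label set to O (for example, “消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了” gets the label “OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO”, with characters separated by “\002”), and replace dev.tsv with it. Also generate a duee_dev.json file to replace data/DuEE/duee_dev.json, with event_list set to an empty list.
2. In predict.yaml, set model_name_or_path to the trigger model path, task_name to trigger, and do_pipeline_predict to False, then run predict.py. The trigger predictions are saved to exp/DuEE/trigger/your_model/eval_pred.json.
3. In predict.yaml, set model_name_or_path to the role model path, task_name to role, and do_pipeline_predict to True, then run predict.py. The role predictions are saved to exp/DuEE/role/your_model/eval_pred.json.
4. From the role predictions in exp/DuEE/role/your_model/eval_pred.json together with data/DuEE/role/dev_with_pred_trigger.tsv, you can derive the final extraction results. dev_with_pred_trigger.tsv has four columns: the first is the sample to extract from, the third is the predicted trigger, and the fourth is the sample id. Each line corresponds to the line at the same position in the role model's eval_pred.json.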
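Step 1 can be scripted. The sketch below assumes that dev.tsv starts with a `text_a\tlabel` header followed by one tab-separated row per sample, with characters and labels each joined by `\002`, and that duee_dev.json holds one JSON object per line with `text`, `id`, and `event_list` fields. Both layouts are assumptions: compare them with the files shipped in data/DuEE and keep backups before overwriting. `to_trigger_row` and `write_inputs` are hypothetical helpers, and input texts must not contain tabs or newlines.

```python
import json
from pathlib import Path
from typing import Iterable, Tuple


def to_trigger_row(text: str) -> str:
    """One dev.tsv row: the characters and an all-"O" label sequence,
    each joined by the \\002 character, separated by a tab."""
    chars = list(text)
    labels = ["O"] * len(chars)
    return "\002".join(chars) + "\t" + "\002".join(labels)


def write_inputs(samples: Iterable[Tuple[str, str]], tsv_path: str, json_path: str) -> None:
    """Write (sample_id, text) pairs as a dev.tsv file and a JSON-lines file
    whose event_list fields are empty."""
    samples = list(samples)
    rows = ["text_a\tlabel"] + [to_trigger_row(text) for _, text in samples]
    Path(tsv_path).write_text("\n".join(rows) + "\n", encoding="utf-8")
    with open(json_path, "w", encoding="utf-8") as f:
        for sample_id, text in samples:
            record = {"text": text, "id": sample_id, "event_list": []}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Assumed usage (the sample id is illustrative):
# write_inputs([("demo-0", "消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了")],
#              "data/DuEE/trigger/dev.tsv", "data/DuEE/duee_dev.json")
```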

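Finally, you can convert the extraction results into whatever format you need. We will also add support for running extraction with trained models on local data as soon as possible.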
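For the final conversion, here is a sketch that assumes the role predictions can be read back as one BIO tag per character (`B-<role>`, `I-<role>`, or `O`) aligned with the sample text. That layout is an assumption, since the thread does not document the exact contents of eval_pred.json. It decodes the tags into argument spans and assembles an event dict shaped like the output requested earlier in this thread. `bio_to_arguments` and `build_event` are hypothetical helpers, and deriving `class` from the prefix of `event_type` is inferred from that single example.

```python
from typing import Dict, List


def bio_to_arguments(text: str, tags: List[str]) -> List[Dict]:
    """Decode per-character BIO role tags into DuEE-style argument dicts.

    Assumes len(tags) == len(text). An "I-" tag that does not continue a span
    of the same role starts a new span. Arguments come out in text order.
    """
    args: List[Dict] = []
    role, start = None, None

    def close(end: int) -> None:
        if role is not None:
            args.append({"argument_start_index": start, "role": role,
                         "argument": text[start:end], "alias": []})

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
            role, start = None, None
        else:
            prefix, _, tag_role = tag.partition("-")
            if prefix == "B" or tag_role != role:
                close(i)
                role, start = tag_role, i
    close(len(tags))
    return args


def build_event(text: str, event_type: str, trigger: str,
                trigger_start: int, role_tags: List[str]) -> Dict:
    """Assemble one event in the requested output shape."""
    return {"event_type": event_type, "trigger": trigger,
            "trigger_start_index": trigger_start,
            "arguments": bio_to_arguments(text, role_tags),
            "class": event_type.split("-", 1)[0]}


text = "消失的“外企光环”，5月份在华裁员900余人，香饽饽变“臭”了"
tags = ["O"] * len(text)
tags[10:13] = ["B-时间", "I-时间", "I-时间"]
tags[17:22] = ["B-裁员人数"] + ["I-裁员人数"] * 4
print(build_event(text, "组织关系-裁员", "裁员", 15, tags))
```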

zjunlp / DeepKE

DeepKE/example/ee/standard: prediction error #293