YZW-explorer commented 3 years ago

Hello, I tried running training yesterday. Without any changes, the final accuracy was 0.56. Looking at your GitHub page, I noticed that data augmentation was not set up, so I enabled it. After that, the evaluate stage fails with the following error:

Epoch 1/2, Loss 0.1747398: 100%|████████████| 1865/1865 [21:49<00:00, 1.42it/s]
5964 1020
Traceback (most recent call last):
  File "train.py", line 60, in <module>
    trainer.train(MODEL_DIR, 1)
  File "cail2019-master/model.py", line 573, in train
    acc, loss = self.evaluate(model, test_data, test_label_list)
  File "cail2019-master/model.py", line 648, in evaluate
    assert len(predict_result) == len(real_label_list)
AssertionError

(The two numbers 5964 and 1020 are values I printed myself: len(predict_result) and len(real_label_list), respectively.)

padeoe commented 3 years ago

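Augmentation is already enabled by default. Your change causes the test set to be augmented as well, which triggers the bug. As for the accuracy, in my test it already reached 0.64 after the first epoch.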

$ python train.py 
2021-06-21 16:02:14 - train model - INFO - 算法:BertForSimMatchModel
2021-06-21 16:02:15 - train model - INFO - ***** Running training *****
2021-06-21 16:02:15 - train model - INFO - dataset: data/train/input.txt
2021-06-21 16:02:15 - train model - INFO - k-fold number: 1
2021-06-21 16:02:15 - train model - INFO - device: cuda n_gpu: 2
2021-06-21 16:02:15 - train model - INFO - config: {
    "batch_size": 12,
    "epochs": 2,
    "fp16": false,
    "fp16_opt_level": "O1",
    "learning_rate": 2e-05,
    "max_grad_norm": 1.0,
    "max_length": 512,
    "warmup_steps": 0.1
}
2021-06-21 16:02:23 - train model - INFO - ***** fold 1/1 *****
2021-06-21 16:02:23 - train model - INFO -   Num examples = 22372
2021-06-21 16:02:23 - train model - INFO -   Batch size = 12
2021-06-21 16:02:23 - train model - INFO -   Num steps = 3728
  0%|                                                                                                                                                                                                                                                         | 0/1864 [00:00<?, ?it/s]/home/padeoe/.conda/envs/cail2019/lib/python3.6/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
Epoch 1/2, Loss 0.1356069: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1864/1864 [23:56<00:00,  1.29it/s]
2021-06-21 16:26:52 - train model - INFO - Epoch 1, train Loss: 753.5464908, eval acc: 0.6411764705882353, eval loss: 128.8568891
Epoch 2/2, Loss 0.0062600: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1864/1864 [23:54<00:00,  1.31it/s]
2021-06-21 16:51:21 - train model - INFO - Epoch 2, train Loss: 430.4074954, eval acc: 0.6735294117647059, eval loss: 153.7522663
2021-06-21 16:51:23 - train model - INFO - ***** Stats *****
2021-06-21 16:51:23 - train model - INFO - acc for each epoch:
2021-06-21 16:51:23 - train model - INFO - epoch 1, mean: 0.64118, std: 0.00000
2021-06-21 16:51:23 - train model - INFO - epoch 2, mean: 0.67353, std: 0.00000
2021-06-21 16:51:23 - train model - INFO - ***** Training complete *****
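
On the augmentation point: the failing assert compares the number of predictions with the number of test labels, so the two go out of sync as soon as the test inputs are augmented but the labels are not. Below is a minimal sketch of keeping augmentation on the training split only. It is not the code from model.py; the triplet layout, the label convention, and the names augment, prepare_splits and evaluate are all hypothetical, chosen only to illustrate the idea.

```python
from typing import Callable, List, Tuple

# Hypothetical triplet layout: (A, B, C). Label 1 means B is closer to A, 0 means C is.
Example = Tuple[str, str, str]


def augment(examples: List[Example], labels: List[int]) -> Tuple[List[Example], List[int]]:
    """Illustrative augmentation: add a copy of each triplet with B and C swapped
    and the label flipped. This is an assumption, not the repository's method."""
    out_examples, out_labels = list(examples), list(labels)
    for (a, b, c), label in zip(examples, labels):
        out_examples.append((a, c, b))
        out_labels.append(1 - label)
    return out_examples, out_labels


def prepare_splits(train, train_labels, test, test_labels):
    """Augment the training split only; return the test split unchanged."""
    train, train_labels = augment(train, train_labels)
    return train, train_labels, list(test), list(test_labels)


def evaluate(predict_fn: Callable[[Example], int], test, test_labels) -> float:
    """Compute accuracy, with the same length check as the assert in the traceback."""
    predictions = [predict_fn(x) for x in test]
    # One prediction per label; this fails if only the test inputs were augmented.
    assert len(predictions) == len(test_labels)
    correct = sum(p == y for p, y in zip(predictions, test_labels))
    return correct / len(test_labels)
```

With this shape, the training set grows while the test inputs and labels keep the same length, so the length check in evaluate holds.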

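I used the google/bert pretrained model; see #24. My guess is that the problem is with your pretrained model. Someone previously ran into problems using the BERT model from OpenCLaP; see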

8

YZW-explorer commented 3 years ago

I downloaded it from the link you gave me: https://huggingface.co/bert-base-chinese/tree/main. It may still differ from the google/bert pretrained model. I will try converting it myself. Thanks!

padeoe / cail2019

Error after enabling data augmentation #25

8