liuwei1206 / LEBERT

Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"
336 stars, 60 forks

IndexError: list index out of range #32

Closed. bultiful closed this issue 2 years ago.

bultiful commented 2 years ago

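Hi, thank you for open-sourcing this. When running do_evaluate and do_predict, I get an error with both the data you provided and my own data. I don't know what is causing it and would like to ask you about it.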

Traceback (most recent call last):
  File "Trainer.py", line 598, in <module>
    main()
  File "Trainer.py", line 574, in main
    train(model, args, train_dataset, dev_dataset, test_dataset, label_vocab, tbwriter)
  File "Trainer.py", line 377, in train
    metrics, = evaluate(model, args, dev_dataset, label_vocab, global_step, description="Dev", write_file=True)
  File "Trainer.py", line 465, in evaluate
    all_label_ids, all_predict_ids, all_attention_mask, label_vocab)
  File "LEBERT/function/metrics.py", line 40, in seq_f1_with_mask
    tmp_pred.append(label_vocab.convert_id_to_item(all_pred_labels[i][j]).replace("M-", "I-"))
  File "LEBERT/feature/vocab.py", line 84, in convert_id_to_item
    return self.idx2item[id]
IndexError: list index out of range

iamlockelightning commented 2 years ago

+1 👀

bultiful commented 2 years ago

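I printed things out and saw that the id passed to convert_id_to_item in vocab.py goes beyond the range of the idx2item list. Look at label_vocab at line 526 of Trainer.py: if you print label_vocab.init_vocab(), you will see that entries get added to item2idx and idx2item repeatedly, which makes the list go out of range.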
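To test the claim that labels are being added to item2idx and idx2item more than once, a consistency check like the following could be run on the label vocab. It is a minimal sketch: `check_vocab` is a hypothetical helper, and it only assumes the vocab exposes an `idx2item` list and an `item2idx` dict, as the traceback and printouts in this thread show.

```python
from collections import Counter


def check_vocab(idx2item, item2idx):
    """Hypothetical helper: return a list of problems found in a label vocab."""
    problems = []
    # Labels that occur more than once in the id -> label list.
    dupes = [label for label, count in Counter(idx2item).items() if count > 1]
    if dupes:
        problems.append(f"duplicate labels in idx2item: {dupes}")
    # The list and the dict should describe the same number of labels.
    if len(idx2item) != len(item2idx):
        problems.append(
            f"size mismatch: {len(idx2item)} list entries vs {len(item2idx)} dict entries"
        )
    # Every dict entry should point back to the same label in the list.
    for label, idx in item2idx.items():
        if not 0 <= idx < len(idx2item) or idx2item[idx] != label:
            problems.append(f"{label!r} -> {idx} does not round-trip")
    return problems


print(check_vocab(["O", "B-PER.NOM", "E-PER.NOM"], {"O": 0, "B-PER.NOM": 1, "E-PER.NOM": 2}))
# []
```

An empty list means the two structures agree; any duplicate or mismatch is reported as a readable string.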

liuwei1206 commented 2 years ago

Hi, when I open-sourced the code, I checked that it works fine on all datasets. Have you printed label_vocab's items to check whether the labels in the class are correct? Can you tell me which dataset this error happens on and what the out-of-range index value is?

liuwei1206 commented 2 years ago

Hi,

item2idx is a dictionary while idx2item is a list; they are totally different. So I don't understand what "went into the list repeatedly" means.
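The difference between the two structures can be illustrated with a minimal label vocabulary that keeps them in sync. This is an illustrative sketch, not LEBERT's actual `feature/vocab.py`; only the attribute names `item2idx`, `idx2item` and the method name `convert_id_to_item` are taken from the traceback. With N labels the valid ids are 0 to N - 1, so a 28-label vocab has no entry at index 28.

```python
class LabelVocab:
    """Illustrative label vocabulary (not LEBERT's feature/vocab.py)."""

    def __init__(self, labels):
        self.item2idx = {}  # dict: label string -> integer id
        self.idx2item = []  # list: position i holds the label whose id is i
        for label in labels:
            self.add(label)

    def add(self, label):
        # Checking the dict first keeps both structures in sync, so a label
        # that appears twice in the input is only stored once.
        if label not in self.item2idx:
            self.item2idx[label] = len(self.idx2item)
            self.idx2item.append(label)
        return self.item2idx[label]

    def convert_id_to_item(self, idx):
        # Plain list indexing: any idx >= len(self.idx2item) raises IndexError.
        return self.idx2item[idx]


vocab = LabelVocab(["O", "B-PER.NOM", "E-PER.NOM", "O"])
print(vocab.idx2item)               # ['O', 'B-PER.NOM', 'E-PER.NOM']
print(vocab.convert_id_to_item(2))  # E-PER.NOM
```

Because `add` consults the dict before appending, a repeated label in the input cannot make the list grow past the dict.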

iamlockelightning commented 2 years ago

I tried to use your pre-trained checkpoint (weibo/pytorch_model.bin, https://drive.google.com/file/d/1HP-Fc06dMN1jqxoRivLwtAJvQm3MG64Y/view?usp=sharing) with do_eval and do_predict, but it failed.

The out-of-range index is 28, at the do_predict stage.

id: 28
self.idx2item: ['O', 'B-PER.NOM', 'E-PER.NOM', 'B-LOC.NAM', 'E-LOC.NAM', 'B-PER.NAM', 'I-PER.NAM', 'E-PER.NAM', 'S-PER.NOM', 'B-GPE.NAM', 'E-GPE.NAM', 'B-ORG.NAM', 'I-ORG.NAM', 'E-ORG.NAM', 'I-PER.NOM', 'S-GPE.NAM', 'B-ORG.NOM', 'E-ORG.NOM', 'I-LOC.NAM', 'I-ORG.NOM', 'B-LOC.NOM', 'I-LOC.NOM', 'E-LOC.NOM', 'B-GPE.NOM', 'E-GPE.NOM', 'I-GPE.NAM', 'S-PER.NAM', 'S-LOC.NOM']
len(self.idx2item): 28
self.item2idx: {'O': 0, 'B-PER.NOM': 1, 'E-PER.NOM': 2, 'B-LOC.NAM': 3, 'E-LOC.NAM': 4, 'B-PER.NAM': 5, 'I-PER.NAM': 6, 'E-PER.NAM': 7, 'S-PER.NOM': 8, 'B-GPE.NAM': 9, 'E-GPE.NAM': 10, 'B-ORG.NAM': 11, 'I-ORG.NAM': 12, 'E-ORG.NAM': 13, 'I-PER.NOM': 14, 'S-GPE.NAM': 15, 'B-ORG.NOM': 16, 'E-ORG.NOM': 17, 'I-LOC.NAM': 18, 'I-ORG.NOM': 19, 'B-LOC.NOM': 20, 'I-LOC.NOM': 21, 'E-LOC.NOM': 22, 'B-GPE.NOM': 23, 'E-GPE.NOM': 24, 'I-GPE.NAM': 25, 'S-PER.NAM': 26, 'S-LOC.NOM': 27}
len(self.item2idx): 28

The label_vocab items look fine, but the index still goes past the end of the list.

The script I ran is:

TRANSFORMERS_OFFLINE=1 CUDA_VISIBLE_DEVICES=3 python3 -m torch.distributed.launch --master_port 13117 --nproc_per_node=1 \
       Trainer.py --do_eval --do_predict \
                  --evaluate_during_training \
                  --data_dir="data/dataset/NER/weibo" \
                  --output_dir="data/result/NER/weibo/wcbertcrf" \
                  --config_name="data/berts/bert/config.json" \
                  --model_name_or_path="data/../weibo/pytorch_model.bin" \
                  --vocab_file="data/berts/bert/vocab.txt" \
                  --word_vocab_file="data/vocab/tencent_vocab.txt" \
                  --max_scan_num=1500000 \
                  --max_word_num=5 \
                  --label_file="data/dataset/NER/weibo/labels.txt" \
                  --word_embedding="data/embedding/word_embedding.txt" \
                  --saved_embedding_dir="data/dataset/NER/weibo" \
                  --model_type="WCBertCRF_Token" \
                  --seed=106524 \
                  --per_gpu_train_batch_size=4 \
                  --per_gpu_eval_batch_size=16 \
                  --learning_rate=1e-5 \
                  --max_steps=-1 \
                  --max_seq_length=256 \
                  --num_train_epochs=20 \
                  --warmup_steps=190 \
                  --save_steps=600 \
                  --logging_steps=100
iamlockelightning commented 2 years ago

Besides, the eval results are also weird:

INFO:__main__:*** Dev Evaluate ***
Check whether the ids and words in data/dataset/NER/weibo/dev.json match:
[ 101 1366 5579 3971 4550 1217  677 6821 4381 2692]
['[CLS]', '口', '腔', '溃', '疡', '加', '上', '这', '玩', '意']
[0 0 0 0 0]
['<pad>', '<pad>', '<pad>', '<pad>', '<pad>']
[8938 8939    0    0    0]
['口腔', '口腔溃疡', '<pad>', '<pad>', '<pad>']
[8938 8939    0    0    0]
['口腔', '口腔溃疡', '<pad>', '<pad>', '<pad>']
[ 8939 20943     0     0     0]
['口腔溃疡', '溃疡', '<pad>', '<pad>', '<pad>']
[ 8939 20943     0     0     0]
['口腔溃疡', '溃疡', '<pad>', '<pad>', '<pad>']
[7863    0    0    0    0]
['加上', '<pad>', '<pad>', '<pad>', '<pad>']
[7863    0    0    0    0]
['加上', '<pad>', '<pad>', '<pad>', '<pad>']
[28002 28003     0     0     0]
['这玩', '这玩意', '<pad>', '<pad>', '<pad>']
[28002 28003 21877     0     0]
['这玩', '这玩意', '玩意', '<pad>', '<pad>']
[28003 21877     0     0     0]
['这玩意', '玩意', '<pad>', '<pad>', '<pad>']
Dataset length:  271
2021-09-26 20:13:49:INFO: ***** Running dev *****
INFO:__main__:***** Running dev *****
2021-09-26 20:13:49:INFO:   Num examples = 271
INFO:__main__:  Num examples = 271
2021-09-26 20:13:49:INFO:   Batch size = 16
INFO:__main__:  Batch size = 16
dev: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 18.01it/s]
Calling BertTokenizer.from_pretrained() with the path to a single file or url is deprecated
Dev Result: acc: 0.0025, p: 0.0001, r: 0.0051, f1: 0.0003
bultiful commented 2 years ago

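I ran into this problem too. With evaluate_during_training, the results on the dev and test sets are very good, but when I run do_predict on its own, the results are so poor there is hardly any score. I don't know why. @liuwei1206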

iamlockelightning commented 2 years ago

When training from scratch, the evaluation results are strange: acc looks good, but p, r, and f1 stay at 0 the whole time.

Epoch:   5%|█████▊     | 1/20 [01:59<37:59, 120.00s/it]
2021-09-26 20:57:43:INFO: {'loss': 131.08227142333985, 'learning_rate': 9.680365296803654e-06, 'epoch': 1, 'step': 400}
INFO:__main__:{'loss': 131.08227142333985, 'learning_rate': 9.680365296803654e-06, 'epoch': 1, 'step': 400}
2021-09-26 20:58:14:INFO: {'loss': 128.31120971679687, 'learning_rate': 9.528158295281584e-06, 'epoch': 1, 'step': 500}
INFO:__main__:{'loss': 128.31120971679687, 'learning_rate': 9.528158295281584e-06, 'epoch': 1, 'step': 500}
2021-09-26 20:58:46:INFO: {'loss': 129.44464721679688, 'learning_rate': 9.375951293759512e-06, 'epoch': 1, 'step': 600}
INFO:__main__:{'loss': 129.44464721679688, 'learning_rate': 9.375951293759512e-06, 'epoch': 1, 'step': 600}
Iteration: 100%|██████████| 338/338 [01:49<00:00,  3.10it/s]
Dataset length:  271
2021-09-26 20:59:15:INFO: ***** Running Dev *****
INFO:__main__:***** Running Dev *****
2021-09-26 20:59:15:INFO:   Num examples = 271
INFO:__main__:  Num examples = 271
2021-09-26 20:59:15:INFO:   Batch size = 16
INFO:__main__:  Batch size = 16
Dev: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 17.53it/s]
Calling BertTokenizer.from_pretrained() with the path to a single file or url is deprecated
#############  Dev's result  #############
2021-09-26 20:59:16:INFO: {'acc': 0.933076021779585, 'p': 0.0, 'r': 0.0, 'f1': 0.0, 'epoch': 1, 'step': 676}
INFO:__main__:{'acc': 0.933076021779585, 'p': 0.0, 'r': 0.0, 'f1': 0.0, 'epoch': 1, 'step': 676}
Dataset length:  271
2021-09-26 20:59:16:INFO: ***** Running Test *****
INFO:__main__:***** Running Test *****
2021-09-26 20:59:16:INFO:   Num examples = 271
INFO:__main__:  Num examples = 271
2021-09-26 20:59:16:INFO:   Batch size = 16
INFO:__main__:  Batch size = 16
Test: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:01<00:00, 16.78it/s]
Calling BertTokenizer.from_pretrained() with the path to a single file or url is deprecated
#############  Test's result  #############
2021-09-26 20:59:18:INFO: {'acc': 0.927368279207654, 'p': 0.0, 'r': 0.0, 'f1': 0.0, 'epoch': 1, 'step': 676}
INFO:__main__:{'acc': 0.927368279207654, 'p': 0.0, 'r': 0.0, 'f1': 0.0, 'epoch': 1, 'step': 676}
Epoch:  10%|███████████▌     | 2/20 [03:53<34:55, 116.43s/it]
2021-09-26 20:59:39:INFO: {'loss': 135.22491577148438, 'learning_rate': 9.223744292237442e-06, 'epoch': 2, 'step': 700}
INFO:__main__:{'loss': 135.22491577148438, 'learning_rate': 9.223744292237442e-06, 'epoch': 2, 'step': 700}
2021-09-26 21:00:22:INFO: {'loss': 109.16646240234375, 'learning_rate': 9.071537290715373e-06, 'epoch': 2, 'step': 800}
INFO:__main__:{'loss': 109.16646240234375, 'learning_rate': 9.071537290715373e-06, 'epoch': 2, 'step': 800}
2021-09-26 21:01:03:INFO: {'loss': 109.31468872070313, 'learning_rate': 8.919330289193303e-06, 'epoch': 2, 'step': 900}
INFO:__main__:{'loss': 109.31468872070313, 'learning_rate': 8.919330289193303e-06, 'epoch': 2, 'step': 900}
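A plausible reading of this pattern (acc above 0.9 while p, r and f1 stay at 0), offered as an assumption rather than a diagnosis: acc is token-level and dominated by the `O` label, while p, r and f1 are computed over whole entity spans, so a model that predicts `O` almost everywhere scores well on the first and zero on the others. The sketch below uses hypothetical helpers and a simplified BIOES span extractor; it is not LEBERT's `seq_f1_with_mask`.

```python
def spans(tags):
    """Extract (start, end, type) entity spans from BIOES tags (simplified:
    I- tags are skipped and E- does not re-check the entity type)."""
    result, start = set(), None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            result.add((i, i, etype))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            result.add((start, i, etype))
            start = None
        elif prefix == "O":
            start = None
    return result


def token_acc(gold, pred):
    """Fraction of positions where the predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def span_prf(gold, pred):
    """Entity-level precision, recall and F1 over exact span matches."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


gold = ["O"] * 18 + ["B-PER.NAM", "E-PER.NAM"]
pred = ["O"] * 20  # a model that has only learned to predict "O"
print(token_acc(gold, pred))  # 0.9
print(span_prf(gold, pred))   # (0.0, 0.0, 0.0)
```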
liuwei1206 commented 2 years ago

Sorry, I haven't encountered those problems.

The out-of-range prediction may be due to the use of a CRF. In the CRF, we add two special labels, so the hidden classifier outputs label_num + 2 classes. Normally the model will not predict those two special labels if it is trained properly, so maybe something went wrong in the training process? You can check it!
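Following this explanation, a classifier over Weibo's 28 labels has 28 + 2 = 30 outputs, so a poorly trained model can emit id 28 or 29, which `idx2item` cannot map; that matches the index 28 reported earlier in the thread. The sketch below assumes the two special labels take the ids right after the real labels (not confirmed from the source) and uses a hypothetical `decode_ids` helper that counts such ids instead of crashing. A 3-label list stands in for the full label set.

```python
def decode_ids(pred_ids, idx2item, fallback="O"):
    """Hypothetical helper: map predicted ids to labels. Ids outside
    0 .. len(idx2item) - 1 (e.g. the two extra CRF labels) become `fallback`
    and are counted instead of raising IndexError."""
    labels, n_out_of_range = [], 0
    for idx in pred_ids:
        if 0 <= idx < len(idx2item):
            labels.append(idx2item[idx])
        else:
            labels.append(fallback)
            n_out_of_range += 1
    return labels, n_out_of_range


idx2item = ["O", "B-PER.NOM", "E-PER.NOM"]  # 3 real labels
num_outputs = len(idx2item) + 2             # classifier size: label_num + 2 = 5
print(num_outputs)                          # 5
print(decode_ids([0, 1, 3, 4, 2], idx2item))
# (['O', 'B-PER.NOM', 'O', 'O', 'E-PER.NOM'], 2)
```

A nonzero count is best treated as a symptom: per the reply above, a normally trained model should not predict the special labels, so silently mapping them to `O` would hide a training or loading problem rather than fix it.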

iamlockelightning commented 2 years ago

Thank you for your patient reply.

You're right: after training for enough iterations, the out-of-range index problem is gone.

But I still can't get a normal evaluation result. I am using the shell script that you provided.

The p, r, and f1 are all 0 for the first 10 epochs. When training ends, the results are:
Dev Result: acc: 0.9387, p: 0.6508, r: 0.1054, f1: 0.1814
Test Result: acc: 0.9301, p: 0.4333, r: 0.0622, f1: 0.1088

The whole training log is attached: lebert_training_log.txt

The recall is very low. Have you encountered this phenomenon?
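As a quick arithmetic check, f1 is the harmonic mean of p and r, f1 = 2pr / (p + r), so the reported f1 values follow directly from the low recall even though precision is moderate:

```python
def f1(p, r):
    """F1 as the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


print(round(f1(0.6508, 0.1054), 4))  # Dev:  0.1814
print(round(f1(0.4333, 0.0622), 4))  # Test: 0.1088
```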

liuwei1206 commented 2 years ago

Hi,

I haven't encountered this phenomenon. As I said before, I checked the performance before I open-sourced the code and checkpoint. I'm sorry I don't know the exact cause of your errors. But given how low your results are, I wonder whether the BERT checkpoint and word embeddings were loaded successfully?
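To check whether the BERT checkpoint and word embeddings were actually loaded, one approach is to compare parameter names between the model and the checkpoint. PyTorch's `Module.load_state_dict(state_dict, strict=False)` returns `missing_keys` and `unexpected_keys` for this purpose; the stdlib-only sketch below mimics that comparison on plain key lists so it runs without a model. `compare_keys` is a hypothetical helper and the parameter names are made up.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected): names the model expects but the checkpoint
    lacks, and names the checkpoint has that the model does not use."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Made-up parameter names, for illustration only.
model_keys = ["bert.embeddings.word_embeddings.weight", "classifier.weight", "word_embeddings.weight"]
checkpoint_keys = ["bert.embeddings.word_embeddings.weight", "classifier.weight"]
print(compare_keys(model_keys, checkpoint_keys))
# (['word_embeddings.weight'], [])
```

If most of the model's names show up as missing, the model is effectively running with randomly initialized weights, which would be consistent with near-zero scores.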

bultiful commented 2 years ago

Hi, have you solved the problem yet? Could you show me how to solve it?

lvjiujin commented 2 years ago

Why is it that with evaluate_during_training, the results on both the dev and test sets are so poor there is almost no score? What can I do?

Yesgo1220 commented 2 years ago

Adding model.load_state_dict(torch.load(...)) in do_predict, loading the trained pytorch_model.bin, makes it work. Give it a try.
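One possible reason loading the trained weights matters so much here, offered as an assumption since the thread does not confirm how Trainer.py saves checkpoints: a model wrapped in `torch.nn.parallel.DistributedDataParallel` (which is how `torch.distributed.launch` runs, like the script earlier in this thread, are commonly set up) stores its parameters under a `module.` prefix, so a checkpoint saved from the wrapper does not match an unwrapped model's parameter names. A minimal sketch of stripping that prefix, on a plain dict so it runs without PyTorch; `strip_prefix` is a hypothetical helper.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from every key that
    starts with it; other keys are kept unchanged."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain dict standing in for a checkpoint; names and values are made up.
saved = {"module.bert.pooler.dense.weight": 1, "module.classifier.weight": 2}
print(strip_prefix(saved))
# {'bert.pooler.dense.weight': 1, 'classifier.weight': 2}
```

With PyTorch, the function would be applied before loading, e.g. `model.load_state_dict(strip_prefix(torch.load(path, map_location="cpu")))`, where `path` is the trained `pytorch_model.bin`. Keeping the default `strict=True` makes any remaining name mismatch raise an error instead of being silently skipped.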

Yesgo1220 commented 2 years ago

https://github.com/liuwei1206/LEBERT/issues/32#issuecomment-996503455

Yesgo1220 commented 2 years ago

https://github.com/liuwei1206/LEBERT/issues/32#issuecomment-996503455

bultiful commented 2 years ago

When training on a single GPU, don't use distributed training. Removing those arguments should solve the problem.

TongCYJ commented 2 years ago

I ran into this problem too. Could you tell me how you solved it?

lijia2019310 commented 2 years ago

Hi, have you solved it? Where was the problem? Please let me know.
