huminghao16 / RE3QA

Retrieve, Read, Rerank: Towards End-to-End Multi-Document Reading Comprehension
Apache License 2.0

em: 43.321, f1: 65.053 #8

Open Gabriellamin opened 5 years ago

Gabriellamin commented 5 years ago

Reader em: 43.321, f1: 65.053. The EM is nearly 30 percentage points lower and the F1 nearly 20 points lower than the results reported in the paper. Why is that? I only changed the batch size and n_para_train. The contents of performance.txt and parameter.txt, followed by the log output, are shown below.

performance.txt:

Ranker, type: distill, step: 0, map: 0.491, mrr: 0.510, top_1: 0.312, top_3: 0.605, top_5: 0.815, top_7: 0.935, retrieval_rate: 0.468

Ranker, type: test, step: 0, map: 0.396, mrr: 0.415, top_1: 0.231, top_3: 0.468, top_5: 0.671, top_7: 0.804, retrieval_rate: 0.345

Ranker, type: distill, step: 0, map: 0.767, mrr: 0.774, top_1: 0.621, top_3: 0.934, top_5: 0.995, top_7: 0.999, retrieval_rate: 0.915

Ranker, type: test, step: 0, map: 0.690, mrr: 0.696, top_1: 0.546, top_3: 0.854, top_5: 0.916, top_7: 0.921, retrieval_rate: 0.894

Ranker, type: distill, step: 0, map: 0.767, mrr: 0.774, top_1: 0.621, top_3: 0.934, top_5: 0.995, top_7: 0.999, retrieval_rate: 0.915

Ranker, type: test, step: 0, map: 0.690, mrr: 0.696, top_1: 0.546, top_3: 0.854, top_5: 0.916, top_7: 0.921, retrieval_rate: 0.894

Ranker, type: distill, step: 0, map: 0.974, mrr: 0.974, top_1: 0.951, top_3: 0.999, top_5: 1.000, top_7: 1.000, retrieval_rate: 0.953

Ranker, type: test, step: 0, map: 0.836, mrr: 0.837, top_1: 0.813, top_3: 0.862, top_5: 0.864, top_7: 0.864, retrieval_rate: 0.929

Ranker, type: distill, step: 0, map: 0.767, mrr: 0.774, top_1: 0.621, top_3: 0.934, top_5: 0.995, top_7: 0.999, retrieval_rate: 0.915

Ranker, type: test, step: 0, map: 0.396, mrr: 0.415, top_1: 0.231, top_3: 0.468, top_5: 0.671, top_7: 0.804, retrieval_rate: 0.345

Ranker, step: 70551, map: 0.866, mrr: 0.890, top_1: 0.841, top_3: 0.935, top_5: 0.955, top_7: 0.961

Reader, step: 70551, em: 24.021, f1: 39.743

Ranker, type: distill, step: 70551, map: 0.979, mrr: 0.983, top_1: 0.969, top_3: 0.999, top_5: 1.000, top_7: 1.000, retrieval_rate: 0.915

Ranker, type: test, step: 70551, map: 0.866, mrr: 0.890, top_1: 0.841, top_3: 0.935, top_5: 0.955, top_7: 0.961, retrieval_rate: 0.345

Ranker, step: 141102, map: 0.873, mrr: 0.894, top_1: 0.850, top_3: 0.935, top_5: 0.955, top_7: 0.961

Reader, step: 141102, em: 42.810, f1: 64.606

Ranker, type: test, step: 141102, map: 0.857, mrr: 0.890, top_1: 0.837, top_3: 0.933, top_5: 0.963, top_7: 0.972, retrieval_rate: 0.223

Reader, type: test, step: 141102, em: 43.321, f1: 65.053
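Every record in performance.txt above uses the same comma-separated "key: value" layout, so runs are easier to compare once the lines are parsed. The following is a minimal Python sketch, assuming each record sits on its own line; parse_metrics_line and final_test_scores are hypothetical helpers for illustration, not functions from RE3QA.

```python
import re

# Matches one "name: value" pair such as "map: 0.866" or "top_1: 0.837".
_PAIR = re.compile(r"(\w+): ([^,]+)")


def parse_metrics_line(line):
    """Split one performance.txt record into (component, fields).

    Assumes the layout seen in this issue, e.g.
    'Reader, type: test, step: 141102, em: 43.321, f1: 65.053'.
    """
    component, _, rest = line.partition(",")
    fields = {}
    for key, value in _PAIR.findall(rest):
        value = value.strip()
        try:
            fields[key] = int(value)        # e.g. step
        except ValueError:
            try:
                fields[key] = float(value)  # e.g. map, em, f1
            except ValueError:
                fields[key] = value         # e.g. type: test / distill
    return component.strip(), fields


def final_test_scores(lines):
    """Return the fields of the last 'type: test' record per component."""
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # performance.txt separates records with blank lines
        component, fields = parse_metrics_line(line)
        if fields.get("type") == "test":
            latest[component] = fields
    return latest


sample = [
    "Ranker, type: test, step: 141102, map: 0.857, mrr: 0.890, top_1: 0.837, "
    "top_3: 0.933, top_5: 0.963, top_7: 0.972, retrieval_rate: 0.223",
    "",
    "Reader, type: test, step: 141102, em: 43.321, f1: 65.053",
]
scores = final_test_scores(sample)
print(scores["Reader"]["em"], scores["Reader"]["f1"])  # 43.321 65.053
```

Keeping only the last "type: test" record per component picks out the most recent evaluation, which matters here because the file also contains several step-0 distill/test pairs before the final results.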

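The batch sizes and step counts printed in the training log below can be checked by hand. Judging only from the logged numbers (these relations are inferred here, not read from run_squad_document_full_e2e.py), the training step count equals int(filtered_features / reader_batch * num_train_epochs), and the training ranker batch equals int(reader_batch * split_features / filtered_features). A small Python check, with hypothetical helper names and values copied from the log:

```python
def num_steps(filtered_features, reader_batch, epochs=2.0):
    # Hypothetical helper: optimizer steps implied by the logged numbers.
    return int(filtered_features / reader_batch * epochs)


def ranker_batch(reader_batch, split_features, filtered_features):
    # Hypothetical helper: reader batch scaled by the ratio of all split
    # features to the features kept after filtering, rounded down.
    return int(reader_batch * split_features / filtered_features)


# (split features, filtered features, reader batch,
#  logged ranker batch, logged "Num steps") for every training phase in the log
logged_runs = [
    (308278, 219303, 16, 22, 27412),
    (308278, 282203, 16, 17, 35275),
    (308278, 219303, 8, 11, 54825),
    (308278, 282203, 8, 8, 70550),
    (113423, 98764, 4, 4, 49382),
    (113423, 108130, 4, 4, 54065),
    (308278, 219303, 4, 5, 109651),
    (308278, 282203, 4, 4, 141101),
]

for split, kept, read_bs, rank_bs, steps in logged_runs:
    assert ranker_batch(read_bs, split, kept) == rank_bs
    assert num_steps(kept, read_bs) == steps
print("all logged batch sizes and step counts reproduced")
```

Under this reading, cutting the reader batch from 16 to 4 multiplies the optimizer step count by four (35275 to 141101 after distillation). The step values 70551 and 141102 in performance.txt equal one and two passes of ceil(282203 / 4) = 70551 batches, so they appear to mark the end of the first and second epoch.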
Parameter settings:
ablate_type: none
bert_config_file: ../../data/bert-base-uncased/bert_config.json
data_dir: ../../data/squad1
data_parallel: False
debug: False
do_lower_case: True
do_predict: True
do_predict_open: False
do_train: True
doc_stride: 128
down_sample: False
filter_type: em
fp16: False
gradient_accumulation_steps: 1
init_checkpoint: ../../data/bert-base-uncased/pytorch_model.bin
learning_rate: 3e-05
length_heuristic: 0.05
local_rank: -1
loss_scale: 128
max_answer_length: 30
max_query_length: 64
max_seq_length: 384
n_best_size_rank: 4
n_best_size_read: 20
n_para_predict: 10
n_para_train: 4
no_cuda: False
num_hidden_rank: 3
num_train_epochs: 2.0
optimize_on_cpu: False
output_dir: out/squad_doc/011
pred_rank_weight: 1.4
pred_rerank_weight: 1.4
predict_batch_size: 4
predict_file: dev-v1.1.json
rank_pred_file: None
rank_train_file: None
sample_rate: 1.0
seed: 42
train_batch_size: 4
train_file: train-v1.1.json
verbose_logging: False
vocab_file: ../../data/bert-base-uncased/vocab.txt
warmup_proportion: 0.05

Log output:

root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# python3 run_squad_document_full_e2e.py
08/21/2019 09:12:33 - INFO - __main__ - output_dir: out/squad_doc/011
08/21/2019 09:12:33 - INFO - __main__ - torch_version: 0.4.1 device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
08/21/2019 09:12:33 - INFO - __main__ - Preparing model
08/21/2019 09:12:36 - INFO - __main__ - Loading model from pretrained checkpoint: ../../data/bert-base-uncased/pytorch_model.bin
08/21/2019 09:12:38 - INFO - __main__ - Weights of BertForRankingAndReadingAndReranking not initialized from pretrained model: ['rank_affine.weight', 'rank_affine.bias', 'rank_dense.weight', 'rank_dense.bias', 'rank_classifier.weight', 'rank_classifier.bias', 'read_affine.weight', 'read_affine.bias', 'rerank_affine.weight', 'rerank_affine.bias', 'rerank_dense.weight', 'rerank_dense.bias', 'rerank_classifier.weight', 'rerank_classifier.bias']
08/21/2019 09:12:38 - INFO - __main__ - Weights from pretrained model not used in BertForRankingAndReadingAndReranking: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
08/21/2019 09:12:41 - INFO - __main__ - Preparing training
Recall of answer existence in documents: 0.922
Average length of documents: 4986.023
Average pruned length of documents: 484.074
08/21/2019 09:13:15 - INFO - __main__ - Processing features: 5000
08/21/2019 09:13:32 - INFO - __main__ - Processing features: 10000
08/21/2019 09:13:49 - INFO - __main__ - Processing features: 15000
08/21/2019 09:14:08 - INFO - __main__ - Processing features: 20000
(omitted)
08/21/2019 09:29:23 - INFO - __main__ - Processing features: 280000
08/21/2019 09:29:40 - INFO - __main__ - Processing features: 285000
08/21/2019 09:29:56 - INFO - __main__ - Processing features: 290000
08/21/2019 09:30:13 - INFO - __main__ - Processing features: 295000
08/21/2019 09:30:30 - INFO - __main__ - Processing features: 300000
08/21/2019 09:30:46 - INFO - __main__ - Processing features: 305000
08/21/2019 09:31:30 - INFO - __main__ - Filtering features randomly
08/21/2019 09:31:31 - INFO - __main__ - Num orig examples = 87599
08/21/2019 09:31:31 - INFO - __main__ - Num split features = 308278
08/21/2019 09:31:31 - INFO - __main__ - Num split filtered features = 219303
08/21/2019 09:31:31 - INFO - __main__ - Batch size for ranker = 22
08/21/2019 09:31:31 - INFO - __main__ - Batch size for reader = 16
08/21/2019 09:31:31 - INFO - __main__ - Num steps = 27412
08/21/2019 09:31:40 - INFO - __main__ - Preparing evaluation
Recall of answer existence in documents: 0.923
Average length of documents: 5287.083
Average pruned length of documents: 509.538
08/21/2019 09:32:00 - INFO - __main__ - Processing features: 5000
08/21/2019 09:32:18 - INFO - __main__ - Processing features: 10000
08/21/2019 09:32:39 - INFO - __main__ - Processing features: 15000
08/21/2019 09:32:55 - INFO - __main__ - Processing features: 20000
08/21/2019 09:33:11 - INFO - __main__ - Processing features: 25000
08/21/2019 09:33:28 - INFO - __main__ - Processing features: 30000
08/21/2019 09:33:44 - INFO - __main__ - Processing features: 35000
08/21/2019 09:34:05 - INFO - __main__ - Filtering features randomly
08/21/2019 09:34:05 - INFO - __main__ - Num orig examples = 10570
08/21/2019 09:34:05 - INFO - __main__ - Num split features = 39769
08/21/2019 09:34:05 - INFO - __main__ - Num split filtered features = 35546
08/21/2019 09:34:05 - INFO - __main__ - Batch size for ranker = 64
08/21/2019 09:34:05 - INFO - __main__ - Batch size for reader = 32
08/21/2019 09:34:06 - INFO - __main__ - Running training distillation
08/21/2019 09:34:06 - INFO - __main__ - Processing example: 0
08/21/2019 09:37:52 - INFO - __main__ - Processing example: 55000
08/21/2019 09:41:37 - INFO - __main__ - Processing example: 110000
08/21/2019 09:45:23 - INFO - __main__ - Processing example: 165000
08/21/2019 09:49:08 - INFO - __main__ - Processing example: 220000
08/21/2019 09:52:55 - INFO - __main__ - Processing example: 275000
08/21/2019 09:55:30 - INFO - __main__ - Reconstruct training data at distill_4paras_4best.pkl
08/21/2019 09:55:30 - INFO - __main__ - Filtering features based on: out/squad_doc/011/distill_4paras_4best.pkl
08/21/2019 10:05:21 - INFO - __main__ - Num orig examples = 87599
08/21/2019 10:05:21 - INFO - __main__ - Num split features = 308278
08/21/2019 10:05:21 - INFO - __main__ - Num split filtered features = 282203
08/21/2019 10:05:21 - INFO - __main__ - Batch size for ranker = 17
08/21/2019 10:05:21 - INFO - __main__ - Batch size for reader = 16
08/21/2019 10:05:21 - INFO - __main__ - Num steps = 35275
08/21/2019 10:05:31 - INFO - __main__ - Running eval distillation
08/21/2019 10:05:31 - INFO - __main__ - Processing example: 0
08/21/2019 10:08:09 - INFO - __main__ - Reconstruct eval data at test_4paras_4best.pkl
08/21/2019 10:08:09 - INFO - __main__ - Filtering features based on: out/squad_doc/011/test_4paras_4best.pkl
08/21/2019 10:08:09 - INFO - __main__ - Num orig examples = 10570
08/21/2019 10:08:09 - INFO - __main__ - Num split features = 39769
08/21/2019 10:08:09 - INFO - __main__ - Num split filtered features = 35546
08/21/2019 10:08:09 - INFO - __main__ - Batch size for ranker = 64
08/21/2019 10:08:09 - INFO - __main__ - Batch size for reader = 32
08/21/2019 10:08:10 - INFO - __main__ - Preparing optimizer
08/21/2019 10:08:10 - INFO - __main__ - Running training
08/21/2019 10:08:10 - INFO - __main__ - Epoch: 1
Traceback (most recent call last):
  File "run_squad_document_full_e2e.py", line 914, in <module>
    main()
  File "run_squad_document_full_e2e.py", line 857, in main
    save_path, best_f1, epoch)
  File "run_squad_document_full_e2e.py", line 491, in run_train_epoch
    input_ids=input_ids, token_type_ids=segment_ids)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/custom_modeling.py", line 225, in forward
    all_encoder_layers, _ = self.bert(self.num_hidden_read, input_ids, token_type_ids, attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/custom_modeling.py", line 165, in forward
    all_encoder_layers = self.encoder(num_hidden_stop, embedding_output, extended_attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/custom_modeling.py", line 130, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/modeling.py", line 274, in forward
    layer_output = self.output(intermediate_output, attention_output)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/modeling.py", line 260, in forward
    hidden_states = self.LayerNorm(hidden_states + input_tensor)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/modeling.py", line 127, in forward
    return self.gamma * x + self.beta
RuntimeError: CUDA error: out of memory
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# ls
__pycache__  custom_modeling.py  modeling.py  optimization.py  out  run_squad_document_full_e2e.py  run_triviaqa_wiki_full_e2e.py  tokenization.py
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# vim run_squad_document_full_e2e.py
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# python3 run_squad_document_full_e2e.py
08/21/2019 10:41:35 - INFO - __main__ - output_dir: out/squad_doc/011
08/21/2019 10:41:35 - INFO - __main__ - torch_version: 0.4.1 device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
08/21/2019 10:41:35 - INFO - __main__ - Preparing model
08/21/2019 10:41:38 - INFO - __main__ - Loading model from pretrained checkpoint: ../../data/bert-base-uncased/pytorch_model.bin
08/21/2019 10:41:39 - INFO - __main__ - Weights of BertForRankingAndReadingAndReranking not initialized from pretrained model: ['rank_affine.weight', 'rank_affine.bias', 'rank_dense.weight', 'rank_dense.bias', 'rank_classifier.weight', 'rank_classifier.bias', 'read_affine.weight', 'read_affine.bias', 'rerank_affine.weight', 'rerank_affine.bias', 'rerank_dense.weight', 'rerank_dense.bias', 'rerank_classifier.weight', 'rerank_classifier.bias']
08/21/2019 10:41:39 - INFO - __main__ - Weights from pretrained model not used in BertForRankingAndReadingAndReranking: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
08/21/2019 10:41:41 - INFO - __main__ - Preparing training
08/21/2019 10:41:44 - INFO - __main__ - Loading examples from: ../../data/squad1/train_4paras_examples.pkl
08/21/2019 10:42:33 - INFO - __main__ - Loading features from: ../../data/squad1/train_4paras_384max_128stride_features.pkl
08/21/2019 10:42:33 - INFO - __main__ - Filtering features randomly
08/21/2019 10:42:34 - INFO - __main__ - Num orig examples = 87599
08/21/2019 10:42:34 - INFO - __main__ - Num split features = 308278
08/21/2019 10:42:34 - INFO - __main__ - Num split filtered features = 219303
08/21/2019 10:42:34 - INFO - __main__ - Batch size for ranker = 11
08/21/2019 10:42:34 - INFO - __main__ - Batch size for reader = 8
08/21/2019 10:42:34 - INFO - __main__ - Num steps = 54825
08/21/2019 10:42:44 - INFO - __main__ - Preparing evaluation
08/21/2019 10:42:48 - INFO - __main__ - Loading examples from: ../../data/squad1/eval_4paras_examples.pkl
08/21/2019 10:42:53 - INFO - __main__ - Loading features from: ../../data/squad1/eval_4paras_384max_128stride_features.pkl
08/21/2019 10:42:53 - INFO - __main__ - Filtering features randomly
08/21/2019 10:42:53 - INFO - __main__ - Num orig examples = 10570
08/21/2019 10:42:53 - INFO - __main__ - Num split features = 39769
08/21/2019 10:42:53 - INFO - __main__ - Num split filtered features = 35546
08/21/2019 10:42:53 - INFO - __main__ - Batch size for ranker = 16
08/21/2019 10:42:53 - INFO - __main__ - Batch size for reader = 8
08/21/2019 10:42:54 - INFO - __main__ - Running training distillation
08/21/2019 10:42:54 - INFO - __main__ - Processing example: 0
08/21/2019 10:46:48 - INFO - __main__ - Processing example: 55000
08/21/2019 10:50:43 - INFO - __main__ - Processing example: 110000
08/21/2019 10:54:37 - INFO - __main__ - Processing example: 165000
08/21/2019 10:58:31 - INFO - __main__ - Processing example: 220000
08/21/2019 11:02:33 - INFO - __main__ - Processing example: 275000
08/21/2019 11:05:08 - INFO - __main__ - Reconstruct training data at distill_4paras_4best.pkl
08/21/2019 11:05:08 - INFO - __main__ - Filtering features based on: out/squad_doc/011/distill_4paras_4best.pkl
08/21/2019 11:15:03 - INFO - __main__ - Num orig examples = 87599
08/21/2019 11:15:03 - INFO - __main__ - Num split features = 308278
08/21/2019 11:15:03 - INFO - __main__ - Num split filtered features = 282203
08/21/2019 11:15:03 - INFO - __main__ - Batch size for ranker = 8
08/21/2019 11:15:03 - INFO - __main__ - Batch size for reader = 8
08/21/2019 11:15:03 - INFO - __main__ - Num steps = 70550
08/21/2019 11:15:15 - INFO - __main__ - Running eval distillation
08/21/2019 11:15:15 - INFO - __main__ - Processing example: 0
08/21/2019 11:15:57 - INFO - __main__ - Processing example: 10000
08/21/2019 11:16:39 - INFO - __main__ - Processing example: 20000
08/21/2019 11:17:21 - INFO - __main__ - Processing example: 30000
08/21/2019 11:18:04 - INFO - __main__ - Reconstruct eval data at test_4paras_4best.pkl
08/21/2019 11:18:04 - INFO - __main__ - Filtering features based on: out/squad_doc/011/test_4paras_4best.pkl
08/21/2019 11:18:04 - INFO - __main__ - Num orig examples = 10570
08/21/2019 11:18:04 - INFO - __main__ - Num split features = 39769
08/21/2019 11:18:04 - INFO - __main__ - Num split filtered features = 35546
08/21/2019 11:18:04 - INFO - __main__ - Batch size for ranker = 16
08/21/2019 11:18:04 - INFO - __main__ - Batch size for reader = 8
08/21/2019 11:18:06 - INFO - __main__ - Preparing optimizer
08/21/2019 11:18:06 - INFO - __main__ - Running training
08/21/2019 11:18:06 - INFO - __main__ - Epoch: 1
Traceback (most recent call last):
  File "run_squad_document_full_e2e.py", line 914, in <module>
    main()
  File "run_squad_document_full_e2e.py", line 857, in main
    save_path, best_f1, epoch)
  File "run_squad_document_full_e2e.py", line 491, in run_train_epoch
    input_ids=input_ids, token_type_ids=segment_ids)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/custom_modeling.py", line 225, in forward
    all_encoder_layers, _ = self.bert(self.num_hidden_read, input_ids, token_type_ids, attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/custom_modeling.py", line 165, in forward
    all_encoder_layers = self.encoder(num_hidden_stop, embedding_output, extended_attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/custom_modeling.py", line 130, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/modeling.py", line 272, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/modeling.py", line 233, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pythonprogram_zm/RE3QA/bert/modeling.py", line 194, in forward
    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
RuntimeError: CUDA error: out of memory
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# vim run_squad_document_full_e2e.py
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# python3 run_squad_document_full_e2e.py
08/21/2019 11:24:04 - INFO - __main__ - output_dir: out/squad_doc/011
08/21/2019 11:24:04 - INFO - __main__ - torch_version: 0.4.1 device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
08/21/2019 11:24:04 - INFO - __main__ - Preparing model
08/21/2019 11:24:07 - INFO - __main__ - Loading model from pretrained checkpoint: ../../data/bert-base-uncased/pytorch_model.bin
08/21/2019 11:24:07 - INFO - __main__ - Weights of BertForRankingAndReadingAndReranking not initialized from pretrained model: ['rank_affine.bias', 'rank_affine.weight', 'rank_dense.bias', 'rank_dense.weight', 'rank_classifier.bias', 'rank_classifier.weight', 'read_affine.bias', 'read_affine.weight', 'rerank_affine.bias', 'rerank_affine.weight', 'rerank_dense.bias', 'rerank_dense.weight', 'rerank_classifier.bias', 'rerank_classifier.weight']
08/21/2019 11:24:07 - INFO - __main__ - Weights from pretrained model not used in BertForRankingAndReadingAndReranking: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
08/21/2019 11:24:10 - INFO - __main__ - Preparing training
Recall of answer existence in documents: 0.859
Average length of documents: 4986.023
Average pruned length of documents: 239.409
08/21/2019 11:24:51 - INFO - __main__ - Processing features: 5000
08/21/2019 11:25:15 - INFO - __main__ - Processing features: 10000
08/21/2019 11:25:37 - INFO - __main__ - Processing features: 15000
08/21/2019 11:25:59 - INFO - __main__ - Processing features: 20000
(omitted)
08/21/2019 11:32:09 - INFO - __main__ - Processing features: 95000
08/21/2019 11:32:34 - INFO - __main__ - Processing features: 100000
08/21/2019 11:32:59 - INFO - __main__ - Processing features: 105000
08/21/2019 11:33:24 - INFO - __main__ - Processing features: 110000
08/21/2019 11:33:54 - INFO - __main__ - Filtering features randomly
08/21/2019 11:33:55 - INFO - __main__ - Num orig examples = 87599
08/21/2019 11:33:55 - INFO - __main__ - Num split features = 113423
08/21/2019 11:33:55 - INFO - __main__ - Num split filtered features = 98764
08/21/2019 11:33:55 - INFO - __main__ - Batch size for ranker = 4
08/21/2019 11:33:55 - INFO - __main__ - Batch size for reader = 4
08/21/2019 11:33:55 - INFO - __main__ - Num steps = 49382
08/21/2019 11:33:59 - INFO - __main__ - Preparing evaluation
Recall of answer existence in documents: 0.864
Average length of documents: 5287.083
Average pruned length of documents: 252.107
08/21/2019 11:34:26 - INFO - __main__ - Processing features: 5000
08/21/2019 11:34:49 - INFO - __main__ - Processing features: 10000
08/21/2019 11:35:14 - INFO - __main__ - Filtering features randomly
08/21/2019 11:35:14 - INFO - __main__ - Num orig examples = 10570
08/21/2019 11:35:14 - INFO - __main__ - Num split features = 14456
08/21/2019 11:35:14 - INFO - __main__ - Num split filtered features = 13433
08/21/2019 11:35:14 - INFO - __main__ - Batch size for ranker = 8
08/21/2019 11:35:14 - INFO - __main__ - Batch size for reader = 4
08/21/2019 11:35:14 - INFO - __main__ - Running training distillation
08/21/2019 11:35:14 - INFO - __main__ - Processing example: 0
08/21/2019 11:35:38 - INFO - __main__ - Processing example: 5000
08/21/2019 11:36:02 - INFO - __main__ - Processing example: 10000
08/21/2019 11:36:25 - INFO - __main__ - Processing example: 15000
08/21/2019 11:36:49 - INFO - __main__ - Processing example: 20000
08/21/2019 11:37:13 - INFO - __main__ - Processing example: 25000
08/21/2019 11:37:37 - INFO - __main__ - Processing example: 30000
(omitted)
08/21/2019 11:43:33 - INFO - __main__ - Processing example: 105000
08/21/2019 11:43:57 - INFO - __main__ - Processing example: 110000
08/21/2019 11:44:19 - INFO - __main__ - Reconstruct training data at distill_2paras_2best.pkl
08/21/2019 11:44:19 - INFO - __main__ - Filtering features based on: out/squad_doc/011/distill_2paras_2best.pkl
08/21/2019 11:45:39 - INFO - __main__ - Num orig examples = 87599
08/21/2019 11:45:39 - INFO - __main__ - Num split features = 113423
08/21/2019 11:45:39 - INFO - __main__ - Num split filtered features = 108130
08/21/2019 11:45:39 - INFO - __main__ - Batch size for ranker = 4
08/21/2019 11:45:39 - INFO - __main__ - Batch size for reader = 4
08/21/2019 11:45:39 - INFO - __main__ - Num steps = 54065
08/21/2019 11:45:43 - INFO - __main__ - Running eval distillation
08/21/2019 11:45:43 - INFO - __main__ - Processing example: 0
08/21/2019 11:46:06 - INFO - __main__ - Processing example: 5000
08/21/2019 11:46:27 - INFO - __main__ - Processing example: 10000
08/21/2019 11:46:48 - INFO - __main__ - Reconstruct eval data at test_2paras_2best.pkl
08/21/2019 11:46:48 - INFO - __main__ - Filtering features based on: out/squad_doc/011/test_2paras_2best.pkl
08/21/2019 11:46:48 - INFO - __main__ - Num orig examples = 10570
08/21/2019 11:46:48 - INFO - __main__ - Num split features = 14456
08/21/2019 11:46:48 - INFO - __main__ - Num split filtered features = 13433
08/21/2019 11:46:48 - INFO - __main__ - Batch size for ranker = 8
08/21/2019 11:46:48 - INFO - __main__ - Batch size for reader = 4
08/21/2019 11:46:48 - INFO - __main__ - Preparing optimizer
08/21/2019 11:46:48 - INFO - __main__ - Running training
08/21/2019 11:46:48 - INFO - __main__ - Epoch: 1
Traceback (most recent call last):
  File "run_squad_document_full_e2e.py", line 914, in <module>
    main()
  File "run_squad_document_full_e2e.py", line 857, in main
    save_path, best_f1, epoch)
  File "run_squad_document_full_e2e.py", line 509, in run_train_epoch
    args.verbose_logging, logger)
  File "/workspace/pythonprogram_zm/RE3QA/squad/squad_document_utils.py", line 1099, in annotate_candidates
    assert len(span_starts) == int(n_best_size/4)
AssertionError
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# vim run_squad_document_full_e2e.py
root@9953e6052f70:/workspace/pythonprogram_zm/RE3QA/bert# python3 run_squad_document_full_e2e.py
08/21/2019 12:10:19 - INFO - __main__ - output_dir: out/squad_doc/011
08/21/2019 12:10:19 - INFO - __main__ - torch_version: 0.4.1 device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
08/21/2019 12:10:19 - INFO - __main__ - Preparing model
08/21/2019 12:10:22 - INFO - __main__ - Loading model from pretrained checkpoint: ../../data/bert-base-uncased/pytorch_model.bin
08/21/2019 12:10:23 - INFO - __main__ - Weights of BertForRankingAndReadingAndReranking not initialized from pretrained model: ['rank_affine.weight', 'rank_affine.bias', 'rank_dense.weight', 'rank_dense.bias', 'rank_classifier.weight', 'rank_classifier.bias', 'read_affine.weight', 'read_affine.bias', 'rerank_affine.weight', 'rerank_affine.bias', 'rerank_dense.weight', 'rerank_dense.bias', 'rerank_classifier.weight', 'rerank_classifier.bias']
08/21/2019 12:10:23 - INFO - __main__ - Weights from pretrained model not used in BertForRankingAndReadingAndReranking: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
08/21/2019 12:10:26 - INFO - __main__ - Preparing training
08/21/2019 12:10:29 - INFO - __main__ - Loading examples from: ../../data/squad1/train_4paras_examples.pkl
08/21/2019 12:11:21 - INFO - __main__ - Loading features from: ../../data/squad1/train_4paras_384max_128stride_features.pkl
08/21/2019 12:11:21 - INFO - __main__ - Filtering features randomly
08/21/2019 12:11:22 - INFO - __main__ - Num orig examples = 87599
08/21/2019 12:11:22 - INFO - __main__ - Num split features = 308278
08/21/2019 12:11:22 - INFO - __main__ - Num split filtered features = 219303
08/21/2019 12:11:22 - INFO - __main__ - Batch size for ranker = 5
08/21/2019 12:11:22 - INFO - __main__ - Batch size for reader = 4
08/21/2019 12:11:22 - INFO - __main__ - Num steps = 109651
08/21/2019 12:11:31 - INFO - __main__ - Preparing evaluation
08/21/2019 12:11:37 - INFO - __main__ - Loading examples from: ../../data/squad1/eval_10paras_examples.pkl
08/21/2019 12:11:57 - INFO - __main__ - Loading features from: ../../data/squad1/eval_10paras_384max_128stride_features.pkl
08/21/2019 12:11:57 - INFO - __main__ - Filtering features randomly
08/21/2019 12:11:57 - INFO - __main__ - Num orig examples = 10570
08/21/2019 12:11:57 - INFO - __main__ - Num split features = 122413
08/21/2019 12:11:57 - INFO - __main__ - Num split filtered features = 42279
08/21/2019 12:11:57 - INFO - __main__ - Batch size for ranker = 8
08/21/2019 12:11:57 - INFO - __main__ - Batch size for reader = 4
08/21/2019 12:12:00 - INFO - __main__ - Running training distillation
08/21/2019 12:12:00 - INFO - __main__ - Processing example: 0
08/21/2019 12:12:23 - INFO - __main__ - Processing example: 5000
08/21/2019 12:12:46 - INFO - __main__ - Processing example: 10000
08/21/2019 12:13:10 - INFO - __main__ - Processing example: 15000
(omitted)
08/21/2019 12:33:57 - INFO - __main__ - Processing example: 285000
08/21/2019 12:34:20 - INFO - __main__ - Processing example: 290000
08/21/2019 12:34:43 - INFO - __main__ - Processing example: 295000
08/21/2019 12:35:06 - INFO - __main__ - Processing example: 300000
08/21/2019 12:35:29 - INFO - __main__ - Processing example: 305000
08/21/2019 12:36:07 - INFO - __main__ - Reconstruct training data at distill_4paras_4best.pkl
08/21/2019 12:36:07 - INFO - __main__ - Filtering features based on: out/squad_doc/011/distill_4paras_4best.pkl
08/21/2019 12:45:35 - INFO - __main__ - Num orig examples = 87599
08/21/2019 12:45:35 - INFO - __main__ - Num split features = 308278
08/21/2019 12:45:35 - INFO - __main__ - Num split filtered features = 282203
08/21/2019 12:45:35 - INFO - __main__ - Batch size for ranker = 4
08/21/2019 12:45:35 - INFO - __main__ - Batch size for reader = 4
08/21/2019 12:45:35 - INFO - __main__ - Num steps = 141101
08/21/2019 12:45:46 - INFO - __main__ - Running eval distillation
08/21/2019 12:45:46 - INFO - __main__ - Processing example: 0
08/21/2019 12:46:09 - INFO - __main__ - Processing example: 5000
08/21/2019 12:46:31 - INFO - __main__ - Processing example: 10000
08/21/2019 12:46:52 - INFO - __main__ - Processing example: 15000
(omitted)
08/21/2019 12:53:49 - INFO - __main__ - Processing example: 110000
08/21/2019 12:54:11 - INFO - __main__ - Processing example: 115000
08/21/2019 12:54:33 - INFO - __main__ - Processing example: 120000
08/21/2019 12:54:46 - INFO - __main__ - Reconstruct eval data at test_10paras_4best.pkl
08/21/2019 12:54:46 - INFO - __main__ - Filtering features based on: out/squad_doc/011/test_10paras_4best.pkl
08/21/2019 12:54:47 - INFO - __main__ - Num orig examples = 10570
08/21/2019 12:54:47 - INFO - __main__ - Num split features = 122413
08/21/2019 12:54:47 - INFO - __main__ - Num split filtered features = 42279
08/21/2019 12:54:47 - INFO - __main__ - Batch size for ranker = 8
08/21/2019 12:54:47 - INFO - __main__ - Batch size for reader = 4
08/21/2019 12:54:50 - INFO - __main__ - Preparing optimizer
08/21/2019 12:54:50 - INFO - __main__ - Running training
08/21/2019 12:54:50 - INFO - __main__ - Epoch: 1
08/21/2019 13:02:21 - INFO - __main__ - step: 1000, loss: 14.307
08/21/2019 13:09:40 - INFO - __main__ - step: 2000, loss: 4.402
08/21/2019 13:17:01 - INFO - __main__ - step: 3000, loss: 3.800
08/21/2019 13:24:17 - INFO - __main__ - step: 4000, loss: 3.560
08/21/2019 13:31:30 - INFO - __main__ - step: 5000, loss: 3.400
08/21/2019 13:38:53 - INFO - __main__ - step: 6000, loss: 3.300
08/21/2019 13:46:21 - INFO - __main__ - step: 7000, loss: 3.208
08/21/2019 13:53:43 - INFO - __main__ - step: 8000, loss: 3.120
08/21/2019 14:01:13 - INFO - __main__ - step: 9000, loss: 3.022
08/21/2019 14:08:39 - INFO - __main__ - step: 10000, loss: 2.898
08/21/2019 14:16:05 - INFO - __main__ - step: 11000, loss: 2.879
08/21/2019 14:23:31 - INFO - __main__ - step: 12000, loss: 2.861
08/21/2019 14:31:04 - INFO - __main__ - step: 13000, loss: 2.877
08/21/2019 14:38:29 - INFO - __main__ - step: 14000, loss: 2.821
08/21/2019 14:46:01 - INFO - __main__ - step: 15000, loss: 2.798
08/21/2019 14:53:32 - INFO - __main__ - step: 16000, loss: 2.801
08/21/2019 15:01:00 - INFO - __main__ - step: 17000, loss: 2.756
08/21/2019 15:08:36 - INFO - __main__ - step: 18000, loss: 2.731
08/21/2019 15:16:12 - INFO - __main__ - step: 19000, loss: 2.651
08/21/2019 15:23:41 - INFO - __main__ - step: 20000, loss: 2.735
08/21/2019 15:31:05 - INFO - __main__ - step: 21000, loss: 2.614
08/21/2019 15:38:31 - INFO - __main__ - step: 22000, loss: 2.635
08/21/2019 15:45:57 - INFO - __main__ - step: 23000, loss: 2.608
08/21/2019 15:53:18 - INFO - __main__ - step: 24000, loss: 2.615
08/21/2019 16:00:43 - INFO - __main__ - step: 25000, loss: 2.568
08/21/2019 16:08:21 - INFO - __main__ - step: 26000, loss: 2.565
08/21/2019 16:15:51 - INFO - __main__ - step: 27000, loss: 2.588
08/21/2019 16:23:15 - INFO - __main__ - step: 28000, loss: 2.616
08/21/2019 16:30:39 - INFO - __main__ - step: 29000, loss: 2.577
08/21/2019 16:38:04 - INFO - __main__ - step: 30000, loss: 2.565
08/21/2019 16:45:23 - INFO - __main__ - step: 31000, loss: 2.576
08/21/2019 16:52:43 - INFO - __main__ - step: 32000, loss: 2.489
08/21/2019 16:59:49 - INFO - __main__ - step: 33000, loss: 2.465
08/21/2019 17:07:11 - INFO - __main__ - step: 34000, loss: 2.465
08/21/2019 17:14:48 - INFO - __main__ - step: 35000, loss: 2.517
08/21/2019 17:22:05 - INFO - __main__ - step: 36000, loss: 2.523
08/21/2019 17:29:20 - INFO - __main__ - step: 37000, loss: 2.416
08/21/2019 17:36:41 - INFO - __main__ - step: 38000, loss: 2.402
08/21/2019 17:43:49 - INFO - __main__ - step: 39000, loss: 2.466
08/21/2019 17:51:17 - INFO - __main__ - step: 40000, loss: 2.440
08/21/2019 17:58:43 - INFO - __main__ - step: 41000, loss: 2.356
08/21/2019 18:06:08 - INFO - __main__ - step: 42000, loss: 2.407
08/21/2019 18:13:27 - INFO - __main__ - step: 43000, loss: 2.418
08/21/2019 18:20:35 - INFO - __main__ - step: 44000, loss: 2.343
08/21/2019 18:27:41 - INFO - __main__ - step: 45000, loss: 2.349
08/21/2019 18:34:48 - INFO - __main__ - step: 46000, loss: 2.369
08/21/2019 18:41:54 - INFO - __main__ - step: 47000, loss: 2.316
08/21/2019 18:49:00 - INFO - __main__ - step: 48000, loss: 2.334
08/21/2019 18:56:06 - INFO - __main__ - step: 49000, loss: 2.225
08/21/2019 19:03:12 - INFO - __main__ - step: 50000, loss: 2.347
08/21/2019 19:10:28 - INFO - __main__ - step: 51000, loss: 2.323
08/21/2019 19:17:49 - INFO - __main__ - step: 52000, loss: 2.261
08/21/2019 19:25:11 - INFO - __main__ - step: 53000, loss: 2.317
08/21/2019 19:32:35 - INFO - __main__ - step: 54000, loss: 2.259
08/21/2019 19:39:57 - INFO - __main__ - step: 55000, loss: 2.308
08/21/2019 19:47:33 - INFO - __main__ - step: 56000, loss: 2.299
08/21/2019 19:54:59 - INFO - __main__ - step: 57000, loss: 2.253
08/21/2019 20:02:20 - INFO - __main__ - step: 58000, loss: 2.262
08/21/2019 20:09:42 - INFO - __main__ - step: 59000, loss: 2.275
08/21/2019 20:17:02 - INFO - __main__ - step: 60000, loss: 2.261
08/21/2019 20:24:25 - INFO - __main__ - step: 61000, loss: 2.244
08/21/2019 20:31:47 - INFO - __main__ - step: 62000, loss: 2.209
08/21/2019 20:39:07 - INFO - __main__ - step: 63000, loss: 2.196
08/21/2019 20:46:29 - INFO - __main__ - step: 64000, loss: 2.232
08/21/2019 20:53:56 - INFO - __main__ - step: 65000, loss: 2.173
08/21/2019 21:01:20 - INFO - __main__ - step: 66000, loss: 2.172
08/21/2019 21:08:41 - INFO - __main__ - step: 67000, loss: 2.118
08/21/2019 21:16:03 - INFO - __main__ - step: 68000, loss: 2.163
08/21/2019 21:23:17 - INFO - __main__ - step: 69000, loss: 2.224
08/21/2019 21:30:24 - INFO - __main__ - step: 70000, loss: 2.207
08/21/2019 21:34:18 - INFO - __main__ - Running ranking evaluation
08/21/2019 21:43:13 - INFO - __main__ - Running reading evaluation
missing prediction on 0 examples
08/21/2019 21:58:44 - INFO - __main__ - Running training distillation
08/21/2019 21:58:44 - INFO - __main__ - Processing example: 0
08/21/2019 21:59:08 - INFO - __main__ - Processing example: 5000
08/21/2019 21:59:31 - INFO - __main__ - Processing example: 10000
(omitted)
08/21/2019 22:21:06 - INFO - __main__ - Processing example: 285000
08/21/2019 22:21:29 - INFO - __main__ - Processing example: 290000
08/21/2019 22:21:53 - INFO - __main__ - Processing example: 295000
08/21/2019 22:22:16 - INFO - __main__ - Processing example: 300000
08/21/2019 22:22:40 - INFO - __main__ - Processing example: 305000
08/21/2019 22:23:10 - INFO - __main__ - Reconstruct training data at distill_4paras_4best.pkl
08/21/2019 22:23:11 - INFO - __main__ - Filtering features based on: out/squad_doc/011/distill_4paras_4best.pkl
08/21/2019 22:32:28 - INFO - __main__ - Num orig examples = 87599
08/21/2019 22:32:28 - INFO - __main__ - Num split features = 308278
08/21/2019 22:32:28 - INFO - __main__ - Num split filtered features = 282203
08/21/2019 22:32:28 - INFO - __main__ - Batch
size for ranker = 4 08/21/2019 22:32:28 - INFO - main - Batch size for reader = 4 08/21/2019 22:32:28 - INFO - main - Num steps = 141101 08/21/2019 22:32:38 - INFO - main - Running eval distillation 08/21/2019 22:32:38 - INFO - main - Processing example: 0 08/21/2019 22:33:00 - INFO - main - Processing example: 5000 08/21/2019 22:33:22 - INFO - main - Processing example: 10000 08/21/2019 22:33:44 - INFO - main - Processing example: 15000 此处省略 08/21/2019 22:40:38 - INFO - main - Processing example: 110000 08/21/2019 22:41:00 - INFO - main - Processing example: 115000 08/21/2019 22:41:21 - INFO - main - Processing example: 120000 08/21/2019 22:41:34 - INFO - main - Reconstruct eval data at test_10paras_4best.pkl 08/21/2019 22:41:34 - INFO - main - Filtering features based on: out/squad_doc/011/test_10paras_4best.pkl 08/21/2019 22:41:34 - INFO - main - Num orig examples = 10570 08/21/2019 22:41:34 - INFO - main - Num split features = 122413 08/21/2019 22:41:34 - INFO - main - Num split filtered features = 42279 08/21/2019 22:41:34 - INFO - main - Batch size for ranker = 8 08/21/2019 22:41:34 - INFO - main - Batch size for reader = 4 08/21/2019 22:41:36 - INFO - main - Epoch: 2 08/21/2019 22:44:47 - INFO - main - step: 71000, loss: 1.856 08/21/2019 22:51:52 - INFO - main - step: 72000, loss: 1.839 08/21/2019 22:58:57 - INFO - main - step: 73000, loss: 1.840 08/21/2019 23:06:10 - INFO - main - step: 74000, loss: 1.831 08/21/2019 23:13:31 - INFO - main - step: 75000, loss: 1.810 08/21/2019 23:20:53 - INFO - main - step: 76000, loss: 1.753 08/21/2019 23:28:14 - INFO - main - step: 77000, loss: 1.837 08/21/2019 23:35:34 - INFO - main - step: 78000, loss: 1.844 08/21/2019 23:42:58 - INFO - main - step: 79000, loss: 1.860 08/21/2019 23:50:18 - INFO - main - step: 80000, loss: 1.831 08/21/2019 23:57:39 - INFO - main - step: 81000, loss: 1.775 08/22/2019 00:05:08 - INFO - main - step: 82000, loss: 1.808 08/22/2019 00:12:32 - INFO - main - step: 83000, loss: 1.872 08/22/2019 
00:19:56 - INFO - main - step: 84000, loss: 1.813 08/22/2019 00:27:19 - INFO - main - step: 85000, loss: 1.794 08/22/2019 00:34:52 - INFO - main - step: 86000, loss: 1.812 08/22/2019 00:42:19 - INFO - main - step: 87000, loss: 1.831 08/22/2019 00:49:49 - INFO - main - step: 88000, loss: 1.810 08/22/2019 00:57:12 - INFO - main - step: 89000, loss: 1.758 08/22/2019 01:04:32 - INFO - main - step: 90000, loss: 1.796 08/22/2019 01:11:55 - INFO - main - step: 91000, loss: 1.761 08/22/2019 01:19:15 - INFO - main - step: 92000, loss: 1.807 08/22/2019 01:26:35 - INFO - main - step: 93000, loss: 1.768 08/22/2019 01:33:57 - INFO - main - step: 94000, loss: 1.782 08/22/2019 01:41:28 - INFO - main - step: 95000, loss: 1.789 08/22/2019 01:48:45 - INFO - main - step: 96000, loss: 1.727 08/22/2019 01:56:10 - INFO - main - step: 97000, loss: 1.742 08/22/2019 02:03:32 - INFO - main - step: 98000, loss: 1.717 08/22/2019 02:10:53 - INFO - main - step: 99000, loss: 1.714 08/22/2019 02:18:13 - INFO - main - step: 100000, loss: 1.762 08/22/2019 02:25:35 - INFO - main - step: 101000, loss: 1.671 08/22/2019 02:32:54 - INFO - main - step: 102000, loss: 1.700 08/22/2019 02:40:19 - INFO - main - step: 103000, loss: 1.697 08/22/2019 02:47:39 - INFO - main - step: 104000, loss: 1.708 08/22/2019 02:54:59 - INFO - main - step: 105000, loss: 1.700 08/22/2019 03:02:24 - INFO - main - step: 106000, loss: 1.690 08/22/2019 03:09:42 - INFO - main - step: 107000, loss: 1.656 08/22/2019 03:16:58 - INFO - main - step: 108000, loss: 1.710 08/22/2019 03:24:14 - INFO - main - step: 109000, loss: 1.726 08/22/2019 03:31:31 - INFO - main - step: 110000, loss: 1.696 08/22/2019 03:38:53 - INFO - main - step: 111000, loss: 1.669 08/22/2019 03:46:13 - INFO - main - step: 112000, loss: 1.707 08/22/2019 03:53:33 - INFO - main - step: 113000, loss: 1.690 08/22/2019 04:00:37 - INFO - main - step: 114000, loss: 1.669 08/22/2019 04:07:47 - INFO - main - step: 115000, loss: 1.667 08/22/2019 04:15:08 - INFO - main - step: 
116000, loss: 1.692 08/22/2019 04:22:28 - INFO - main - step: 117000, loss: 1.672 08/22/2019 04:29:48 - INFO - main - step: 118000, loss: 1.602 08/22/2019 04:37:13 - INFO - main - step: 119000, loss: 1.655 08/22/2019 04:44:18 - INFO - main - step: 120000, loss: 1.634 08/22/2019 04:51:23 - INFO - main - step: 121000, loss: 1.652 08/22/2019 04:58:35 - INFO - main - step: 122000, loss: 1.617 08/22/2019 05:05:56 - INFO - main - step: 123000, loss: 1.603 08/22/2019 05:13:17 - INFO - main - step: 124000, loss: 1.590 08/22/2019 05:20:33 - INFO - main - step: 125000, loss: 1.641 08/22/2019 05:27:48 - INFO - main - step: 126000, loss: 1.644 08/22/2019 05:35:09 - INFO - main - step: 127000, loss: 1.580 08/22/2019 05:42:33 - INFO - main - step: 128000, loss: 1.649 08/22/2019 05:49:53 - INFO - main - step: 129000, loss: 1.612 08/22/2019 05:57:13 - INFO - main - step: 130000, loss: 1.560 08/22/2019 06:04:34 - INFO - main - step: 131000, loss: 1.540 08/22/2019 06:11:57 - INFO - main - step: 132000, loss: 1.603 08/22/2019 06:19:20 - INFO - main - step: 133000, loss: 1.568 08/22/2019 06:26:40 - INFO - main - step: 134000, loss: 1.557 08/22/2019 06:34:03 - INFO - main - step: 135000, loss: 1.562 08/22/2019 06:41:25 - INFO - main - step: 136000, loss: 1.581 08/22/2019 06:48:59 - INFO - main - step: 137000, loss: 1.467 08/22/2019 06:56:19 - INFO - main - step: 138000, loss: 1.592 08/22/2019 07:03:40 - INFO - main - step: 139000, loss: 1.587 08/22/2019 07:11:01 - INFO - main - step: 140000, loss: 1.574 08/22/2019 07:18:21 - INFO - main - step: 141000, loss: 1.593 08/22/2019 07:19:05 - INFO - main - Running ranking evaluation 08/22/2019 07:28:01 - INFO - main - Running reading evaluation missing prediction on 0 examples 08/22/2019 07:43:52 - INFO - main - Preparing prediction Recall of answer existence in documents: 0.990 Average length of documents: 5287.083 Average pruned length of documents: 3666.967 08/22/2019 07:44:15 - INFO - main - Processing features: 5000 08/22/2019 07:44:27 - 
INFO - main - Processing features: 10000 此处省略 08/22/2019 07:58:55 - INFO - main - Processing features: 370000 08/22/2019 07:59:07 - INFO - main - Processing features: 375000 08/22/2019 08:00:11 - INFO - main - Filtering features randomly 08/22/2019 08:00:12 - INFO - main - Num orig examples = 10570 08/22/2019 08:00:12 - INFO - main - Num split features = 378602 08/22/2019 08:00:12 - INFO - main - Num split filtered features = 84560 08/22/2019 08:00:12 - INFO - main - Batch size for ranker = 8 08/22/2019 08:00:12 - INFO - main - Batch size for reader = 4 08/22/2019 08:00:24 - INFO - main - Running ranking prediction 08/22/2019 08:00:25 - INFO - main - Loading model from finetuned checkpoint: 'out/squad_doc/011/checkpoint.pth.tar' (step 141102) 08/22/2019 08:00:25 - INFO - main - Processing example: 0 08/22/2019 08:00:46 - INFO - main - Processing example: 5000 08/22/2019 08:01:08 - INFO - main - Processing example: 10000 此处省略 08/22/2019 08:27:01 - INFO - main - Processing example: 365000 08/22/2019 08:27:23 - INFO - main - Processing example: 370000 08/22/2019 08:27:45 - INFO - main - Processing example: 375000 08/22/2019 08:28:05 - INFO - main - Reconstruct pred data at test_30paras_8best.pkl 08/22/2019 08:28:05 - INFO - main - Filtering features based on: out/squad_doc/011/test_30paras_8best.pkl 08/22/2019 08:28:19 - INFO - main - Num orig examples = 10570 08/22/2019 08:28:19 - INFO - main - Num split features = 378602 08/22/2019 08:28:19 - INFO - main - Num split filtered features = 84560 08/22/2019 08:28:19 - INFO - main - Batch size for ranker = 8 08/22/2019 08:28:19 - INFO - main - Batch size for reader = 4 08/22/2019 08:28:29 - INFO - main - Running reading prediction 08/22/2019 08:28:30 - INFO - main - Loading model from finetuned checkpoint: 'out/squad_doc/011/checkpoint.pth.tar' (step 141102) 08/22/2019 08:28:30 - INFO - main - Processing example: 0 08/22/2019 08:30:10 - INFO - main - Processing example: 5000 08/22/2019 08:31:50 - INFO - main - Processing 
example: 10000 08/22/2019 08:33:31 - INFO - main - Processing example: 15000 08/22/2019 08:35:12 - INFO - main - Processing example: 20000 08/22/2019 08:36:52 - INFO - main - Processing example: 25000 08/22/2019 08:38:33 - INFO - main - Processing example: 30000 08/22/2019 08:40:13 - INFO - main - Processing example: 35000 08/22/2019 08:41:54 - INFO - main - Processing example: 40000 08/22/2019 08:43:34 - INFO - main - Processing example: 45000 08/22/2019 08:45:14 - INFO - main - Processing example: 50000 08/22/2019 08:46:55 - INFO - main - Processing example: 55000 08/22/2019 08:48:36 - INFO - main - Processing example: 60000 08/22/2019 08:50:16 - INFO - main - Processing example: 65000 08/22/2019 08:51:56 - INFO - main - Processing example: 70000 08/22/2019 08:53:37 - INFO - main - Processing example: 75000 08/22/2019 08:55:18 - INFO - main - Processing example: 80000 08/22/2019 08:59:08 - INFO - main - Writing predictions to: out/squad_doc/011/predictions.json 08/22/2019 08:59:08 - INFO - main - Writing nbest to: out/squad_doc/011/nbest_predictions.json missing prediction on 0 examples
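The step counts in this log can be checked by hand. The sketch below assumes that one optimizer step consumes one batch and that the total step count uses the truncating formula common in BERT fine-tuning scripts, `int(num_features / batch_size / grad_accum_steps * num_epochs)`. It is not taken from the RE3QA source, and the helper names are hypothetical. With the logged 282,203 filtered training features, a batch size of 4 and two epochs, it reproduces `Num steps = 141101`, and it also shows how many steps per epoch a batch size of 32 would take on the same features.

```python
import math


def steps_per_epoch(num_features: int, batch_size: int) -> int:
    # One optimizer step per batch; a partial final batch still costs one step.
    return math.ceil(num_features / batch_size)


def total_train_steps(num_features: int, batch_size: int,
                      grad_accum_steps: int, num_epochs: int) -> int:
    # Assumed formula (common in BERT fine-tuning scripts): truncate, do not round up.
    return int(num_features / batch_size / grad_accum_steps * num_epochs)


# Values taken from the log: 282203 filtered training features, batch size 4.
print(steps_per_epoch(282203, 4))          # 70551
print(total_train_steps(282203, 4, 1, 2))  # 141101
print(steps_per_epoch(282203, 32))         # 8819
```

Two epochs of 70,551 steps each give 141,102, which appears consistent with the step recorded in the loaded checkpoint (`step 141102`); the logged `Num steps` is one lower only because the formula truncates instead of rounding up.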

huminghao16 commented 5 years ago

I notice that your train_batch_size is set to 4, which is too small. In our experiments, the batch size is set to 32 by default. Maybe you should try increasing the batch size to see if the performance improves.
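A larger effective batch does not have to fit on one GPU: with gradient accumulation, the gradients of several smaller micro-batches are added up before each optimizer step. The pure-Python sketch below uses a toy one-parameter least-squares model (not RE3QA code; the function names are hypothetical) to check that, for a mean-reduced loss and equal-sized micro-batches, accumulating 4 micro-batches of 8 reproduces the gradient of a single batch of 32.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) pair for a toy 1-D regression


def mean_grad(w: float, batch: List[Example]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: List[Example], accum_steps: int) -> float:
    """Split the batch into equal micro-batches and sum their gradients, each
    scaled by 1/accum_steps (the usual 'loss / accum_steps' trick before backward())."""
    if len(batch) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    size = len(batch) // accum_steps
    total = 0.0
    for i in range(accum_steps):
        micro_batch = batch[i * size:(i + 1) * size]
        total += mean_grad(w, micro_batch) / accum_steps
    return total  # a single optimizer step would use this accumulated value


random.seed(0)
batch = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(32)]
w = 0.5
full = mean_grad(w, batch)                         # one batch of 32
accum = accumulated_grad(w, batch, accum_steps=4)  # 4 micro-batches of 8
print(math.isclose(full, accum, rel_tol=1e-9, abs_tol=1e-12))  # True
```

Assuming that a `train_batch_size` of 32 counts the whole effective batch, 4 accumulation steps on 2 GPUs would mean 8 examples per forward pass, or 4 per GPU. The sketch only shows that the gradients match; it does not by itself explain the accuracy gap reported in this issue.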

Gabriellamin commented 5 years ago

Another user reports getting the same (low) results with the same parameters: https://github.com/huminghao16/RE3QA/issues/2#issuecomment-523302418

Seohyeong commented 5 years ago

@Gabriellamin Were you able to reproduce the reported results using a batch size of 32?

huminghao16 commented 5 years ago

Hi Gabriellamin. I have fixed some bugs in the code! Could you please try again to see if you can reproduce the results?

Apologies for the inconvenience!

huminghao16 commented 5 years ago

I've trained the model on SQuAD-document with a batch size of 32 on 2 GPUs, using 4 gradient accumulation steps. Here is what I got:

Ranker, type: distill, step: 0, map: 0.491, mrr: 0.510, top_1: 0.312, top_3: 0.605, top_5: 0.815, top_7: 0.935, retrieval_rate: 0.468

Ranker, type: test, step: 0, map: 0.396, mrr: 0.415, top_1: 0.231, top_3: 0.468, top_5: 0.671, top_7: 0.804, retrieval_rate: 0.345

Ranker, step: 10911, map: 0.888, mrr: 0.907, top_1: 0.872, top_3: 0.939, top_5: 0.956, top_7: 0.962
Reader, step: 10911, em: 46.991, f1: 53.054

Ranker, type: distill, step: 10911, map: 0.952, mrr: 0.964, top_1: 0.941, top_3: 0.986, top_5: 0.997, top_7: 0.999, retrieval_rate: 0.468

Ranker, type: test, step: 10911, map: 0.888, mrr: 0.907, top_1: 0.872, top_3: 0.939, top_5: 0.956, top_7: 0.962, retrieval_rate: 0.345

Ranker, step: 21822, map: 0.891, mrr: 0.909, top_1: 0.876, top_3: 0.940, top_5: 0.957, top_7: 0.962
Reader, step: 21822, em: 76.500, f1: 83.243

Ranker, type: test, step: 21822, map: 0.886, mrr: 0.913, top_1: 0.874, top_3: 0.943, top_5: 0.968, top_7: 0.976, retrieval_rate: 0.223

Reader, type: test, step: 21822, em: 77.332, f1: 84.276
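The `Ranker` lines report MAP, MRR and top-k accuracy over ranked paragraphs, and the `Reader` lines report exact match (EM) and token-level F1. The sketch below shows common definitions of these metrics. The EM/F1 part follows the answer normalization of the official SQuAD v1.1 evaluation script. The ranking part assumes each question comes with a ranked list of paragraphs flagged as containing an answer or not. RE3QA's own evaluation code may define the ranking metrics differently (for example, over segments rather than whole paragraphs), and all function names here are hypothetical. The logged values would then be per-question scores averaged over the dev set, with EM/F1 scaled to percentages.

```python
import collections
import re
import string
from typing import Callable, List

PUNCT = set(string.punctuation)

# ---- Ranking metrics (per question; average over questions for a dataset score) ----

def reciprocal_rank(relevant: List[bool]) -> float:
    """relevant[i] is True if the paragraph ranked at position i+1 contains an answer."""
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevant: List[bool]) -> float:
    """Mean of precision@k taken at every rank k that holds a relevant paragraph."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def top_k(relevant: List[bool], k: int) -> float:
    """1.0 if at least one of the first k paragraphs contains an answer."""
    return 1.0 if any(relevant[:k]) else 0.0

# ---- Reader metrics, following the SQuAD v1.1 evaluation script ----

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, drop articles (a/an/the), collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_ground_truths(metric: Callable[[str, str], float],
                            prediction: str, ground_truths: List[str]) -> float:
    """SQuAD dev questions can have several gold answers; score against the best one."""
    return max(metric(prediction, gt) for gt in ground_truths)


ranked = [False, True, False, True]
print(reciprocal_rank(ranked), average_precision(ranked), top_k(ranked, 1), top_k(ranked, 3))
# 0.5 0.5 0.0 1.0
print(exact_match("The Eiffel Tower!", "eiffel tower"))         # 1.0
print(round(f1_score("the cat sat", "cat sat on the mat"), 3))  # 0.667
```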