salesforce / ctrl-sum

Resources for the "CTRLsum: Towards Generic Controllable Text Summarization" paper
https://arxiv.org/abs/2012.04281
BSD 3-Clause "New" or "Revised" License

Tagger: bert or roberta? #2

Closed · volker42maru closed this 3 years ago

volker42maru commented 3 years ago

❓ Questions and Help

Hi there,

just wanted to ask if you used bert-large-cased or roberta-large to initialize the weights of the tagger (both options are in the training script).

Thanks

jxhe commented 3 years ago

Hi, we used bert-large-cased in the experiments.
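
For reference, a minimal sketch of how such a tagger can be set up with Hugging Face Transformers. The class (`BertForTokenClassification`) and the binary label setup match the training log further down this thread; everything else is illustrative rather than the repo's exact script:

```python
# Minimal sketch, not the repo's exact training script: a binary
# token-level tagger initialized from bert-large-cased.
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-large-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-large-cased",
    num_labels=2,  # 1 = keyword token, 0 = not (matches id2label in the log below)
)
# The randomly initialized classification head is then trained with
# token-level cross-entropy against the oracle keyword labels.
```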

volker42maru commented 3 years ago

Do you plan to release the weights for the tagger as well?

I tried to train my own tagger using your code to reproduce the results for uncontrolled summarization on cnndm, but somehow the automatically extracted keywords from the tagger look very different from the oracle keywords.

Tagger examples*

An official with France 's accident investigation | Cell phones have been collected at the site , he said but that they " had n't exploited yet . 
The Palestinian Authority officially became the 123rd member of International Criminal Court on Wednesday , a step that gives court jurisdiction over alleged crimes in territories . 
The organization found " positive developments worldwide , with most regions seeming to show reductions in the number of executions . | Across board exception 

*I used the cnndm hyperparams from the paper for inference: `--threshold 0.25 --maximum-word 30 --summary-size 10`
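
For readers unfamiliar with these flags, a hedged sketch of what thresholded keyword extraction could look like. The function name and data format are hypothetical, and `--summary-size` (sentence-level selection) is not modeled here; the repo's extraction script is the authoritative logic:

```python
# Illustrative only: approximate the --threshold / --maximum-word behavior
# by keeping in-order tokens whose keyword probability clears the threshold.
def extract_keywords(tokens, probs, threshold=0.25, maximum_word=30):
    """tokens: list of source words; probs: per-word keyword probabilities."""
    selected = []
    for token, p in zip(tokens, probs):
        if p >= threshold:
            selected.append(token)
        if len(selected) >= maximum_word:
            break
    return selected
```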

Oracle examples**

Marseille far videos crash | video Paris clip | school
jurisdiction crimes | war crimes Israelis | Israel United opposed
world annual report death penalty | number executions worldwide | 28 compared 2013

**I preprocessed cnndm with the script provided in this repo, but the `test.oracleword` results are different from the examples in `example_dataset`.
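
For orientation, a rough sketch of the overlap-based oracle extraction the paper describes: source words that also occur in the reference summary, taken from a few selected source sentences. The greedy ROUGE-based sentence selection is replaced by a simple truncation here, so treat this purely as a paraphrase; the repo's preprocessing script is authoritative:

```python
# Hedged paraphrase of the oracle-keyword idea, not the repo script.
def oracle_keywords(source_sentences, reference, max_sentences=3):
    """Keep in-order source words that also appear in the reference summary."""
    ref_vocab = set(reference.lower().split())
    keywords = []
    # Stand-in for the paper's greedy ROUGE-based sentence selection:
    for sentence in source_sentences[:max_sentences]:
        for word in sentence.split():
            if word.lower() in ref_vocab and word not in keywords:
                keywords.append(word)
    return keywords
```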

I would greatly appreciate your input in this matter. Thank you :)

volker42maru commented 3 years ago

There was a problem during training where the loss made a jump. I have now trained a tagger with roberta-large, and the output looks better.

(Roberta) Tagger examples

French investigation crash Germanwings 9525 video board | German 
Palestinian Authority 123rd member Criminal Court step court jurisdiction alleged crimes territories | Palestinians ICC Rome Statute January 
world terrorism executions Amnesty International alleges annual report death penalty | worldwide 22 

However, my results for uncontrolled summarization using this roberta tagger + the official CTRLsum (BART) checkpoint are significantly lower than the results reported in the paper: 44.62 R1 ...
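
As a sanity check on the evaluation side, R1 can be recomputed with Google's `rouge_score` package. This is an assumption about tooling: the paper's numbers may come from a different ROUGE implementation, and implementations can disagree slightly:

```python
# Assumed tooling (rouge_score), not necessarily what the paper used.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
score = scorer.score(
    "the palestinian authority joined the icc on wednesday .",  # reference
    "palestinian authority becomes icc member .",               # system output
)
print(score["rouge1"].fmeasure)
```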

I will take a look at preprocessing again, since my results were already different for the oracle tags.

jxhe commented 3 years ago

Hi, first, the BERT tagger examples you gave are very different from ours, and it seems your tagger training is somewhat problematic. For example, the training labels for the tagger do not contain any stop words, but your examples contain a lot of them -- these stop words should receive a very low score from the tagger. Can you post the training log of your BERT tagger for debugging?
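
One quick diagnostic for this failure mode is to measure what fraction of the extracted keywords are stop words; for a well-trained tagger it should be near zero. A sketch using NLTK's English stop-word list, which is an assumption -- the repo's label construction may rely on a different list:

```python
# Illustrative diagnostic, not part of the repo.
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def stopword_fraction(keywords):
    """Fraction of predicted keywords that are English stop words."""
    if not keywords:
        return 0.0
    return sum(w.lower() in STOP_WORDS for w in keywords) / len(keywords)

# The problematic tagger output above scores high on this metric:
print(stopword_fraction("An official with France 's accident investigation".split()))
```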

Second, the updated Roberta tagger examples look more reasonable to me, but note that our hyperparameters for extracting the keywords were tuned on our BERT tagger and may not be suitable for the Roberta model.
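
If one does re-tune for RoBERTa, a small sweep of the extraction threshold on dev data, scored by word-level F1 against the oracle keywords, is one hedged way to do it. All names and the `(tokens, probs, oracle)` data format below are hypothetical:

```python
# Hypothetical tuning sweep; dev_examples is a list of (tokens, probs, oracle) triples.
def keyword_f1(pred, gold):
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(gold)
    return 2 * p * r / (p + r)

def tune_threshold(dev_examples, thresholds=(0.15, 0.20, 0.25, 0.30, 0.35)):
    def avg_f1(t):
        scores = [
            keyword_f1([w for w, p in zip(toks, ps) if p >= t], gold)
            for toks, ps, gold in dev_examples
        ]
        return sum(scores) / max(len(scores), 1)
    return max(thresholds, key=avg_f1)
```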

For legal reasons I cannot release the tagger weights here; please contact me personally if you want our pretrained tagger to make reproduction easier: junxianh@cs.cmu.edu

volker42maru commented 3 years ago

Hi,

sure, I will attach the log output here.

Console log output (BERT tagger training):

```
01/18/2021 04:40:19 - INFO - transformers.training_args - PyTorch: setting up devices 01/18/2021 04:40:22 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False 01/18/2021 04:40:22 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, 16-bits training: False 01/18/2021 04:40:22 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir='checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=True, evaluate_during_training=True, per_device_train_batch_size=7, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=4, learning_rate=5e-05, weight_decay=0.01, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=20000, warmup_steps=500, logging_dir='checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2', logging_first_step=False, logging_steps=100, save_steps=2000, save_total_limit=10, no_cuda=False, seed=1, fp16=False, fp16_opt_level='O1', local_rank=0, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=1000, past_index=-1) 01/18/2021 04:40:22 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False 01/18/2021 04:40:23 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-config.json from cache at /home/volker/.cache/torch/transformers/90deb4d9dd705272dc4b3db1364d759d551d72a9f70a91f60e3a1f5e278b985d.9019d8d0ae95e32b896211ae7ae130d7c36bb19ccf35c90a9e51923309458f70 01/18/2021 04:40:23 - INFO - transformers.configuration_utils - Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "directionality": "bidi", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "id2label": { "0": "0", "1": "1" }, "initializer_range": 0.02, "intermediate_size": 4096, "label2id": { "0": 0, "1": 1 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 16, "num_hidden_layers": 24, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "type_vocab_size": 2, "vocab_size": 28996 } 01/18/2021 04:40:24 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-config.json from cache at /home/volker/.cache/torch/transformers/90deb4d9dd705272dc4b3db1364d759d551d72a9f70a91f60e3a1f5e278b985d.9019d8d0ae95e32b896211ae7ae130d7c36bb19ccf35c90a9e51923309458f70 01/18/2021 04:40:24 - INFO - transformers.configuration_utils - Model config BertConfig { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "directionality": "bidi", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 16, "num_hidden_layers": 24, "pad_token_id": 0, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3,
"pooler_size_per_head": 128, "pooler_type": "first_token_transform", "type_vocab_size": 2, "vocab_size": 28996 } 01/18/2021 04:40:25 - INFO - transformers.tokenization_utils_base - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt from cache at /home/volker/.cache/torch/transformers/cee054f6aafe5e2cf816d2228704e326446785f940f5451a5b26033516a4ac3d.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1 01/18/2021 04:40:25 - INFO - transformers.modeling_utils - loading weights file https://cdn.huggingface.co/bert-large-cased-pytorch_model.bin from cache at /home/volker/.cache/torch/transformers/5f91c3ab24cfb315cf0be4174a25619f6087eb555acc8ae3a82edfff7f705138.b5f1c2070e0a0c189ca3b08270b0cb5bd0635b7319e74e93bd0dc26689953c27 01/18/2021 04:40:36 - WARNING - transformers.modeling_utils - Some weights of the model checkpoint at bert-large-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'] - This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model). - This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 01/18/2021 04:40:36 - WARNING - transformers.modeling_utils - Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-large-cased and are newly initialized: ['classifier.weight', 'classifier.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. 01/18/2021 04:40:36 - WARNING - transformers.modeling_utils - Some weights of the model checkpoint at bert-large-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'] - This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model). - This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 01/18/2021 04:40:36 - WARNING - transformers.modeling_utils - Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-large-cased and are newly initialized: ['classifier.weight', 'classifier.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. 
01/18/2021 04:40:36 - WARNING - transformers.modeling_utils - Some weights of the model checkpoint at bert-large-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'] - This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model). - This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 01/18/2021 04:40:36 - WARNING - transformers.modeling_utils - Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-large-cased and are newly initialized: ['classifier.weight', 'classifier.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. 01/18/2021 04:40:37 - INFO - filelock - Lock 140247007191504 acquired on /home/volker/.cache/huggingface/datasets/9756f8f80e16ee5ac069b79fe89ac2b3f5bcfb45f4b062aa7893dd2938c9c00f.de777b72eef8feede3bda87890ea212f8e91a531814ac5161b97706188ba7174.py.lock 01/18/2021 04:40:37 - INFO - filelock - Lock 140247007191504 released on /home/volker/.cache/huggingface/datasets/9756f8f80e16ee5ac069b79fe89ac2b3f5bcfb45f4b062aa7893dd2938c9c00f.de777b72eef8feede3bda87890ea212f8e91a531814ac5161b97706188ba7174.py.lock 01/18/2021 04:40:37 - INFO - filelock - Lock 140247511262736 acquired on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:37 - INFO - filelock - Lock 140247511262736 released on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:37 - INFO - filelock - Lock 140247007519120 acquired on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:37 - INFO - filelock - Lock 140247007519120 released on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:44 - INFO - filelock - Lock 140247007501776 acquired on /home/volker/.cache/huggingface/datasets/9756f8f80e16ee5ac069b79fe89ac2b3f5bcfb45f4b062aa7893dd2938c9c00f.de777b72eef8feede3bda87890ea212f8e91a531814ac5161b97706188ba7174.py.lock 01/18/2021 04:40:44 - INFO - filelock - Lock 140247007501776 released on /home/volker/.cache/huggingface/datasets/9756f8f80e16ee5ac069b79fe89ac2b3f5bcfb45f4b062aa7893dd2938c9c00f.de777b72eef8feede3bda87890ea212f8e91a531814ac5161b97706188ba7174.py.lock 01/18/2021 04:40:44 - INFO - filelock - Lock 140247007501520 acquired on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:44 - INFO - filelock - Lock 140247007501520 released on 
datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:44 - INFO - filelock - Lock 140245873506192 acquired on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:44 - INFO - filelock - Lock 140245873506192 released on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-cef534454dbf1256_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock 01/18/2021 04:40:51 - INFO - transformers.trainer - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" 01/18/2021 04:40:53 - INFO - transformers.trainer - ***** Running training ***** 01/18/2021 04:40:53 - INFO - transformers.trainer - Num examples = 491835 01/18/2021 04:40:53 - INFO - transformers.trainer - Num Epochs = 4 01/18/2021 04:40:53 - INFO - transformers.trainer - Instantaneous batch size per device = 7 01/18/2021 04:40:53 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 84 01/18/2021 04:40:53 - INFO - transformers.trainer - Gradient Accumulation steps = 4 01/18/2021 04:40:53 - INFO - transformers.trainer - Total optimization steps = 20000 01/18/2021 04:45:10 - INFO - transformers.trainer - {'loss': 0.21096447808668017, 'learning_rate': 1e-05, 'epoch': 0.017078690064472057, 'step': 100} 01/18/2021 04:49:27 - INFO - transformers.trainer - {'loss': 0.09895485420711339, 'learning_rate': 2e-05, 'epoch': 0.03415738012894411, 'step': 200} 01/18/2021 04:53:43 - INFO - transformers.trainer - {'loss': 0.0940618761163205, 'learning_rate': 3e-05, 'epoch': 0.05123607019341617, 'step': 300} 01/18/2021 04:58:00 - INFO - transformers.trainer - {'loss': 0.09268501854501665, 'learning_rate': 4e-05, 'epoch': 0.06831476025788823, 'step': 400} 01/18/2021 05:02:16 - INFO - transformers.trainer - {'loss': 0.09178590215742588, 'learning_rate': 5e-05, 'epoch': 0.08539345032236027, 'step': 500} 01/18/2021 05:06:33 - INFO - transformers.trainer - {'loss': 0.09243604611605406, 'learning_rate': 4.9743589743589746e-05, 'epoch': 0.10247214038683233, 'step': 600} 01/18/2021 05:10:48 - INFO - transformers.trainer - {'loss': 0.08993079602718353, 'learning_rate': 4.948717948717949e-05, 'epoch': 0.11955083045130438, 'step': 700} 01/18/2021 05:15:04 - INFO - transformers.trainer - {'loss': 0.0898141351621598, 'learning_rate': 4.923076923076924e-05, 'epoch': 0.13662952051577645, 'step': 800} 01/18/2021 05:19:20 - INFO - transformers.trainer - {'loss': 0.09084519593045115, 'learning_rate': 4.8974358974358975e-05, 'epoch': 0.1537082105802485, 'step': 900} 01/18/2021 05:23:36 - INFO - transformers.trainer - {'loss': 0.09126668662764131, 'learning_rate': 4.871794871794872e-05, 'epoch': 0.17078690064472055, 'step': 1000} 01/18/2021 05:23:36 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 05:23:36 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 05:23:36 - INFO - transformers.trainer - Batch size = 8 01/18/2021 05:26:04 - INFO - transformers.trainer - {'eval_loss': 0.09099291790392806, 'epoch': 0.17078690064472055, 'step': 1000} 01/18/2021 05:30:20 - INFO - transformers.trainer - {'loss': 0.08963804192841053, 'learning_rate': 4.846153846153846e-05, 'epoch': 0.1878655907091926, 'step': 1100} 01/18/2021 05:34:37 - INFO - transformers.trainer - {'loss': 0.08881811023689806, 'learning_rate': 
4.8205128205128205e-05, 'epoch': 0.20494428077366467, 'step': 1200} 01/18/2021 05:38:54 - INFO - transformers.trainer - {'loss': 0.09011077161878348, 'learning_rate': 4.7948717948717955e-05, 'epoch': 0.2220229708381367, 'step': 1300} 01/18/2021 05:43:12 - INFO - transformers.trainer - {'loss': 0.08849841949529946, 'learning_rate': 4.76923076923077e-05, 'epoch': 0.23910166090260876, 'step': 1400} 01/18/2021 05:47:29 - INFO - transformers.trainer - {'loss': 0.0887072384916246, 'learning_rate': 4.7435897435897435e-05, 'epoch': 0.2561803509670808, 'step': 1500} 01/18/2021 05:51:47 - INFO - transformers.trainer - {'loss': 0.08874277316965162, 'learning_rate': 4.717948717948718e-05, 'epoch': 0.2732590410315529, 'step': 1600} 01/18/2021 05:56:04 - INFO - transformers.trainer - {'loss': 0.08732443751767277, 'learning_rate': 4.692307692307693e-05, 'epoch': 0.29033773109602495, 'step': 1700} 01/18/2021 06:00:20 - INFO - transformers.trainer - {'loss': 0.0885152863431722, 'learning_rate': 4.666666666666667e-05, 'epoch': 0.307416421160497, 'step': 1800} 01/18/2021 06:04:37 - INFO - transformers.trainer - {'loss': 0.08763288768008351, 'learning_rate': 4.6410256410256415e-05, 'epoch': 0.32449511122496905, 'step': 1900} 01/18/2021 06:08:54 - INFO - transformers.trainer - {'loss': 0.0882954747416079, 'learning_rate': 4.615384615384616e-05, 'epoch': 0.3415738012894411, 'step': 2000} 01/18/2021 06:08:54 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 06:08:54 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 06:08:54 - INFO - transformers.trainer - Batch size = 8 01/18/2021 06:11:31 - INFO - transformers.trainer - {'eval_loss': 0.08877651933200505, 'epoch': 0.3415738012894411, 'step': 2000} 01/18/2021 06:11:31 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-2000 01/18/2021 06:11:31 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-2000/config.json 01/18/2021 06:11:33 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-2000/pytorch_model.bin 01/18/2021 06:15:54 - INFO - transformers.trainer - {'loss': 0.08765322201885282, 'learning_rate': 4.5897435897435895e-05, 'epoch': 0.35865249135391314, 'step': 2100} 01/18/2021 06:20:10 - INFO - transformers.trainer - {'loss': 0.08777894601225852, 'learning_rate': 4.5641025641025645e-05, 'epoch': 0.3757311814183852, 'step': 2200} 01/18/2021 06:24:25 - INFO - transformers.trainer - {'loss': 0.08740998072549701, 'learning_rate': 4.538461538461539e-05, 'epoch': 0.3928098714828573, 'step': 2300} 01/18/2021 06:28:42 - INFO - transformers.trainer - {'loss': 0.08714552055113017, 'learning_rate': 4.512820512820513e-05, 'epoch': 0.40988856154732933, 'step': 2400} 01/18/2021 06:32:58 - INFO - transformers.trainer - {'loss': 0.08816747311502696, 'learning_rate': 4.4871794871794874e-05, 'epoch': 0.4269672516118014, 'step': 2500} 01/18/2021 06:37:15 - INFO - transformers.trainer - {'loss': 0.08700557777658105, 'learning_rate': 4.461538461538462e-05, 'epoch': 0.4440459416762734, 'step': 2600} 01/18/2021 06:41:31 - INFO - transformers.trainer - {'loss': 0.08719984491821378, 'learning_rate': 4.435897435897436e-05, 'epoch': 0.46112463174074547, 'step': 2700} 01/18/2021 06:45:48 - INFO - transformers.trainer - {'loss': 0.08748536392115057, 
'learning_rate': 4.4102564102564104e-05, 'epoch': 0.4782033218052175, 'step': 2800} 01/18/2021 06:50:04 - INFO - transformers.trainer - {'loss': 0.08723237507976592, 'learning_rate': 4.384615384615385e-05, 'epoch': 0.4952820118696896, 'step': 2900} 01/18/2021 06:54:21 - INFO - transformers.trainer - {'loss': 0.08813060995191335, 'learning_rate': 4.358974358974359e-05, 'epoch': 0.5123607019341616, 'step': 3000} 01/18/2021 06:54:21 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 06:54:21 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 06:54:21 - INFO - transformers.trainer - Batch size = 8 01/18/2021 06:56:55 - INFO - transformers.trainer - {'eval_loss': 0.08799208083643181, 'epoch': 0.5123607019341616, 'step': 3000} 01/18/2021 07:01:11 - INFO - transformers.trainer - {'loss': 0.08670589179731905, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.5294393919986337, 'step': 3100} 01/18/2021 07:05:26 - INFO - transformers.trainer - {'loss': 0.08612008104100824, 'learning_rate': 4.3076923076923084e-05, 'epoch': 0.5465180820631058, 'step': 3200} 01/18/2021 07:09:42 - INFO - transformers.trainer - {'loss': 0.0879831974580884, 'learning_rate': 4.282051282051282e-05, 'epoch': 0.5635967721275779, 'step': 3300} 01/18/2021 07:13:57 - INFO - transformers.trainer - {'loss': 0.08907603794708847, 'learning_rate': 4.2564102564102564e-05, 'epoch': 0.5806754621920499, 'step': 3400} 01/18/2021 07:18:13 - INFO - transformers.trainer - {'loss': 0.08673659248277545, 'learning_rate': 4.230769230769231e-05, 'epoch': 0.597754152256522, 'step': 3500} 01/18/2021 07:22:29 - INFO - transformers.trainer - {'loss': 0.08707203573547304, 'learning_rate': 4.205128205128206e-05, 'epoch': 0.614832842320994, 'step': 3600} 01/18/2021 07:26:46 - INFO - transformers.trainer - {'loss': 0.11208854168653488, 'learning_rate': 4.17948717948718e-05, 'epoch': 0.631911532385466, 'step': 3700} 01/18/2021 07:31:04 - INFO - transformers.trainer - {'loss': 0.1319634577818215, 'learning_rate': 4.1538461538461544e-05, 'epoch': 0.6489902224499381, 'step': 3800} 01/18/2021 07:35:21 - INFO - transformers.trainer - {'loss': 0.1329947100020945, 'learning_rate': 4.128205128205128e-05, 'epoch': 0.6660689125144101, 'step': 3900} 01/18/2021 07:39:37 - INFO - transformers.trainer - {'loss': 0.13094629300758243, 'learning_rate': 4.1025641025641023e-05, 'epoch': 0.6831476025788822, 'step': 4000} 01/18/2021 07:39:37 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 07:39:37 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 07:39:37 - INFO - transformers.trainer - Batch size = 8 01/18/2021 07:42:08 - INFO - transformers.trainer - {'eval_loss': 0.14145558321834506, 'epoch': 0.6831476025788822, 'step': 4000} 01/18/2021 07:42:08 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-4000 01/18/2021 07:42:08 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-4000/config.json 01/18/2021 07:42:10 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-4000/pytorch_model.bin 01/18/2021 07:46:31 - INFO - transformers.trainer - {'loss': 0.13099097307771446, 'learning_rate': 4.0769230769230773e-05, 'epoch': 0.7002262926433542, 'step': 4100} 01/18/2021 07:50:48 - INFO - transformers.trainer - {'loss': 
0.12946705030277372, 'learning_rate': 4.051282051282052e-05, 'epoch': 0.7173049827078263, 'step': 4200} 01/18/2021 07:55:05 - INFO - transformers.trainer - {'loss': 0.1299473760277033, 'learning_rate': 4.025641025641026e-05, 'epoch': 0.7343836727722983, 'step': 4300} 01/18/2021 07:59:22 - INFO - transformers.trainer - {'loss': 0.130179364643991, 'learning_rate': 4e-05, 'epoch': 0.7514623628367704, 'step': 4400} 01/18/2021 08:03:39 - INFO - transformers.trainer - {'loss': 0.13114809673279523, 'learning_rate': 3.974358974358974e-05, 'epoch': 0.7685410529012425, 'step': 4500} 01/18/2021 08:07:56 - INFO - transformers.trainer - {'loss': 0.13181761171668768, 'learning_rate': 3.948717948717949e-05, 'epoch': 0.7856197429657146, 'step': 4600} 01/18/2021 08:12:13 - INFO - transformers.trainer - {'loss': 0.13061218542978167, 'learning_rate': 3.923076923076923e-05, 'epoch': 0.8026984330301866, 'step': 4700} 01/18/2021 08:16:30 - INFO - transformers.trainer - {'loss': 0.1316672115959227, 'learning_rate': 3.8974358974358976e-05, 'epoch': 0.8197771230946587, 'step': 4800} 01/18/2021 08:20:47 - INFO - transformers.trainer - {'loss': 0.13079099521040916, 'learning_rate': 3.871794871794872e-05, 'epoch': 0.8368558131591307, 'step': 4900} 01/18/2021 08:25:04 - INFO - transformers.trainer - {'loss': 0.12970148329623044, 'learning_rate': 3.846153846153846e-05, 'epoch': 0.8539345032236028, 'step': 5000} 01/18/2021 08:25:04 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 08:25:04 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 08:25:04 - INFO - transformers.trainer - Batch size = 8 01/18/2021 08:27:40 - INFO - transformers.trainer - {'eval_loss': 0.14288417849325596, 'epoch': 0.8539345032236028, 'step': 5000} 01/18/2021 08:31:57 - INFO - transformers.trainer - {'loss': 0.13115603158250452, 'learning_rate': 3.8205128205128206e-05, 'epoch': 0.8710131932880748, 'step': 5100} 01/18/2021 08:36:14 - INFO - transformers.trainer - {'loss': 0.13109397692605854, 'learning_rate': 3.794871794871795e-05, 'epoch': 0.8880918833525469, 'step': 5200} 01/18/2021 08:40:31 - INFO - transformers.trainer - {'loss': 0.1331454561278224, 'learning_rate': 3.769230769230769e-05, 'epoch': 0.9051705734170189, 'step': 5300} 01/18/2021 08:44:47 - INFO - transformers.trainer - {'loss': 0.1288531650416553, 'learning_rate': 3.7435897435897436e-05, 'epoch': 0.9222492634814909, 'step': 5400} 01/18/2021 08:49:04 - INFO - transformers.trainer - {'loss': 0.13060957124456762, 'learning_rate': 3.717948717948718e-05, 'epoch': 0.939327953545963, 'step': 5500} 01/18/2021 08:53:21 - INFO - transformers.trainer - {'loss': 0.13312898749485613, 'learning_rate': 3.692307692307693e-05, 'epoch': 0.956406643610435, 'step': 5600} 01/18/2021 08:57:38 - INFO - transformers.trainer - {'loss': 0.13133331623859704, 'learning_rate': 3.6666666666666666e-05, 'epoch': 0.9734853336749071, 'step': 5700} 01/18/2021 09:01:55 - INFO - transformers.trainer - {'loss': 0.1314236806333065, 'learning_rate': 3.641025641025641e-05, 'epoch': 0.9905640237393792, 'step': 5800} 01/18/2021 09:06:12 - INFO - transformers.trainer - {'loss': 0.13126584332436322, 'learning_rate': 3.615384615384615e-05, 'epoch': 1.0076854105290125, 'step': 5900} 01/18/2021 09:10:29 - INFO - transformers.trainer - {'loss': 0.13152201274409892, 'learning_rate': 3.58974358974359e-05, 'epoch': 1.0247641005934844, 'step': 6000} 01/18/2021 09:10:29 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 09:10:29 - INFO - transformers.trainer - Num examples = 
22339 01/18/2021 09:10:29 - INFO - transformers.trainer - Batch size = 8 01/18/2021 09:13:02 - INFO - transformers.trainer - {'eval_loss': 0.14206206369204757, 'epoch': 1.0247641005934844, 'step': 6000} 01/18/2021 09:13:02 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-6000 01/18/2021 09:13:02 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-6000/config.json 01/18/2021 09:13:04 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-6000/pytorch_model.bin 01/18/2021 09:17:25 - INFO - transformers.trainer - {'loss': 0.13224126258865, 'learning_rate': 3.5641025641025646e-05, 'epoch': 1.0418427906579566, 'step': 6100} 01/18/2021 09:21:42 - INFO - transformers.trainer - {'loss': 0.13052283477969467, 'learning_rate': 3.538461538461539e-05, 'epoch': 1.0589214807224285, 'step': 6200} 01/18/2021 09:26:00 - INFO - transformers.trainer - {'loss': 0.1307653716020286, 'learning_rate': 3.5128205128205125e-05, 'epoch': 1.0760001707869007, 'step': 6300} 01/18/2021 09:30:17 - INFO - transformers.trainer - {'loss': 0.12876513510942458, 'learning_rate': 3.487179487179487e-05, 'epoch': 1.0930788608513726, 'step': 6400} 01/18/2021 09:34:34 - INFO - transformers.trainer - {'loss': 0.13026353703811766, 'learning_rate': 3.461538461538462e-05, 'epoch': 1.1101575509158448, 'step': 6500} 01/18/2021 09:38:51 - INFO - transformers.trainer - {'loss': 0.13211099933832884, 'learning_rate': 3.435897435897436e-05, 'epoch': 1.1272362409803167, 'step': 6600} 01/18/2021 09:43:07 - INFO - transformers.trainer - {'loss': 0.12818788773380219, 'learning_rate': 3.4102564102564105e-05, 'epoch': 1.1443149310447889, 'step': 6700} 01/18/2021 09:47:24 - INFO - transformers.trainer - {'loss': 0.13097544196993113, 'learning_rate': 3.384615384615385e-05, 'epoch': 1.1613936211092608, 'step': 6800} 01/18/2021 09:51:41 - INFO - transformers.trainer - {'loss': 0.1328784656152129, 'learning_rate': 3.358974358974359e-05, 'epoch': 1.178472311173733, 'step': 6900} 01/18/2021 09:55:58 - INFO - transformers.trainer - {'loss': 0.1310846485197544, 'learning_rate': 3.3333333333333335e-05, 'epoch': 1.1955510012382051, 'step': 7000} 01/18/2021 09:55:58 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 09:55:58 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 09:55:58 - INFO - transformers.trainer - Batch size = 8 01/18/2021 09:58:31 - INFO - transformers.trainer - {'eval_loss': 0.14204563293305938, 'epoch': 1.1955510012382051, 'step': 7000} 01/18/2021 10:02:48 - INFO - transformers.trainer - {'loss': 0.12990373389795423, 'learning_rate': 3.307692307692308e-05, 'epoch': 1.212629691302677, 'step': 7100} 01/18/2021 10:07:04 - INFO - transformers.trainer - {'loss': 0.13053299786522984, 'learning_rate': 3.282051282051282e-05, 'epoch': 1.2297083813671492, 'step': 7200} 01/18/2021 10:11:22 - INFO - transformers.trainer - {'loss': 0.12931813661009073, 'learning_rate': 3.2564102564102565e-05, 'epoch': 1.2467870714316212, 'step': 7300} 01/18/2021 10:15:38 - INFO - transformers.trainer - {'loss': 0.13015225622802973, 'learning_rate': 3.230769230769231e-05, 'epoch': 1.2638657614960933, 'step': 7400} 01/18/2021 10:19:54 - INFO - transformers.trainer - {'loss': 0.12932354724034667, 'learning_rate': 3.205128205128206e-05, 'epoch': 
1.2809444515605652, 'step': 7500} 01/18/2021 10:24:11 - INFO - transformers.trainer - {'loss': 0.13253759941086174, 'learning_rate': 3.1794871794871795e-05, 'epoch': 1.2980231416250374, 'step': 7600} 01/18/2021 10:28:28 - INFO - transformers.trainer - {'loss': 0.13133565586060286, 'learning_rate': 3.153846153846154e-05, 'epoch': 1.3151018316895093, 'step': 7700} 01/18/2021 10:32:45 - INFO - transformers.trainer - {'loss': 0.1305695523135364, 'learning_rate': 3.128205128205128e-05, 'epoch': 1.3321805217539815, 'step': 7800} 01/18/2021 10:37:01 - INFO - transformers.trainer - {'loss': 0.12977138115093112, 'learning_rate': 3.102564102564103e-05, 'epoch': 1.3492592118184534, 'step': 7900} 01/18/2021 10:41:17 - INFO - transformers.trainer - {'loss': 0.13157519796863199, 'learning_rate': 3.0769230769230774e-05, 'epoch': 1.3663379018829256, 'step': 8000} 01/18/2021 10:41:17 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 10:41:17 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 10:41:17 - INFO - transformers.trainer - Batch size = 8 01/18/2021 10:43:48 - INFO - transformers.trainer - {'eval_loss': 0.14300448523892367, 'epoch': 1.3663379018829256, 'step': 8000} 01/18/2021 10:43:48 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-8000 01/18/2021 10:43:48 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-8000/config.json 01/18/2021 10:43:50 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-8000/pytorch_model.bin 01/18/2021 10:48:12 - INFO - transformers.trainer - {'loss': 0.13170190192759038, 'learning_rate': 3.0512820512820518e-05, 'epoch': 1.3834165919473977, 'step': 8100} 01/18/2021 10:52:28 - INFO - transformers.trainer - {'loss': 0.13150625461712478, 'learning_rate': 3.0256410256410257e-05, 'epoch': 1.4004952820118697, 'step': 8200} 01/18/2021 10:56:45 - INFO - transformers.trainer - {'loss': 0.1313140352256596, 'learning_rate': 3e-05, 'epoch': 1.4175739720763416, 'step': 8300} 01/18/2021 11:01:02 - INFO - transformers.trainer - {'loss': 0.13212937485426665, 'learning_rate': 2.9743589743589744e-05, 'epoch': 1.4346526621408138, 'step': 8400} 01/18/2021 11:05:19 - INFO - transformers.trainer - {'loss': 0.13036516631953418, 'learning_rate': 2.948717948717949e-05, 'epoch': 1.451731352205286, 'step': 8500} 01/18/2021 11:09:36 - INFO - transformers.trainer - {'loss': 0.13017148800194264, 'learning_rate': 2.9230769230769234e-05, 'epoch': 1.4688100422697579, 'step': 8600} 01/18/2021 11:13:52 - INFO - transformers.trainer - {'loss': 0.1297870717011392, 'learning_rate': 2.8974358974358977e-05, 'epoch': 1.48588873233423, 'step': 8700} 01/18/2021 11:18:09 - INFO - transformers.trainer - {'loss': 0.13125376602634786, 'learning_rate': 2.8717948717948717e-05, 'epoch': 1.5029674223987022, 'step': 8800} 01/18/2021 11:22:25 - INFO - transformers.trainer - {'loss': 0.12916821449995042, 'learning_rate': 2.846153846153846e-05, 'epoch': 1.5200461124631741, 'step': 8900} 01/18/2021 11:26:42 - INFO - transformers.trainer - {'loss': 0.13151773177087306, 'learning_rate': 2.8205128205128207e-05, 'epoch': 1.537124802527646, 'step': 9000} 01/18/2021 11:26:42 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 11:26:42 - INFO - transformers.trainer - Num examples = 
22339 01/18/2021 11:26:42 - INFO - transformers.trainer - Batch size = 8 01/18/2021 11:29:14 - INFO - transformers.trainer - {'eval_loss': 0.1421406558053578, 'epoch': 1.537124802527646, 'step': 9000} 01/18/2021 11:33:31 - INFO - transformers.trainer - {'loss': 0.1278994026966393, 'learning_rate': 2.794871794871795e-05, 'epoch': 1.5542034925921182, 'step': 9100} 01/18/2021 11:37:48 - INFO - transformers.trainer - {'loss': 0.13193808391690254, 'learning_rate': 2.7692307692307694e-05, 'epoch': 1.5712821826565904, 'step': 9200} 01/18/2021 11:42:04 - INFO - transformers.trainer - {'loss': 0.12896867038682103, 'learning_rate': 2.743589743589744e-05, 'epoch': 1.5883608727210623, 'step': 9300} 01/18/2021 11:46:21 - INFO - transformers.trainer - {'loss': 0.13156215767376125, 'learning_rate': 2.717948717948718e-05, 'epoch': 1.6054395627855342, 'step': 9400} 01/18/2021 11:50:38 - INFO - transformers.trainer - {'loss': 0.13141823271289468, 'learning_rate': 2.6923076923076923e-05, 'epoch': 1.6225182528500064, 'step': 9500} 01/18/2021 11:54:55 - INFO - transformers.trainer - {'loss': 0.12909374114125968, 'learning_rate': 2.6666666666666667e-05, 'epoch': 1.6395969429144786, 'step': 9600} 01/18/2021 11:59:12 - INFO - transformers.trainer - {'loss': 0.13049961291253567, 'learning_rate': 2.6410256410256413e-05, 'epoch': 1.6566756329789505, 'step': 9700} 01/18/2021 12:03:29 - INFO - transformers.trainer - {'loss': 0.12988606702536346, 'learning_rate': 2.6153846153846157e-05, 'epoch': 1.6737543230434224, 'step': 9800} 01/18/2021 12:07:47 - INFO - transformers.trainer - {'loss': 0.12760003615170717, 'learning_rate': 2.58974358974359e-05, 'epoch': 1.6908330131078946, 'step': 9900} 01/18/2021 12:12:03 - INFO - transformers.trainer - {'loss': 0.13054829772561788, 'learning_rate': 2.564102564102564e-05, 'epoch': 1.7079117031723667, 'step': 10000} 01/18/2021 12:12:03 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 12:12:03 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 12:12:03 - INFO - transformers.trainer - Batch size = 8 01/18/2021 12:14:34 - INFO - transformers.trainer - {'eval_loss': 0.14331958685010254, 'epoch': 1.7079117031723667, 'step': 10000} 01/18/2021 12:14:34 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-10000 01/18/2021 12:14:34 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-10000/config.json 01/18/2021 12:14:36 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-10000/pytorch_model.bin 01/18/2021 12:18:57 - INFO - transformers.trainer - {'loss': 0.13143323488533498, 'learning_rate': 2.5384615384615383e-05, 'epoch': 1.7249903932368387, 'step': 10100} 01/18/2021 12:23:13 - INFO - transformers.trainer - {'loss': 0.13037383226677776, 'learning_rate': 2.512820512820513e-05, 'epoch': 1.7420690833013108, 'step': 10200} 01/18/2021 12:27:30 - INFO - transformers.trainer - {'loss': 0.12980189122259617, 'learning_rate': 2.4871794871794873e-05, 'epoch': 1.759147773365783, 'step': 10300} 01/18/2021 12:31:47 - INFO - transformers.trainer - {'loss': 0.12933293216861785, 'learning_rate': 2.461538461538462e-05, 'epoch': 1.776226463430255, 'step': 10400} 01/18/2021 12:36:04 - INFO - transformers.trainer - {'loss': 0.13039672512561082, 'learning_rate': 2.435897435897436e-05, 
'epoch': 1.7933051534947269, 'step': 10500} 01/18/2021 12:40:21 - INFO - transformers.trainer - {'loss': 0.13206965574994683, 'learning_rate': 2.4102564102564103e-05, 'epoch': 1.810383843559199, 'step': 10600} 01/18/2021 12:44:38 - INFO - transformers.trainer - {'loss': 0.13158165331929922, 'learning_rate': 2.384615384615385e-05, 'epoch': 1.8274625336236712, 'step': 10700} 01/18/2021 12:48:55 - INFO - transformers.trainer - {'loss': 0.13026155775412918, 'learning_rate': 2.358974358974359e-05, 'epoch': 1.844541223688143, 'step': 10800} 01/18/2021 12:53:12 - INFO - transformers.trainer - {'loss': 0.12960972161963583, 'learning_rate': 2.3333333333333336e-05, 'epoch': 1.861619913752615, 'step': 10900} 01/18/2021 12:57:28 - INFO - transformers.trainer - {'loss': 0.12870387138798833, 'learning_rate': 2.307692307692308e-05, 'epoch': 1.8786986038170872, 'step': 11000} 01/18/2021 12:57:28 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 12:57:28 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 12:57:28 - INFO - transformers.trainer - Batch size = 8 01/18/2021 12:59:58 - INFO - transformers.trainer - {'eval_loss': 0.14293941577334282, 'epoch': 1.8786986038170872, 'step': 11000} 01/18/2021 13:04:15 - INFO - transformers.trainer - {'loss': 0.13111817896366118, 'learning_rate': 2.2820512820512822e-05, 'epoch': 1.8957772938815594, 'step': 11100} 01/18/2021 13:08:32 - INFO - transformers.trainer - {'loss': 0.13073403120040894, 'learning_rate': 2.2564102564102566e-05, 'epoch': 1.9128559839460313, 'step': 11200} 01/18/2021 13:12:49 - INFO - transformers.trainer - {'loss': 0.13072644058614968, 'learning_rate': 2.230769230769231e-05, 'epoch': 1.9299346740105034, 'step': 11300} 01/18/2021 13:17:06 - INFO - transformers.trainer - {'loss': 0.13204503752291202, 'learning_rate': 2.2051282051282052e-05, 'epoch': 1.9470133640749756, 'step': 11400} 01/18/2021 13:21:22 - INFO - transformers.trainer - {'loss': 0.1309584235586226, 'learning_rate': 2.1794871794871795e-05, 'epoch': 1.9640920541394475, 'step': 11500} 01/18/2021 13:25:38 - INFO - transformers.trainer - {'loss': 0.13015030289068819, 'learning_rate': 2.1538461538461542e-05, 'epoch': 1.9811707442039195, 'step': 11600} 01/18/2021 13:29:55 - INFO - transformers.trainer - {'loss': 0.13049502471461893, 'learning_rate': 2.1282051282051282e-05, 'epoch': 1.9982494342683916, 'step': 11700} 01/18/2021 13:34:12 - INFO - transformers.trainer - {'loss': 0.13331640973687173, 'learning_rate': 2.102564102564103e-05, 'epoch': 2.015370821058025, 'step': 11800} 01/18/2021 13:38:28 - INFO - transformers.trainer - {'loss': 0.13101069876924157, 'learning_rate': 2.0769230769230772e-05, 'epoch': 2.032449511122497, 'step': 11900} 01/18/2021 13:42:44 - INFO - transformers.trainer - {'loss': 0.12941758967004716, 'learning_rate': 2.0512820512820512e-05, 'epoch': 2.049528201186969, 'step': 12000} 01/18/2021 13:42:44 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 13:42:44 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 13:42:44 - INFO - transformers.trainer - Batch size = 8 01/18/2021 13:45:15 - INFO - transformers.trainer - {'eval_loss': 0.1424399743818955, 'epoch': 2.049528201186969, 'step': 12000} 01/18/2021 13:45:15 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-12000 01/18/2021 13:45:15 - INFO - transformers.configuration_utils - Configuration saved in 
checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-12000/config.json 01/18/2021 13:45:17 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-12000/pytorch_model.bin 01/18/2021 13:49:38 - INFO - transformers.trainer - {'loss': 0.13227703455835582, 'learning_rate': 2.025641025641026e-05, 'epoch': 2.066606891251441, 'step': 12100} 01/18/2021 13:53:55 - INFO - transformers.trainer - {'loss': 0.13145781567320228, 'learning_rate': 2e-05, 'epoch': 2.083685581315913, 'step': 12200} 01/18/2021 13:58:11 - INFO - transformers.trainer - {'loss': 0.12907661270350218, 'learning_rate': 1.9743589743589745e-05, 'epoch': 2.100764271380385, 'step': 12300} 01/18/2021 14:02:28 - INFO - transformers.trainer - {'loss': 0.13096423806622626, 'learning_rate': 1.9487179487179488e-05, 'epoch': 2.117842961444857, 'step': 12400} 01/18/2021 14:06:43 - INFO - transformers.trainer - {'loss': 0.12823980685323477, 'learning_rate': 1.923076923076923e-05, 'epoch': 2.1349216515093294, 'step': 12500} 01/18/2021 14:11:01 - INFO - transformers.trainer - {'loss': 0.13245878113433718, 'learning_rate': 1.8974358974358975e-05, 'epoch': 2.1520003415738014, 'step': 12600} 01/18/2021 14:15:18 - INFO - transformers.trainer - {'loss': 0.1294815630465746, 'learning_rate': 1.8717948717948718e-05, 'epoch': 2.1690790316382733, 'step': 12700} 01/18/2021 14:19:35 - INFO - transformers.trainer - {'loss': 0.13088739704340696, 'learning_rate': 1.8461538461538465e-05, 'epoch': 2.1861577217027452, 'step': 12800} 01/18/2021 14:23:52 - INFO - transformers.trainer - {'loss': 0.13000295644626023, 'learning_rate': 1.8205128205128204e-05, 'epoch': 2.2032364117672176, 'step': 12900} 01/18/2021 14:28:08 - INFO - transformers.trainer - {'loss': 0.13137389086186885, 'learning_rate': 1.794871794871795e-05, 'epoch': 2.2203151018316896, 'step': 13000} 01/18/2021 14:28:08 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 14:28:08 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 14:28:08 - INFO - transformers.trainer - Batch size = 8 01/18/2021 14:30:39 - INFO - transformers.trainer - {'eval_loss': 0.14216205362474624, 'epoch': 2.2203151018316896, 'step': 13000} 01/18/2021 14:34:56 - INFO - transformers.trainer - {'loss': 0.12851803576573728, 'learning_rate': 1.7692307692307694e-05, 'epoch': 2.2373937918961615, 'step': 13100} 01/18/2021 14:39:14 - INFO - transformers.trainer - {'loss': 0.1288341834396124, 'learning_rate': 1.7435897435897434e-05, 'epoch': 2.2544724819606334, 'step': 13200} 01/18/2021 14:43:30 - INFO - transformers.trainer - {'loss': 0.131309083327651, 'learning_rate': 1.717948717948718e-05, 'epoch': 2.271551172025106, 'step': 13300} 01/18/2021 14:47:47 - INFO - transformers.trainer - {'loss': 0.13138174144551157, 'learning_rate': 1.6923076923076924e-05, 'epoch': 2.2886298620895777, 'step': 13400} 01/18/2021 14:52:04 - INFO - transformers.trainer - {'loss': 0.13268130009993911, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.3057085521540497, 'step': 13500} 01/18/2021 14:56:20 - INFO - transformers.trainer - {'loss': 0.13109591335058213, 'learning_rate': 1.641025641025641e-05, 'epoch': 2.3227872422185216, 'step': 13600} 01/18/2021 15:00:37 - INFO - transformers.trainer - {'loss': 0.1311177603714168, 'learning_rate': 1.6153846153846154e-05, 'epoch': 2.339865932282994, 'step': 13700} 01/18/2021 15:04:54 - INFO - transformers.trainer - {'loss': 0.13198109382763504, 
'learning_rate': 1.5897435897435897e-05, 'epoch': 2.356944622347466, 'step': 13800} 01/18/2021 15:09:11 - INFO - transformers.trainer - {'loss': 0.13218228375539184, 'learning_rate': 1.564102564102564e-05, 'epoch': 2.374023312411938, 'step': 13900} 01/18/2021 15:13:28 - INFO - transformers.trainer - {'loss': 0.13290270380675792, 'learning_rate': 1.5384615384615387e-05, 'epoch': 2.3911020024764102, 'step': 14000} 01/18/2021 15:13:28 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 15:13:28 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 15:13:28 - INFO - transformers.trainer - Batch size = 8 01/18/2021 15:16:00 - INFO - transformers.trainer - {'eval_loss': 0.1424810452352725, 'epoch': 2.3911020024764102, 'step': 14000} 01/18/2021 15:16:00 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-14000 01/18/2021 15:16:00 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-14000/config.json 01/18/2021 15:16:02 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-14000/pytorch_model.bin 01/18/2021 15:20:23 - INFO - transformers.trainer - {'loss': 0.13055346734821796, 'learning_rate': 1.5128205128205129e-05, 'epoch': 2.408180692540882, 'step': 14100} 01/18/2021 15:24:40 - INFO - transformers.trainer - {'loss': 0.13003281952813267, 'learning_rate': 1.4871794871794872e-05, 'epoch': 2.425259382605354, 'step': 14200} 01/18/2021 15:28:57 - INFO - transformers.trainer - {'loss': 0.13160310355946422, 'learning_rate': 1.4615384615384617e-05, 'epoch': 2.442338072669826, 'step': 14300} 01/18/2021 15:33:14 - INFO - transformers.trainer - {'loss': 0.1296693200804293, 'learning_rate': 1.4358974358974359e-05, 'epoch': 2.4594167627342984, 'step': 14400} 01/18/2021 15:37:31 - INFO - transformers.trainer - {'loss': 0.12928257374092936, 'learning_rate': 1.4102564102564104e-05, 'epoch': 2.4764954527987704, 'step': 14500} 01/18/2021 15:41:48 - INFO - transformers.trainer - {'loss': 0.13159808224067093, 'learning_rate': 1.3846153846153847e-05, 'epoch': 2.4935741428632423, 'step': 14600} 01/18/2021 15:46:04 - INFO - transformers.trainer - {'loss': 0.12895757029764354, 'learning_rate': 1.358974358974359e-05, 'epoch': 2.5106528329277147, 'step': 14700} 01/18/2021 15:50:21 - INFO - transformers.trainer - {'loss': 0.12953323792666196, 'learning_rate': 1.3333333333333333e-05, 'epoch': 2.5277315229921866, 'step': 14800} 01/18/2021 15:54:38 - INFO - transformers.trainer - {'loss': 0.12858898907899857, 'learning_rate': 1.3076923076923078e-05, 'epoch': 2.5448102130566586, 'step': 14900} 01/18/2021 15:58:55 - INFO - transformers.trainer - {'loss': 0.1293325940705836, 'learning_rate': 1.282051282051282e-05, 'epoch': 2.5618889031211305, 'step': 15000} 01/18/2021 15:58:55 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 15:58:55 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 15:58:55 - INFO - transformers.trainer - Batch size = 8 01/18/2021 16:01:28 - INFO - transformers.trainer - {'eval_loss': 0.14231833793567022, 'epoch': 2.5618889031211305, 'step': 15000} 01/18/2021 16:05:44 - INFO - transformers.trainer - {'loss': 0.12965147987008094, 'learning_rate': 1.2564102564102565e-05, 'epoch': 2.5789675931856024, 'step': 15100} 01/18/2021 16:10:01 - INFO - 
transformers.trainer - {'loss': 0.13080044370144606, 'learning_rate': 1.230769230769231e-05, 'epoch': 2.596046283250075, 'step': 15200} 01/18/2021 16:14:18 - INFO - transformers.trainer - {'loss': 0.13337840327061712, 'learning_rate': 1.2051282051282051e-05, 'epoch': 2.6131249733145467, 'step': 15300} 01/18/2021 16:18:35 - INFO - transformers.trainer - {'loss': 0.13098039561882616, 'learning_rate': 1.1794871794871795e-05, 'epoch': 2.6302036633790187, 'step': 15400} 01/18/2021 16:22:52 - INFO - transformers.trainer - {'loss': 0.12887955086305738, 'learning_rate': 1.153846153846154e-05, 'epoch': 2.647282353443491, 'step': 15500} 01/18/2021 16:27:09 - INFO - transformers.trainer - {'loss': 0.12991797411814332, 'learning_rate': 1.1282051282051283e-05, 'epoch': 2.664361043507963, 'step': 15600} 01/18/2021 16:31:25 - INFO - transformers.trainer - {'loss': 0.13052929306402802, 'learning_rate': 1.1025641025641026e-05, 'epoch': 2.681439733572435, 'step': 15700} 01/18/2021 16:35:42 - INFO - transformers.trainer - {'loss': 0.13088280947878958, 'learning_rate': 1.0769230769230771e-05, 'epoch': 2.698518423636907, 'step': 15800} 01/18/2021 16:39:59 - INFO - transformers.trainer - {'loss': 0.1314548215828836, 'learning_rate': 1.0512820512820514e-05, 'epoch': 2.7155971137013792, 'step': 15900} 01/18/2021 16:44:15 - INFO - transformers.trainer - {'loss': 0.13059457195922733, 'learning_rate': 1.0256410256410256e-05, 'epoch': 2.732675803765851, 'step': 16000} 01/18/2021 16:44:15 - INFO - transformers.trainer - ***** Running Evaluation ***** 01/18/2021 16:44:15 - INFO - transformers.trainer - Num examples = 22339 01/18/2021 16:44:15 - INFO - transformers.trainer - Batch size = 8 01/18/2021 16:46:49 - INFO - transformers.trainer - {'eval_loss': 0.1423268156240181, 'epoch': 2.732675803765851, 'step': 16000} 01/18/2021 16:46:49 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-16000 01/18/2021 16:46:49 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-16000/config.json 01/18/2021 16:46:51 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-16000/pytorch_model.bin 01/18/2021 16:51:12 - INFO - transformers.trainer - {'loss': 0.13076298715546728, 'learning_rate': 1e-05, 'epoch': 2.749754493830323, 'step': 16100} 01/18/2021 16:55:28 - INFO - transformers.trainer - {'loss': 0.1295975492708385, 'learning_rate': 9.743589743589744e-06, 'epoch': 2.7668331838947955, 'step': 16200} 01/18/2021 16:59:44 - INFO - transformers.trainer - {'loss': 0.13097517512738704, 'learning_rate': 9.487179487179487e-06, 'epoch': 2.7839118739592674, 'step': 16300} 01/18/2021 17:04:01 - INFO - transformers.trainer - {'loss': 0.1308065737783909, 'learning_rate': 9.230769230769232e-06, 'epoch': 2.8009905640237394, 'step': 16400} 01/18/2021 17:08:18 - INFO - transformers.trainer - {'loss': 0.13254113286733626, 'learning_rate': 8.974358974358976e-06, 'epoch': 2.8180692540882113, 'step': 16500} 01/18/2021 17:12:35 - INFO - transformers.trainer - {'loss': 0.13274792805314065, 'learning_rate': 8.717948717948717e-06, 'epoch': 2.8351479441526832, 'step': 16600} 01/18/2021 17:16:52 - INFO - transformers.trainer - {'loss': 0.13092530166730285, 'learning_rate': 8.461538461538462e-06, 'epoch': 2.8522266342171556, 'step': 16700} 01/18/2021 17:21:09 - INFO - 
transformers.trainer - {'loss': 0.13017749618738889, 'learning_rate': 8.205128205128205e-06, 'epoch': 2.8693053242816275, 'step': 16800}
01/18/2021 17:25:25 - INFO - transformers.trainer - {'loss': 0.1296338775474578, 'learning_rate': 7.948717948717949e-06, 'epoch': 2.8863840143461, 'step': 16900}
01/18/2021 17:29:42 - INFO - transformers.trainer - {'loss': 0.12802277641370893, 'learning_rate': 7.692307692307694e-06, 'epoch': 2.903462704410572, 'step': 17000}
01/18/2021 17:29:42 - INFO - transformers.trainer - ***** Running Evaluation *****
01/18/2021 17:29:42 - INFO - transformers.trainer - Num examples = 22339
01/18/2021 17:29:42 - INFO - transformers.trainer - Batch size = 8
01/18/2021 17:32:14 - INFO - transformers.trainer - {'eval_loss': 0.14222049867188125, 'epoch': 2.903462704410572, 'step': 17000}
01/18/2021 17:36:31 - INFO - transformers.trainer - {'loss': 0.13057434551417826, 'learning_rate': 7.435897435897436e-06, 'epoch': 2.920541394475044, 'step': 17100}
01/18/2021 17:40:48 - INFO - transformers.trainer - {'loss': 0.13143418861553072, 'learning_rate': 7.179487179487179e-06, 'epoch': 2.9376200845395157, 'step': 17200}
01/18/2021 17:45:06 - INFO - transformers.trainer - {'loss': 0.12852452585473656, 'learning_rate': 6.923076923076923e-06, 'epoch': 2.9546987746039877, 'step': 17300}
01/18/2021 17:49:22 - INFO - transformers.trainer - {'loss': 0.12928976690396665, 'learning_rate': 6.666666666666667e-06, 'epoch': 2.97177746466846, 'step': 17400}
01/18/2021 17:53:39 - INFO - transformers.trainer - {'loss': 0.13133566157892346, 'learning_rate': 6.41025641025641e-06, 'epoch': 2.988856154732932, 'step': 17500}
01/18/2021 17:57:56 - INFO - transformers.trainer - {'loss': 0.1324333022162318, 'learning_rate': 6.153846153846155e-06, 'epoch': 3.005977541522565, 'step': 17600}
01/18/2021 18:02:13 - INFO - transformers.trainer - {'loss': 0.1312393301166594, 'learning_rate': 5.897435897435897e-06, 'epoch': 3.023056231587037, 'step': 17700}
01/18/2021 18:06:30 - INFO - transformers.trainer - {'loss': 0.13174279500730335, 'learning_rate': 5.641025641025641e-06, 'epoch': 3.0401349216515094, 'step': 17800}
01/18/2021 18:10:47 - INFO - transformers.trainer - {'loss': 0.13103315832093357, 'learning_rate': 5.3846153846153855e-06, 'epoch': 3.0572136117159814, 'step': 17900}
01/18/2021 18:15:03 - INFO - transformers.trainer - {'loss': 0.13001919738948345, 'learning_rate': 5.128205128205128e-06, 'epoch': 3.0742923017804533, 'step': 18000}
01/18/2021 18:15:03 - INFO - transformers.trainer - ***** Running Evaluation *****
01/18/2021 18:15:03 - INFO - transformers.trainer - Num examples = 22339
01/18/2021 18:15:03 - INFO - transformers.trainer - Batch size = 8
01/18/2021 18:17:36 - INFO - transformers.trainer - {'eval_loss': 0.1425281741229215, 'epoch': 3.0742923017804533, 'step': 18000}
01/18/2021 18:17:36 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-18000
01/18/2021 18:17:36 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-18000/config.json
01/18/2021 18:17:39 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-18000/pytorch_model.bin
01/18/2021 18:21:59 - INFO - transformers.trainer - {'loss': 0.13199391799047588, 'learning_rate': 4.871794871794872e-06, 'epoch': 3.0913709918449257, 'step': 18100}
01/18/2021 18:26:16 - INFO - transformers.trainer - {'loss': 0.13001088454388082, 'learning_rate': 4.615384615384616e-06, 'epoch': 3.1084496819093976, 'step': 18200}
01/18/2021 18:30:34 - INFO - transformers.trainer - {'loss': 0.1296701157465577, 'learning_rate': 4.3589743589743586e-06, 'epoch': 3.1255283719738696, 'step': 18300}
01/18/2021 18:34:51 - INFO - transformers.trainer - {'loss': 0.13028697127476335, 'learning_rate': 4.102564102564103e-06, 'epoch': 3.1426070620383415, 'step': 18400}
01/18/2021 18:39:08 - INFO - transformers.trainer - {'loss': 0.12969573942944407, 'learning_rate': 3.846153846153847e-06, 'epoch': 3.159685752102814, 'step': 18500}
01/18/2021 18:43:25 - INFO - transformers.trainer - {'loss': 0.13035354193300008, 'learning_rate': 3.5897435897435896e-06, 'epoch': 3.176764442167286, 'step': 18600}
01/18/2021 18:47:43 - INFO - transformers.trainer - {'loss': 0.12901145804673433, 'learning_rate': 3.3333333333333333e-06, 'epoch': 3.1938431322317578, 'step': 18700}
01/18/2021 18:52:00 - INFO - transformers.trainer - {'loss': 0.13381920736283065, 'learning_rate': 3.0769230769230774e-06, 'epoch': 3.2109218222962297, 'step': 18800}
01/18/2021 18:56:17 - INFO - transformers.trainer - {'loss': 0.12999009462073446, 'learning_rate': 2.8205128205128207e-06, 'epoch': 3.228000512360702, 'step': 18900}
01/18/2021 19:00:34 - INFO - transformers.trainer - {'loss': 0.13016757289879025, 'learning_rate': 2.564102564102564e-06, 'epoch': 3.245079202425174, 'step': 19000}
01/18/2021 19:00:34 - INFO - transformers.trainer - ***** Running Evaluation *****
01/18/2021 19:00:34 - INFO - transformers.trainer - Num examples = 22339
01/18/2021 19:00:34 - INFO - transformers.trainer - Batch size = 8
01/18/2021 19:03:06 - INFO - transformers.trainer - {'eval_loss': 0.14237473097606837, 'epoch': 3.245079202425174, 'step': 19000}
01/18/2021 19:07:23 - INFO - transformers.trainer - {'loss': 0.13157522074878217, 'learning_rate': 2.307692307692308e-06, 'epoch': 3.262157892489646, 'step': 19100}
01/18/2021 19:11:40 - INFO - transformers.trainer - {'loss': 0.13183796728029848, 'learning_rate': 2.0512820512820513e-06, 'epoch': 3.279236582554118, 'step': 19200}
01/18/2021 19:15:56 - INFO - transformers.trainer - {'loss': 0.1313746290653944, 'learning_rate': 1.7948717948717948e-06, 'epoch': 3.2963152726185903, 'step': 19300}
01/18/2021 19:20:13 - INFO - transformers.trainer - {'loss': 0.1296540303900838, 'learning_rate': 1.5384615384615387e-06, 'epoch': 3.313393962683062, 'step': 19400}
01/18/2021 19:24:30 - INFO - transformers.trainer - {'loss': 0.1289286340586841, 'learning_rate': 1.282051282051282e-06, 'epoch': 3.330472652747534, 'step': 19500}
01/18/2021 19:28:46 - INFO - transformers.trainer - {'loss': 0.13165263276547193, 'learning_rate': 1.0256410256410257e-06, 'epoch': 3.3475513428120065, 'step': 19600}
01/18/2021 19:33:03 - INFO - transformers.trainer - {'loss': 0.13271798353642225, 'learning_rate': 7.692307692307694e-07, 'epoch': 3.3646300328764784, 'step': 19700}
01/18/2021 19:37:20 - INFO - transformers.trainer - {'loss': 0.13064167259261011, 'learning_rate': 5.128205128205128e-07, 'epoch': 3.3817087229409504, 'step': 19800}
01/18/2021 19:41:36 - INFO - transformers.trainer - {'loss': 0.1307045934908092, 'learning_rate': 2.564102564102564e-07, 'epoch': 3.3987874130054223, 'step': 19900}
01/18/2021 19:45:53 - INFO - transformers.trainer - {'loss': 0.1308637114241719, 'learning_rate': 0.0, 'epoch': 3.4158661030698947, 'step': 20000}
01/18/2021 19:45:53 - INFO - transformers.trainer - ***** Running Evaluation *****
01/18/2021 19:45:53 - INFO - transformers.trainer - Num examples = 22339
01/18/2021 19:45:53 - INFO - transformers.trainer - Batch size = 8
01/18/2021 19:48:25 - INFO - transformers.trainer - {'eval_loss': 0.14246687874454048, 'epoch': 3.4158661030698947, 'step': 20000}
01/18/2021 19:48:25 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-20000
01/18/2021 19:48:25 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-20000/config.json
01/18/2021 19:48:27 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/checkpoint-20000/pytorch_model.bin
01/18/2021 19:48:34 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
01/18/2021 19:48:34 - INFO - transformers.trainer - Saving model checkpoint to checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2
01/18/2021 19:48:34 - INFO - transformers.configuration_utils - Configuration saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/config.json
01/18/2021 19:48:36 - INFO - transformers.modeling_utils - Model weights saved in checkpoint/seqlabel/cnndm/20210118/cnndm.bert-large-cased.bsz7.uf4.gpu0_1_2/pytorch_model.bin
01/18/2021 19:48:36 - INFO - __main__ - *** Evaluate ***
01/18/2021 19:48:36 - INFO - transformers.trainer - ***** Running Evaluation *****
01/18/2021 19:48:36 - INFO - transformers.trainer - Num examples = 22339
01/18/2021 19:48:36 - INFO - transformers.trainer - Batch size = 8
01/18/2021 19:51:08 - INFO - transformers.trainer - {'eval_loss': 0.14246687874454048, 'epoch': 3.416036889970539, 'step': 20001}
01/18/2021 19:51:08 - INFO - __main__ - ***** Eval results *****
01/18/2021 19:51:08 - INFO - __main__ - eval_loss = 0.14246687874454048
01/18/2021 19:51:08 - INFO - __main__ - epoch = 3.416036889970539
01/18/2021 19:51:10 - INFO - filelock - Lock 140247007239824 acquired on /home/volker/.cache/huggingface/datasets/9756f8f80e16ee5ac069b79fe89ac2b3f5bcfb45f4b062aa7893dd2938c9c00f.de777b72eef8feede3bda87890ea212f8e91a531814ac5161b97706188ba7174.py.lock
01/18/2021 19:51:10 - INFO - filelock - Lock 140247007239824 released on /home/volker/.cache/huggingface/datasets/9756f8f80e16ee5ac069b79fe89ac2b3f5bcfb45f4b062aa7893dd2938c9c00f.de777b72eef8feede3bda87890ea212f8e91a531814ac5161b97706188ba7174.py.lock
01/18/2021 19:51:10 - INFO - filelock - Lock 140247007236688 acquired on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-fa3d19548a18769c_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock
01/18/2021 19:51:10 - INFO - filelock - Lock 140247007236688 released on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-fa3d19548a18769c_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock
01/18/2021 19:51:10 - INFO - filelock - Lock 140245873046160 acquired on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-fa3d19548a18769c_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock
01/18/2021 19:51:10 - INFO - filelock - Lock 140245873046160 released on datasets/cnndm/hf_cache/datasets_cnndm_hf_cache_json_default-fa3d19548a18769c_0.0.0_fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.lock
01/18/2021 19:51:12 - INFO - transformers.trainer - ***** Running Prediction *****
01/18/2021 19:51:12 - INFO - transformers.trainer - Num examples = 26648
01/18/2021 19:51:12 - INFO - transformers.trainer - Batch size = 8
01/18/2021 19:58:14 - INFO - __main__ - eval_loss = 0.1152171123300073
```

Loss BERT Tagger

[figure: training-loss curve for the BERT tagger]

The loss makes a sudden jump around step 3700, after which the model seems to be stuck in a local minimum. The probability distribution over tokens is essentially uniform at the end of training, which probably explains why the output resembles a sentence.
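
A quick way to confirm this failure mode is to look at the tagger's per-token keyword probabilities directly. A minimal sketch, assuming a binary token-classification head with label index 1 meaning "keyword"; the checkpoint path is a placeholder, not the released model:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# "path/to/tagger_checkpoint" is a placeholder; point it at your own tagger directory.
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForTokenClassification.from_pretrained("path/to/tagger_checkpoint").eval()

text = "The tagger should score content words highly and stop words near zero."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # shape: (1, seq_len, num_labels)

# Assumes label index 1 = "keyword".
probs = logits.softmax(dim=-1)[0, :, 1]

# A healthy tagger is peaked (a few tokens near 1, most near 0);
# a collapsed one is nearly flat across all tokens, as described above.
print("P(keyword) min/mean/max:",
      probs.min().item(), probs.mean().item(), probs.max().item())
```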

Loss RoBERTa Tagger

[figure: training-loss curve for the RoBERTa tagger]

Training for the RoBERTa tagger looks fine, but you are right that I would need to tune the extraction hyperparams myself in this case.
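
In case it is useful to others, one rough way to retune the extraction threshold for a different tagger is to sweep it against the oracle keywords on a dev slice. A sketch under the assumption that `extract_keywords` wraps the tagger + postprocessing pipeline; it is a hypothetical helper, not a function from this repo:

```python
# Hypothetical sweep: pick the extraction threshold that best matches the
# oracle keywords on a dev slice. `extract_keywords(doc, threshold)` stands in
# for whatever tagger + postprocessing pipeline is in use.

def keyword_f1(pred, gold):
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    p = len(pred & gold) / len(pred)
    r = len(pred & gold) / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

def sweep_threshold(docs, oracle, extract_keywords):
    best = (None, -1.0)
    for threshold in (0.05, 0.1, 0.15, 0.2, 0.25, 0.3):
        score = sum(keyword_f1(extract_keywords(d, threshold), o)
                    for d, o in zip(docs, oracle)) / len(docs)
        if score > best[1]:
            best = (threshold, score)
    return best  # (best threshold, mean keyword F1)
```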

I will try to retrain the BERT Tagger and contact you directly for the tagger weights. Thanks so much!

volker42maru commented 3 years ago

I just noticed that there is a random sampling step (without a defined seed) for keyword dropout 🙈: https://github.com/salesforce/ctrl-sum/blob/master/scripts/preprocess.py#L501

That explains why my test.oracleword looks a bit different from the example given in this repo. I guess this might influence the training and final performance of the tagger to some extent. (For my BERT tagger, it was probably just an unfortunate training run.)
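
As an aside, seeding Python's RNG before running the preprocessing script should make the dropout deterministic across runs. A minimal sketch of the idea, not the repo's actual code; keep_prob is a made-up value, not the paper's setting:

```python
import random

random.seed(42)  # fix the seed so the sampling below is reproducible

def drop_keywords(keywords, keep_prob=0.85):
    # Illustrative stand-in for the sampling step linked above:
    # each oracle keyword is kept independently with probability keep_prob.
    return [w for w in keywords if random.random() < keep_prob]
```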

jxhe commented 3 years ago

Thank you for sharing this! A few things I noticed:

  1. Our loss and eval_loss curves look more like your RoBERTa figures
  2. From your BERT training log I noticed that the number of evaluation examples differs: ours has ~31k examples, but your log shows only 22k. Did you evaluate on the validation set or the test set? (It should be validation by default.) I'll double-check our preprocessing script and get back to you
  3. Your batch size seems to be 84, while we used 128 following the script in the repo
  4. Hugging Face's Trainer saves a checkpoint every 1000 steps by default, and the checkpoint in the root checkpoint directory is the last one. It sounds like you were using this last checkpoint, but we used the checkpoint with the best validation loss, since the eval loss goes up in the end; you can navigate into the checkpoint directory at a specific step to find it (see the sketch after this list)
  5. The keyword dropout only influences BART training -- the tagger is trained without keyword dropout, so the random seed you referred to should not influence the tagger
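
To make point 4 concrete, one option is simply to re-evaluate every saved checkpoint-* directory on the validation set and keep the one with the lowest eval loss. A rough sketch; the directory layout and evaluation call are assumptions about a standard Trainer setup, not code from this repo:

```python
import glob

from transformers import AutoModelForTokenClassification, Trainer

def best_checkpoint(run_dir, training_args, eval_dataset):
    # Assumes the Hugging Face Trainer layout: run_dir/checkpoint-1000, -2000, ...
    best_dir, best_loss = None, float("inf")
    for ckpt in sorted(glob.glob(f"{run_dir}/checkpoint-*")):
        model = AutoModelForTokenClassification.from_pretrained(ckpt)
        trainer = Trainer(model=model, args=training_args, eval_dataset=eval_dataset)
        loss = trainer.evaluate()["eval_loss"]
        if loss < best_loss:
            best_dir, best_loss = ckpt, loss
    return best_dir, best_loss
```
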
jxhe commented 3 years ago

I have an update on the mismatch in the number of validation examples. Running the script today I got 22k validation examples; it turns out that I used valeval.seqlabel.jsonl for validation during tagger training, which contains 31k examples. The difference between these two files is whether segment spans that contain no keywords are included. It is hard to say which validation choice is actually better, but all spans should be included at prediction time.
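
If you want to verify this on your side, counting segment spans whose label sequence contains no keyword should account for the 22k vs. 31k gap. A quick sketch, assuming one span per jsonl line with a binary labels list; the field name here is a guess, adapt it to the actual seqlabel schema:

```python
import json

with_kw = without_kw = 0
with open("datasets/cnndm/valeval.seqlabel.jsonl") as f:
    for line in f:
        example = json.loads(line)
        # "labels" is assumed to be a 0/1 list marking keyword tokens.
        if any(example["labels"]):
            with_kw += 1
        else:
            without_kw += 1
print(f"spans with keywords: {with_kw}, spans without: {without_kw}")
```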

volker42maru commented 3 years ago

Thanks for the detailed response and for catching up on this!

> Your batch size seems to be 84, while we used 128 following the script in the repo

Yes, I didn't have enough GPUs and had to reduce the batch size to fit into memory. I forgot to adjust update_freq accordingly.
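
For reference, the effective batch size under gradient accumulation is per-GPU batch size × update_freq × number of GPUs, so the run above works out as:

```python
# Effective batch size = per-GPU batch * update_freq * number of GPUs.
per_gpu, update_freq, gpus = 7, 4, 3   # from the run name: bsz7.uf4.gpu0_1_2
print(per_gpu * update_freq * gpus)    # 84, short of the intended 128

# With the same 3 GPUs and per-GPU batch of 7, update_freq=6 comes closest:
print(7 * 6 * 3)                       # 126
```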

> The keyword dropout only influences BART training -- the tagger is trained without keyword dropout, so the random seed you referred to should not influence the tagger

Right, that was my mistake.

To (4): I will test with the checkpoint that has the best validation loss and check whether the results improve. I still have to optimize the hyperparams for my tagger as well (my score is around 45 R1 now).

jxhe commented 3 years ago

I have shared our pretrained BERT tagger weights with you over email, and you can refer to scripts/test_bart.sh, as described in the readme, for how we computed ROUGE scores. With the pretrained tagger I hope you can reproduce the results easily. I will close this issue for now, but feel free to reopen it if you still have trouble with this.
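
As a quick independent sanity check before running the full pipeline, something like the rouge_score package can approximate ROUGE-1/2/L; scripts/test_bart.sh remains the reference scoring setup, and numbers from this sketch will not match the ROUGE-1.5.5 pipeline exactly:

```python
from rouge_score import rouge_scorer

# Approximate check only: the repo's scripts/test_bart.sh is the reference
# scoring pipeline; rouge_score may differ slightly from ROUGE-1.5.5.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"], use_stemmer=True)

def mean_rouge1(references, predictions):
    scores = [scorer.score(ref, pred)["rouge1"].fmeasure
              for ref, pred in zip(references, predictions)]
    return 100 * sum(scores) / len(scores)
```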

volker42maru commented 3 years ago

I could reproduce the results with your BERT tagger weights.

Thanks for the help 😀