shankyemcee opened this issue 3 years ago (status: Open)
Same issue here, did this get resolved?
Update: it works if you use "CORRECT" or "INCORRECT" as the label. This is not clear from the documentation, which says the field should use the keyword "id".
Hi,
Yes, my issue got resolved. The evaluation steps are the same as in the README, but an extra label field must be added, set to either 'CONSISTENT' or 'INCONSISTENT'. Then, from the code, you can extract all the predictions that are labelled 0. From the code, it looks like a predicted label of 0 means CONSISTENT.
@shankyemcee thank you, that helps.
Hi,
I tried running this metric again. It seems that last time it was using the cached features. It actually works only if CORRECT or INCORRECT is given as the label. But I set the label to CORRECT for all the examples, and it gives me a FactCC score of 1. Is this the intended behavior? And how can I check the factuality of generated summaries with this?
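One way to read factuality from the model output, rather than from the aggregate score, is to count how many claims the model assigns to each class. The sketch below assumes the label order printed in the log in this thread (`['CORRECT', 'INCORRECT']`), so that class id 0 means CORRECT, and it assumes the predicted class ids have already been collected into a list. `preds` and `summarize_predictions` are hypothetical names, not part of the FactCC code.

```python
from collections import Counter

# Assumed label order, taken from the "label_map label" line in the log.
LABELS = ["CORRECT", "INCORRECT"]


def summarize_predictions(preds):
    """Return per-label counts and the share of claims predicted CORRECT."""
    counts = Counter(LABELS[p] for p in preds)
    total = len(preds)
    share_correct = counts["CORRECT"] / total if total else 0.0
    return dict(counts), share_correct


# Hypothetical predicted class ids for five generated summaries.
preds = [0, 1, 0, 0, 1]
counts, share_correct = summarize_predictions(preds)
print(counts, share_correct)
```

If the reported score is a comparison against the supplied labels (an assumption), then with placeholder labels that are all CORRECT it only tells you how often the model agreed with the placeholder. The per-claim predictions are the more informative output in that case.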
Here is the snippet of the error:
04/03/2021 14:39:09 - WARNING - main - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
04/03/2021 14:39:10 - INFO - pytorch_transformers.modeling_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at /home/shankar5/.cache/torch/pytorch_transformers/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517
04/03/2021 14:39:10 - INFO - pytorch_transformers.modeling_utils - Model config { "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "finetuning_task": "factcc_annotated", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "num_labels": 2, "output_attentions": false, "output_hidden_states": false, "pad_token_id": 0, "pruned_heads": {}, "torchscript": false, "type_vocab_size": 2, "vocab_size": 30522 }
04/03/2021 14:39:10 - INFO - pytorch_transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/shankar5/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
04/03/2021 14:39:10 - INFO - main - Loading model from checkpoint.
04/03/2021 14:39:10 - INFO - pytorch_transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at /home/shankar5/.cache/torch/pytorch_transformers/aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
04/03/2021 14:39:15 - INFO - pytorch_transformers.modeling_utils - Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
04/03/2021 14:39:15 - INFO - pytorch_transformers.modeling_utils - Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
04/03/2021 14:39:15 - INFO - main - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/data', device=device(type='cpu'), do_eval=True, do_lower_case=True, do_train=False, eval_all_checkpoints=True, evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, logging_steps=100, loss_lambda=0.1, max_grad_norm=1.0, max_seq_length=512, max_steps=-1, model_name_or_path='bert-base-uncased', model_type='bert', n_gpu=0, no_cuda=False, num_train_epochs=3.0, output_dir='/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/factcc-checkpoint', output_mode='classification', overwrite_cache=True, overwrite_output_dir=False, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=12, save_steps=50, seed=42, task_name='factcc_annotated', tokenizer_name='', train_from_scratch=False, warmup_steps=0, weight_decay=0.0)
04/03/2021 14:39:15 - INFO - main - Evaluate the following checkpoints: ['/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/factcc-checkpoint']
04/03/2021 14:39:20 - INFO - main - Creating features from dataset file at /project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/data
04/03/2021 14:39:20 - INFO - utils - Writing example 0 of 5115
label_map label: ['CORRECT', 'INCORRECT']
Traceback (most recent call last):
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 519, in <module>
main()
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 511, in main
result = evaluate(args, model, tokenizer, prefix=global_step)
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 215, in evaluate
eval_dataset = load_and_cache_examples(args, eval_task, tokenizer, evaluate=True)
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/run.py", line 300, in load_and_cache_examples
pad_token_segment_id=4 if args.model_type in ['xlnet'] else 0)
File "/project/6027213/shankar5/envs/chart2text/Chart2Text_extended/factcc/modeling/utils.py", line 307, in convert_examples_to_features
label_id = label_map[example.label]
KeyError: 0
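The `KeyError: 0` is consistent with a string-keyed lookup receiving an integer. Below is a minimal reconstruction, assuming `label_map` is built from the label list printed in the log; the actual code in `utils.py` may differ, and `to_label_id` is a hypothetical helper added only to show a clearer failure mode.

```python
# Label list as printed in the log line "label_map label: [...]".
label_list = ["CORRECT", "INCORRECT"]

# Assumed construction: map each label string to its index.
label_map = {label: i for i, label in enumerate(label_list)}


def to_label_id(label):
    """Look up a label string; raise a descriptive error for anything else."""
    if label not in label_map:
        raise ValueError(
            f"Unexpected label {label!r}; expected one of {label_list}"
        )
    return label_map[label]


# A string label is found in the map.
print(to_label_id("CORRECT"))

# An integer label, as in the JSONL in this thread, is not a key of the map.
try:
    label_map[0]
except KeyError as err:
    print("KeyError:", err)
```

Under that assumption, the lookup succeeds only when the `label` field holds one of the two strings, which matches the earlier observation that "CORRECT" or "INCORRECT" is required.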
I formatted the JSONL file according to the instructions, but renamed the "id" field to "label" because "id" threw errors:
{ "label": 0, "text": "This statistic presents the most common concerns about online versus on-campus learning options according to online students in the United States in 2019 . During the survey period , 21 percent of respondents expressed some concern about the perception of their online degree by prospective employers .\n", "claim": "This statistic shows the results of a survey conducted in the country in 2019 on the importance of online in a Perception_of_online_learning_degree_by_prospective_employers . Some 31 % of respondents stated that online was Quality_and_instruction_and_academic_support to them in a Perception_of_online_learning_degree_by_prospective_employers because they Quality_and_instruction_and_academic_support feel it .\n" },
{ "label": 1, "text": "The statistic shows the divorce rate in Norway from 2009 to 2019 , by gender . The divorce rate overall declined within this decade . In 2019 , there were ten divorces per thousand married and separated males , and 10.3 divorces per thousand married and separated females .\n", "claim": "This statistic shows the rate Norway of from 2009 to 2019 by gender . In 2019 , Norway 's Males Norway amounted to approximately 10.0 million , while the Females Norway amounted to approximately 10.3 million inhabitants .\n" },
{ "label": 2, "text": "This statistic shows the Milliman Medical Index ( MMI ) or the annual medical cost for a family of four in the U.S. from 2013 to 2020 . In 2013 , the projected annual medical cost for a family of four was 22,030 U.S. dollars whereas this cost increased to 28,653 U.S. dollars in 2020 .\n", "claim": "This statistic presents the Annual medical of cost and for to family in U.S. U.S. U.S. 2013 to 2019 , with a forecast for 2020 . Over this period , the medical of the cost and for industry to family in U.S. U.S. increased , reaching around 28166 million U.S. in 2018 .\n" },
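The `label` values above are running integers (0, 1, 2), which fits the `KeyError: 0` in the traceback. Here is a sketch of writing the file with string labels instead, one JSON object per line with no separating commas. It assumes the `label`, `text`, and `claim` fields shown in this thread. The function name, the file name `example.jsonl`, and the sample pairs are hypothetical, and whether an `id` field is also needed is not settled here.

```python
import json
import os
import tempfile


def write_jsonl(pairs, path, placeholder_label="CORRECT"):
    """Write (text, claim) pairs as JSONL with a string label on every row."""
    with open(path, "w", encoding="utf-8") as f:
        for text, claim in pairs:
            row = {"label": placeholder_label, "text": text, "claim": claim}
            # One object per line: no enclosing list, no trailing commas.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


# Hypothetical (source text, generated summary) pairs.
pairs = [
    ("Source document one.", "Generated summary one."),
    ("Source document two.", "Generated summary two."),
]

# Hypothetical output location; point this at your own data directory.
path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
write_jsonl(pairs, path)
```

The placeholder label only exists to satisfy the loader. It says nothing about whether a summary is actually factual, so the model's predictions are what should be inspected afterwards.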