yuvalkirstain / s2e-coref

MIT License

Hello, may I ask why the model you provided can't be used to evaluate on the dev (validation) split? A KeyError is raised, and if I ignore the KeyError, an AttributeError follows as well. #8

Open zyt888 opened 2 years ago

TannerSorensen commented 1 year ago

Just to elaborate on what I perceive to be @zyt888's issue: when I run the following with export SPLIT_FOR_EVAL=test:

python run_coref.py \
        --output_dir=$OUTPUT_DIR \
        --cache_dir=$CACHE_DIR \
        --model_type=longformer \
        --model_name_or_path=$MODEL_DIR \
        --tokenizer_name=allenai/longformer-large-4096 \
        --config_name=allenai/longformer-large-4096  \
        --train_file=$DATA_DIR/train.english.jsonlines \
        --predict_file=$DATA_DIR/test.english.jsonlines \
        --do_eval \
        --num_train_epochs=129 \
        --logging_steps=500 \
        --save_steps=3000 \
        --eval_steps=1000 \
        --max_seq_length=4096 \
        --train_file_cache=$DATA_DIR/train.english.4096.pkl \
        --predict_file_cache=$DATA_DIR/test.english.4096.pkl \
        --amp \
        --normalise_loss \
        --max_total_seq_len=5000 \
        --experiment_name=eval_model \
        --warmup_steps=5600 \
        --adam_epsilon=1e-6 \
        --head_learning_rate=3e-4 \
        --learning_rate=1e-5 \
        --adam_beta2=0.98 \
        --weight_decay=0.01 \
        --dropout_prob=0.3 \
        --save_if_best \
        --top_lambda=0.4  \
        --tensorboard_dir=$OUTPUT_DIR/tb \
        --conll_path_for_eval=$DATA_DIR/$SPLIT_FOR_EVAL.english.v4_gold_conll

I get the following output:

08/30/2022 21:51:23 - INFO - __main__ -   model_type - longformer
08/30/2022 21:51:23 - INFO - __main__ -   model_name_or_path - /home/tanner/git/s2e-coref/model
08/30/2022 21:51:23 - INFO - __main__ -   output_dir - output
08/30/2022 21:51:23 - INFO - __main__ -   train_file_cache - /home/tanner/conll/conll-2012/english/train.english.4096.pkl
08/30/2022 21:51:23 - INFO - __main__ -   predict_file_cache - /home/tanner/conll/conll-2012/english/test.english.4096.pkl
08/30/2022 21:51:23 - INFO - __main__ -   train_file - /home/tanner/conll/conll-2012/english/train.english.jsonlines
08/30/2022 21:51:23 - INFO - __main__ -   predict_file - /home/tanner/conll/conll-2012/english/test.english.jsonlines
08/30/2022 21:51:23 - INFO - __main__ -   config_name - allenai/longformer-large-4096
08/30/2022 21:51:23 - INFO - __main__ -   tokenizer_name - allenai/longformer-large-4096
08/30/2022 21:51:23 - INFO - __main__ -   cache_dir - cache
08/30/2022 21:51:23 - INFO - __main__ -   max_seq_length - 4096
08/30/2022 21:51:23 - INFO - __main__ -   do_train - False
08/30/2022 21:51:23 - INFO - __main__ -   do_eval - True
08/30/2022 21:51:23 - INFO - __main__ -   do_lower_case - False
08/30/2022 21:51:23 - INFO - __main__ -   nonfreeze_params - None
08/30/2022 21:51:23 - INFO - __main__ -   learning_rate - 1e-05
08/30/2022 21:51:23 - INFO - __main__ -   head_learning_rate - 0.0003
08/30/2022 21:51:23 - INFO - __main__ -   dropout_prob - 0.3
08/30/2022 21:51:23 - INFO - __main__ -   gradient_accumulation_steps - 1
08/30/2022 21:51:23 - INFO - __main__ -   weight_decay - 0.01
08/30/2022 21:51:23 - INFO - __main__ -   adam_beta1 - 0.9
08/30/2022 21:51:23 - INFO - __main__ -   adam_beta2 - 0.98
08/30/2022 21:51:23 - INFO - __main__ -   adam_epsilon - 1e-06
08/30/2022 21:51:23 - INFO - __main__ -   num_train_epochs - 129.0
08/30/2022 21:51:23 - INFO - __main__ -   warmup_steps - 5600
08/30/2022 21:51:23 - INFO - __main__ -   logging_steps - 500
08/30/2022 21:51:23 - INFO - __main__ -   eval_steps - 1000
08/30/2022 21:51:23 - INFO - __main__ -   save_steps - 3000
08/30/2022 21:51:23 - INFO - __main__ -   no_cuda - False
08/30/2022 21:51:23 - INFO - __main__ -   overwrite_output_dir - False
08/30/2022 21:51:23 - INFO - __main__ -   seed - 42
08/30/2022 21:51:23 - INFO - __main__ -   local_rank - -1
08/30/2022 21:51:23 - INFO - __main__ -   amp - True
08/30/2022 21:51:23 - INFO - __main__ -   fp16_opt_level - O1
08/30/2022 21:51:23 - INFO - __main__ -   max_span_length - 30
08/30/2022 21:51:23 - INFO - __main__ -   top_lambda - 0.4
08/30/2022 21:51:23 - INFO - __main__ -   max_total_seq_len - 5000
08/30/2022 21:51:23 - INFO - __main__ -   experiment_name - eval_model
08/30/2022 21:51:23 - INFO - __main__ -   normalise_loss - True
08/30/2022 21:51:23 - INFO - __main__ -   ffnn_size - 3072
08/30/2022 21:51:23 - INFO - __main__ -   save_if_best - True
08/30/2022 21:51:23 - INFO - __main__ -   batch_size_1 - False
08/30/2022 21:51:23 - INFO - __main__ -   tensorboard_dir - output/tb
08/30/2022 21:51:23 - INFO - __main__ -   conll_path_for_eval - /home/tanner/conll/conll-2012/english/dev.english.v4_gold_conll
08/30/2022 21:51:23 - INFO - __main__ -   n_gpu - 0
08/30/2022 21:51:23 - INFO - __main__ -   device - cpu
Writing output/meta.json
08/30/2022 21:51:23 - INFO - __main__ -   Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, amp training: True
08/30/2022 21:51:34 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_beta1=0.9, adam_beta2=0.98, adam_epsilon=1e-06, amp=True, batch_size_1=False, cache_dir='cache', config_name='allenai/longformer-large-4096', conll_path_for_eval='/home/tanner/conll/conll-2012/english/dev.english.v4_gold_conll', device=device(type='cpu'), do_eval=True, do_lower_case=False, do_train=False, dropout_prob=0.3, eval_steps=1000, experiment_name='eval_model', ffnn_size=3072, fp16_opt_level='O1', gradient_accumulation_steps=1, head_learning_rate=0.0003, learning_rate=1e-05, local_rank=-1, logging_steps=500, max_seq_length=4096, max_span_length=30, max_total_seq_len=5000, model_name_or_path='/home/tanner/git/s2e-coref/model', model_type='longformer', n_gpu=0, no_cuda=False, nonfreeze_params=None, normalise_loss=True, num_train_epochs=129.0, output_dir='output', overwrite_output_dir=False, predict_file='/home/tanner/conll/conll-2012/english/test.english.jsonlines', predict_file_cache='/home/tanner/conll/conll-2012/english/test.english.4096.pkl', save_if_best=True, save_steps=3000, seed=42, tensorboard_dir='output/tb', tokenizer_name='allenai/longformer-large-4096', top_lambda=0.4, train_file='/home/tanner/conll/conll-2012/english/train.english.jsonlines', train_file_cache='/home/tanner/conll/conll-2012/english/train.english.4096.pkl', warmup_steps=5600, weight_decay=0.01)
08/30/2022 21:51:34 - INFO - data -   Reading dataset from /home/tanner/conll/conll-2012/english/test.english.jsonlines
08/30/2022 21:51:47 - INFO - data -   Finished preprocessing Coref dataset. 348 examples were extracted, 0 were filtered due to sequence length.
/home/tanner/git/s2e-coref/env-3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:1767: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
  warnings.warn(
08/30/2022 21:51:47 - INFO - eval -   ***** Running evaluation final_evaluation *****
08/30/2022 21:51:47 - INFO - eval -     Examples number: 348
08/30/2022 22:00:59 - INFO - eval -   ***** Eval results final_evaluation *****
08/30/2022 22:00:59 - INFO - eval -     loss = 0.421
08/30/2022 22:00:59 - INFO - eval -     post pruning mention precision = 0.248
08/30/2022 22:00:59 - INFO - eval -     post pruning mention recall = 0.961
08/30/2022 22:00:59 - INFO - eval -     post pruning mention f1 = 0.395
08/30/2022 22:00:59 - INFO - eval -     mention precision = 0.893
08/30/2022 22:00:59 - INFO - eval -     mention recall = 0.878
08/30/2022 22:00:59 - INFO - eval -     mention f1 = 0.886
08/30/2022 22:00:59 - INFO - eval -     precision = 0.812
08/30/2022 22:00:59 - INFO - eval -     recall = 0.795
08/30/2022 22:00:59 - INFO - eval -     f1 = 0.804

and an error:

Traceback (most recent call last):
  File "run_coref.py", line 155, in <module>
    main()
  File "run_coref.py", line 147, in main
    result = evaluator.evaluate(model, prefix="final_evaluation", official=True)
  File "/home/tanner/git/s2e-coref/eval.py", line 138, in evaluate
    conll_results = evaluate_conll(self.args.conll_path_for_eval, doc_to_prediction, doc_to_subtoken_map)
  File "/home/tanner/git/s2e-coref/conll.py", line 98, in evaluate_conll
    output_conll(gold_file, prediction_file, predictions, subtoken_maps)
  File "/home/tanner/git/s2e-coref/conll.py", line 47, in output_conll
    start_map, end_map, word_map = prediction_map[doc_key]  
KeyError: 'bc/cctv/00/cctv_0000_0'
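
One thing I notice in my own log above: conll_path_for_eval resolved to dev.english.v4_gold_conll even though predict_file points at test.english.jsonlines, so the gold CoNLL file and the predictions may simply cover different document sets, which would explain a KeyError on a doc key like 'bc/cctv/00/cctv_0000_0'. In case it is useful, here is a quick standalone diagnostic I put together (hypothetical, not part of the repo; the paths are the ones from my run, and the "<doc_id>_<part>" key format is my reading of how conll.py builds keys from the "#begin document" headers):

# quick_key_check.py -- hypothetical diagnostic, not part of s2e-coref
import json
import re

# Paths copied from my run above; adjust as needed.
predict_file = "/home/tanner/conll/conll-2012/english/test.english.jsonlines"
gold_conll = "/home/tanner/conll/conll-2012/english/dev.english.v4_gold_conll"

# Doc keys the evaluator has predictions for
# (assuming each jsonlines row carries a "doc_key" field, as in the minimized data).
with open(predict_file) as f:
    prediction_keys = {json.loads(line)["doc_key"] for line in f if line.strip()}

# Doc keys the gold CoNLL file will ask for; conll.py appears to build them as
# "<doc_id>_<part>" from the "#begin document (<doc_id>); part NNN" headers.
begin_re = re.compile(r"#begin document \((.*)\); part (\d+)")
with open(gold_conll) as f:
    gold_keys = {"{}_{}".format(m.group(1), int(m.group(2)))
                 for m in (begin_re.match(line) for line in f) if m}

print("in gold but not predicted:", sorted(gold_keys - prediction_keys)[:5], "...")
print("predicted but not in gold:", sorted(prediction_keys - gold_keys)[:5], "...")

If the first list comes back non-empty, the gold file and the predict file are out of sync (in my case, likely dev vs. test), and pointing conll_path_for_eval at the split that matches predict_file would be the first thing I would try.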

I am looking into this further and will post back my solution if I find one. In the meantime, I wanted to check whether @zyt888 (or @yuvalkirstain or @oriram) knows what the problem might be.