ncbi / bert_gt


Inconsistency in prediction results for the same data across multiple iterations #6

Closed. nency2 closed this issue 1 year ago.

nency2 commented 1 year ago

I am running only the run_cdr_exp.sh script with do_predict=true, do_train=false, and do_eval=false. However, I notice that the prediction probabilities differ for the same data on every run. Could you please help me understand the reason behind this and suggest a way to resolve it? I would appreciate any help you can provide.

ptlai commented 1 year ago

Hi @nency2 ,

I apologize for the late reply. It appears that you are running development-set prediction without training, so the code initializes the model with random weights, and the development-set results can therefore vary from run to run.
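
The effect is easy to demonstrate outside BERT-GT. Here is a minimal NumPy sketch (illustrative only, not the repository's code) of why an untrained, randomly initialized output layer assigns different probabilities to the same input on every run:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

features = np.ones(8)  # stands in for a fixed encoding of one example

for run in range(3):
    rng = np.random.default_rng()            # fresh, unseeded state each run
    W = rng.normal(scale=0.02, size=(2, 8))  # random weights, as in an
    b = np.zeros(2)                          # untrained classification layer
    print(f"run {run}: probs = {softmax(W @ features + b)}")

Loading a trained checkpoint via --init_checkpoint replaces those random weights with learned ones, which is what makes predictions repeatable.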

Po-Ting

ptlai commented 1 year ago

I've posted my email response below, in case anyone needs it.


The checkpoint files can be downloaded from the link below:
https://drive.google.com/file/d/1sxPPchQlbM_YmwamZc9w2IzlTO2wWmne/view?usp=sharing
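
If you want to script the download, a sketch using the third-party gdown package could look like the following; the output filename is a placeholder, since the name of the shared file is not stated here:

import gdown  # third-party package: pip install gdown

# File ID copied from the share link above; rename the output to match
# whatever the link actually serves (this filename is a placeholder).
file_id = "1sxPPchQlbM_YmwamZc9w2IzlTO2wWmne"
gdown.download(f"https://drive.google.com/uc?id={file_id}",
               "bert_gt_cdr_checkpoint", quiet=False)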

In addition, I generated the model using the following hyperparameters:

  CUDA_VISIBLE_DEVICES=$cuda_visible_devices python src/run_bert_gt.py \
    --task_name=cdr \
    --do_train=true \
    --do_eval=true \
    --do_predict=true \
    --vocab_file=biobert_v1.1_pubmed/vocab.txt \
    --bert_config_file=biobert_v1.1_pubmed/bert_config.json \
    --init_checkpoint=biobert_v1.1_pubmed/model.ckpt-1000000 \
    --max_seq_length=512 \
    --max_num_entity_indices=20 \
    --train_batch_size=10 \
    --surrounding_words_distance=5 \
    --use_balanced_neg=true \
    --max_neg_scale=1 \
    --max_num_neighbors=5 \
    --learning_rate=5e-5 \
    --num_train_epochs=30.0 \
    --do_lower_case=false \
    --entity_num=2 \
    --data_dir=datasets/cdr/processed/ \
    --output_dir=out_cdr_model/

I used the following script to test my model:

CUDA_VISIBLE_DEVICES=$cuda_visible_devices python src/run_bert_gt.py \
  --do_train=false \
  --do_eval=false \
  --do_predict=true \
  --task_name="cdr" \
  --vocab_file=out_cdr_model/vocab.txt \
  --bert_config_file=out_cdr_model/bert_config.json \
  --init_checkpoint=out_cdr_model/model.ckpt-15312 \
  --num_train_epochs=30.0 \
  --seed=5511 \
  --max_seq_length=512 \
  --train_batch_size=8 \
  --learning_rate=5e-5 \
  --use_balanced_neg=false \
  --surrounding_words_distance=5 \
  --do_lower_case=false \
  --entity_num=2 \
  --max_num_neighbors=5 \
  --max_num_entity_indices=20 \
  --data_dir=datasets/cdr/processed/ \
  --output_dir=out_cdr_model2/

python src/run_eval.py --task=cdr --output_path=out_cdr_model2/test_results.tsv --answer_path=datasets/cdr/processed/test.tsv
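
Once predictions come from a fixed checkpoint (and a fixed --seed), two runs over the same data should agree. A small sanity check along those lines, assuming test_results.tsv contains one tab-separated row of class probabilities per example (adjust the parsing to the actual layout), might be:

import csv

def load_probs(path):
    # Assumed layout: one example per row, tab-separated class probabilities.
    with open(path) as f:
        return [[float(v) for v in row] for row in csv.reader(f, delimiter="\t")]

a = load_probs("out_cdr_model2/test_results.tsv")
b = load_probs("out_cdr_model2_rerun/test_results.tsv")  # hypothetical rerun
diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(f"max probability difference between runs: {diff}")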

For your reference, here is the script I used to tune the model:

#!/bin/bash

cuda_visible_devices=$1
entity_num=2

for x in {1..4}
do
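  # Train one model per max_neg_scale value (1 through 4).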
  CUDA_VISIBLE_DEVICES=$cuda_visible_devices python src/run_bert_gt.py \
    --task_name=cdr \
    --do_train=true \
    --do_eval=true \
    --do_predict=true \
    --vocab_file=biobert_v1.1_pubmed/vocab.txt \
    --bert_config_file=biobert_v1.1_pubmed/bert_config.json \
    --init_checkpoint=biobert_v1.1_pubmed/model.ckpt-1000000 \
    --max_seq_length=512 \
    --max_num_entity_indices=20 \
    --train_batch_size=10 \
    --surrounding_words_distance=5 \
    --use_balanced_neg=true \
    --max_neg_scale=$x \
    --max_num_neighbors=5 \
    --learning_rate=5e-5 \
    --num_train_epochs=30.0 \
    --do_lower_case=false \
    --entity_num=$entity_num \
    --data_dir=datasets/cdr/processed/ \
    --output_dir=out_model_cdr_bs$x/
done
for x in {1..4}
do
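  # Score this model's predictions on the dev and test sets.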
  python src/run_eval.py --task=cdr --output_path=out_model_cdr_bs$x/dev_results.tsv --answer_path=datasets/cdr/processed/dev.tsv
  python src/run_eval.py --task=cdr --output_path=out_model_cdr_bs$x/test_results.tsv --answer_path=datasets/cdr/processed/test.tsv
done