nyu-dl / dl4marco-bert

Eval only for MS MARCO #47

Open tangzhy opened 3 years ago

tangzhy commented 3 years ago

Hi, I use the colab code exactly from your demo.

Model config

The configuration is the same as in the demo and is shown below, except that OUTPUT_DIR points to the directory where I decompressed your BERT Base model, and the batch size is set to 8 because I'm running on a V100-16GB. I also set MAX_EVAL_EXAMPLES=100 because the full evaluation takes too long.

import os

BERT_PRETRAINED_DIR = '/search/odin/Data/pre-trained-models/bert/uncased_L-12_H-768_A-12/'
OUTPUT_DIR = '/search/odin/Data/marco-passage-ranking/models/BERT_Base_trained_on_MSMARCO/'
DATA_DIR = '/search/odin/Data/marco-passage-ranking/tfrecord/'

USE_TPU = False
DO_TRAIN = False  # Whether to run training.
DO_EVAL = True  # Whether to run evaluation.
TRAIN_BATCH_SIZE = 8
EVAL_BATCH_SIZE = 8
LEARNING_RATE = 1e-6
NUM_TRAIN_STEPS = 100
NUM_WARMUP_STEPS = 40000
MAX_SEQ_LENGTH = 512
SAVE_CHECKPOINTS_STEPS = 10
ITERATIONS_PER_LOOP = 100
NUM_TPU_CORES = 8
BERT_CONFIG_FILE = os.path.join(BERT_PRETRAINED_DIR, 'bert_config.json')
INIT_CHECKPOINT = os.path.join(BERT_PRETRAINED_DIR, 'bert_model.ckpt')
MSMARCO_OUTPUT = False  # Write the predictions to a MS-MARCO-formatted file.
MAX_EVAL_EXAMPLES = 100  # Maximum number of examples to be evaluated.
NUM_EVAL_DOCS = 1000  # Number of docs per query in the dev and eval files.
METRICS_MAP = ['MAP', 'RPrec', 'NDCG', 'MRR', 'MRR@10']

Logging

The log and the resulting metrics are listed below. My questions are:

  1. Is the model loaded properly from your fine-tuned checkpoint? BTW, no logging info like *INIT_FROM_CKPT* appears.
  2. Why is the trained model's performance so poor? MRR@10 = 0.01 for the first 100 eval examples. Is that expected, given that I only ran 100 eval examples (100 * 1000 entries are actually predicted)?
  3. If the model is loaded improperly, how should I load it instead? Is there any example code?

WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:101: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1214 14:42:01.364466 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:101: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W1214 14:42:01.366216 140473733826368 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fc2139b9cb0>) includes params argument, but params are not passed to Estimator.
W1214 14:42:01.737018 140473733826368 estimator.py:1994] Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fc2139b9cb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '/search/odin/Data/marco-passage-ranking/models/BERT_Base_trained_on_MSMARCO/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc2104ed710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I1214 14:42:01.739134 140473733826368 estimator.py:212] Using config: {'_model_dir': '/search/odin/Data/marco-passage-ranking/models/BERT_Base_trained_on_MSMARCO/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc2104ed710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I1214 14:42:01.740160 140473733826368 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W1214 14:42:01.740942 140473733826368 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:***** Running evaluation *****
I1214 14:42:01.741715 140473733826368 <ipython-input-3-e0f70c5ba30e>:280] ***** Running evaluation *****
INFO:tensorflow:  Batch size = 8
I1214 14:42:01.742430 140473733826368 <ipython-input-3-e0f70c5ba30e>:281]   Batch size = 8
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W1214 14:42:01.750143 140473733826368 deprecation.py:506] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenSequenceFeature is deprecated. Please use tf.io.FixedLenSequenceFeature instead.

W1214 14:42:01.832499 140473733826368 module_wrapper.py:139] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenSequenceFeature is deprecated. Please use tf.io.FixedLenSequenceFeature instead.

WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W1214 14:42:01.833697 140473733826368 module_wrapper.py:139] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

W1214 14:42:01.834634 140473733826368 module_wrapper.py:139] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:190: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W1214 14:42:02.021957 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:190: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:458: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

W1214 14:42:02.024990 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:458: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:743: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W1214 14:42:02.076209 140473733826368 deprecation.py:323] From /search/odin/Codes/marco-passage-ranking/modeling.py:743: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1214 14:42:02.077801 140473733826368 deprecation.py:323] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:314: The name tf.erf is deprecated. Please use tf.math.erf instead.

W1214 14:42:02.173272 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:314: The name tf.erf is deprecated. Please use tf.math.erf instead.

WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1214 14:42:04.507863 140473733826368 deprecation.py:323] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:Read 10000 examples in 136 secs. Metrics so far:
W1214 14:44:17.816093 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 10000 examples in 136 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:44:17.818014 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00090869 0.         0.07455417 0.00085925 0.        ]
W1214 14:44:17.818776 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00090869 0.         0.07455417 0.00085925 0.        ]
WARNING:tensorflow:Read 20000 examples in 262 secs. Metrics so far:
W1214 14:46:24.263339 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 20000 examples in 262 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:46:24.265303 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00100144 0.         0.08361872 0.00097672 0.        ]
W1214 14:46:24.266042 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00100144 0.         0.08361872 0.00097672 0.        ]
WARNING:tensorflow:Read 30000 examples in 388 secs. Metrics so far:
W1214 14:48:30.680611 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 30000 examples in 388 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:48:30.682710 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00108026 0.         0.0901455  0.00106377 0.        ]
W1214 14:48:30.683455 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00108026 0.         0.0901455  0.00106377 0.        ]
WARNING:tensorflow:Read 40000 examples in 515 secs. Metrics so far:
W1214 14:50:37.156615 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 40000 examples in 515 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:50:37.158547 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00102465 0.         0.08564832 0.00101229 0.        ]
W1214 14:50:37.159287 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00102465 0.         0.08564832 0.00101229 0.        ]
WARNING:tensorflow:Read 50000 examples in 641 secs. Metrics so far:
W1214 14:52:43.648328 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 50000 examples in 641 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:52:43.650337 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00102337 0.         0.08508468 0.00101348 0.        ]
W1214 14:52:43.651077 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00102337 0.         0.08508468 0.00101348 0.        ]
WARNING:tensorflow:Read 60000 examples in 768 secs. Metrics so far:
W1214 14:54:50.169556 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 60000 examples in 768 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:54:50.171495 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00112496 0.         0.08702416 0.00111672 0.        ]
W1214 14:54:50.172239 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00112496 0.         0.08702416 0.00111672 0.        ]
WARNING:tensorflow:Read 70000 examples in 894 secs. Metrics so far:
W1214 14:56:56.690977 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 70000 examples in 894 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:56:56.692907 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.00109861 0.         0.0863206  0.00109154 0.        ]
W1214 14:56:56.693676 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00109861 0.         0.0863206  0.00109154 0.        ]
WARNING:tensorflow:Read 80000 examples in 1021 secs. Metrics so far:
W1214 14:59:03.240334 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 80000 examples in 1021 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 14:59:03.242281 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.01356263 0.0125     0.09699456 0.01355645 0.0125    ]
W1214 14:59:03.243049 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.01356263 0.0125     0.09699456 0.01355645 0.0125    ]
WARNING:tensorflow:Read 90000 examples in 1148 secs. Metrics so far:
W1214 15:01:09.778834 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 90000 examples in 1148 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 15:01:09.780769 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.01214304 0.01111111 0.09414841 0.01213754 0.01111111]
W1214 15:01:09.781500 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.01214304 0.01111111 0.09414841 0.01213754 0.01111111]
WARNING:tensorflow:Read 100000 examples in 1274 secs. Metrics so far:
W1214 15:03:16.311386 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 100000 examples in 1274 secs. Metrics so far:
WARNING:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
W1214 15:03:16.313336 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP  RPrec  NDCG  MRR  MRR@10
WARNING:tensorflow:[0.01104599 0.01       0.09408713 0.01104105 0.01      ]
W1214 15:03:16.314079 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.01104599 0.01       0.09408713 0.01104105 0.01      ]
INFO:tensorflow:Eval dev:
I1214 15:03:16.407423 140473733826368 <ipython-input-3-e0f70c5ba30e>:368] Eval dev:
INFO:tensorflow:MAP  RPrec  NDCG  MRR  MRR@10
I1214 15:03:16.408445 140473733826368 <ipython-input-3-e0f70c5ba30e>:369] MAP  RPrec  NDCG  MRR  MRR@10
INFO:tensorflow:[0.01104599 0.01       0.09408713 0.01104105 0.01      ]
I1214 15:03:16.409163 140473733826368 <ipython-input-3-e0f70c5ba30e>:370] [0.01104599 0.01       0.09408713 0.01104105 0.01      ]
An exception has occurred, use %tb to see the full traceback.

SystemExit

/root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3426: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

rodrigonogueira4 commented 3 years ago

> Is the model loaded properly from your fine-tuned checkpoint? BTW, no logging info like INIT_FROM_CKPT appears.

It seems that the checkpoint is not being loaded.

> Why is the trained model's performance so poor? MRR@10 = 0.01 for the first 100 eval examples. Is that expected, given that I only ran 100 eval examples (100 * 1000 entries are actually predicted)?

MRR@10 should be at least 0.30.
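
To put the reported number in perspective, here is a minimal sketch of the standard MRR@10 definition. The function name and the toy data are made up for illustration, and the repository's own metric code may be organised differently. It shows that 0.01 over 100 queries is what you get when a single query has a relevant passage at rank 1 and every other query has none in the top 10. The 0.0125 logged after 80,000 examples (80 queries) has the same explanation: 1/80.

```python
def mrr_at_k(ranked_labels_per_query, k=10):
    """Mean reciprocal rank of the first relevant item within the top k.

    ranked_labels_per_query: one list per query, holding relevance labels
    (1 = relevant, 0 = not relevant) ordered by descending model score.
    """
    total = 0.0
    for labels in ranked_labels_per_query:
        for rank, label in enumerate(labels[:k], start=1):
            if label > 0:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(ranked_labels_per_query)


# Toy data (hypothetical): 100 queries with 1000 candidates each.
# Exactly one query has its relevant passage ranked first.
queries = [[1] + [0] * 999] + [[0] * 1000 for _ in range(99)]
print(mrr_at_k(queries))  # 0.01
```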

> If the model is loaded improperly, how should I load it instead? Is there any example code?

I would first try to use a "dummy" path in which no checkpoint exists. If the log is identical to what you have now, then the problem is in BERT_PRETRAINED_DIR.
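
Before rerunning, it may also help to confirm that the checkpoint files are actually where the config points. The sketch below uses only the standard library and assumes the usual TensorFlow checkpoint layout: a prefix such as `model.ckpt` is stored as `model.ckpt.index` plus `model.ckpt.data-*` shards, and a `checkpoint` state file in the model directory is what `tf.train.latest_checkpoint` reads. The helper names are hypothetical and not part of this repository.

```python
import glob
import os


def describe_checkpoint(prefix):
    """Report which files exist on disk for a TF checkpoint prefix."""
    index_file = prefix + ".index"
    data_files = glob.glob(glob.escape(prefix) + ".data-*")
    has_index = os.path.exists(index_file)
    return {
        "index": has_index,
        "data_shards": len(data_files),
        "usable": has_index and len(data_files) > 0,
    }


def describe_model_dir(model_dir):
    """Report the 'checkpoint' state file and the checkpoint prefixes in a directory."""
    state_file = os.path.join(model_dir, "checkpoint")
    pattern = os.path.join(glob.escape(model_dir), "*.index")
    prefixes = sorted(path[: -len(".index")] for path in glob.glob(pattern))
    return {"state_file": os.path.exists(state_file), "prefixes": prefixes}
```

With the values from the post, `describe_checkpoint(INIT_CHECKPOINT)` and `describe_model_dir(OUTPUT_DIR)` would show whether either location holds a readable checkpoint. If `OUTPUT_DIR` contains `.index` and `.data-*` files but no `checkpoint` state file, that would be one possible reason the fine-tuned weights are not picked up.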