ymcui / Chinese-ELECTRA

Pre-trained Chinese ELECTRA(中文ELECTRA预训练模型)
http://electra.hfl-rc.com
Apache License 2.0
1.4k stars 171 forks source link

error in loading checkpoints for pretraining #49

Closed w5688414 closed 4 years ago

w5688414 commented 4 years ago

error in loading checkpoints for pretraining, adam_m is missing?

2020-08-11 22:40:26.262591: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
     [[node save/RestoreV2 (defined at run_pretraining.py:363) ]]

Original stack trace for 'save/RestoreV2':
  File "run_pretraining.py", line 404, in <module>
    main()
  File "run_pretraining.py", line 400, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 363, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 638, in create_session
    self._scaffold.finalize()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 237, in finalize
    self._saver.build()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
     [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
     [[node save/RestoreV2 (defined at run_pretraining.py:363) ]]

Original stack trace for 'save/RestoreV2':
  File "run_pretraining.py", line 404, in <module>
    main()
  File "run_pretraining.py", line 400, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 363, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 638, in create_session
    self._scaffold.finalize()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 237, in finalize
    self._saver.build()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1296, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1614, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_pretraining.py", line 404, in <module>
    main()
  File "run_pretraining.py", line 400, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 363, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
    rendezvous.raise_errors()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/test/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 647, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/session_manager.py", line 290, in prepare_session
    config=config)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/session_manager.py", line 220, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
     [[node save/RestoreV2 (defined at run_pretraining.py:363) ]]

Original stack trace for 'save/RestoreV2':
  File "run_pretraining.py", line 404, in <module>
    main()
  File "run_pretraining.py", line 400, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 363, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 638, in create_session
    self._scaffold.finalize()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 237, in finalize
    self._saver.build()
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/test/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()
ymcui commented 4 years ago

We did not put adma_* params in the released model to save disk space. Please check https://github.com/google-research/electra/issues/45 for more details.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ymcui commented 4 years ago

Closing since not updates.