nouhadziri / THRED

The implementation of the paper "Augmenting Neural Response Generation with Context-Aware Topical Attention"
https://arxiv.org/abs/1811.01063
MIT License

NaN error when clipping gradients. #28

Open LTlitong opened 4 years ago

LTlitong commented 4 years ago

Hi, the vanilla Seq2Seq and HRED models both fail with a "NaN tensor error" at the first training step.

The error is raised by this line in hred_model.py: clipped_grads, grad_norm = tf.clip_by_global_norm(self.gradients, params.max_gradient_norm)

How can I solve this problem?

P.S. Here is the full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/HRED/thred/main.py", line 6, in <module>
    tf.app.run(main=thred_main)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/data/HRED/thred/main.py", line 45, in main
    model.train()
  File "/data/HRED/thred/models/hierarchical_base.py", line 132, in train
    step_result = loaded_train_model.train(train_sess)
  File "/data/HRED/thred/models/hred/hred_model.py", line 446, in train
    self.learning_rate])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
  [[node hred_graph/VerifyFinite/CheckNumerics (defined at /data/HRED/thred/models/hred/hred_model.py:131) = CheckNumericsT=DT_FLOAT, _class=["loc:@hred_graph/VerifyFinite/control_dependency"], message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
  [[{{node hred_graph/clip_by_global_norm/mul/_187}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3642_hred_graph/clip_by_global_norm/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
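
For context on the message "Found Inf or NaN global norm.": global-norm clipping first computes one norm over all gradients, so a single NaN or Inf in any gradient makes that norm non-finite and the clip step is only where the problem surfaces, not necessarily where it starts. The following is a minimal NumPy sketch of that arithmetic, not TensorFlow's implementation and not code from this repository. The function names and the gradient names ("encoder/kernel", "decoder/bias") are hypothetical and chosen only for illustration.

```python
import numpy as np


def global_norm(grads):
    """Return sqrt(sum of squared entries) taken over every gradient array."""
    return float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by max_norm / max(global_norm, max_norm).

    Raises if the global norm is not finite, mirroring the kind of check
    that produced the error message in the traceback.
    """
    norm = global_norm(grads)
    if not np.isfinite(norm):
        raise FloatingPointError("Found Inf or NaN global norm.")
    scale = max_norm / max(norm, max_norm)
    return [g * scale for g in grads], norm


def non_finite_gradients(named_grads):
    """Return the names of gradients that contain any NaN or Inf entry."""
    return [name for name, g in named_grads.items()
            if not np.all(np.isfinite(g))]


# Hypothetical gradients: one healthy, one containing a NaN.
named = {
    "encoder/kernel": np.array([3.0, 4.0]),
    "decoder/bias": np.array([0.5, np.nan]),
}
print(non_finite_gradients(named))  # lists the gradients to investigate first
```

Running a check like `non_finite_gradients` on the fetched gradient values before clipping narrows the search to the variables whose gradients went bad, which usually points back to the loss or an upstream op rather than to the clipping call itself.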