test_TFEngine.py:test_rec_subnet_eval_init_out_apply0 fails with TF 1.15 + GPU

Setup:

CUDA 10.0
tensorflow_gpu==1.15
GPU: GTX 960M

To reproduce:

nosetests tests/test_TFEngine.py:test_rec_subnet_eval_init_out_apply0

Possibly the relevant part of the error message:

EXCEPTION
Traceback (most recent call last):
  File "/home/returnn/TFEngine.py", line 605, in run
    line: raise Exception("Inf/nan score in step %i." % step)
    locals:
      Exception = <builtin> <class 'Exception'>
      step = <local> 1
Exception: Inf/nan score in step 1.

Full error message:

2019-12-06 16:03:07.792571: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-06 16:03:07.822247: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz
2019-12-06 16:03:07.824407: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x523be30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-06 16:03:07.824471: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2019-12-06 16:03:07.831818: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-06 16:03:07.924097: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:07.924465: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x54d5650 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2019-12-06 16:03:07.924481: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 960M, Compute Capability 5.0
2019-12-06 16:03:07.924641: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:07.924921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.0975
pciBusID: 0000:01:00.0
2019-12-06 16:03:07.926387: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-06 16:03:07.947731: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-06 16:03:07.959581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-12-06 16:03:07.963296: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-12-06 16:03:07.993485: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-12-06 16:03:08.015845: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-12-06 16:03:08.115568: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-06 16:03:08.115829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.116602: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.117192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-06 16:03:08.117999: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-06 16:03:08.120021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 16:03:08.120062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-12-06 16:03:08.120080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-12-06 16:03:08.121100: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.122049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.122748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1417 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-12-06 16:03:08.146324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 16:03:08.146368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
2019-12-06 16:03:08.147713: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.148401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.0975
pciBusID: 0000:01:00.0
2019-12-06 16:03:08.148479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-06 16:03:08.148517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-06 16:03:08.148555: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-12-06 16:03:08.148587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-12-06 16:03:08.148625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-12-06 16:03:08.148657: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-12-06 16:03:08.148693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-06 16:03:08.148825: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.149501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.150052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-06 16:03:08.150102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 16:03:08.150118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-12-06 16:03:08.150136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-12-06 16:03:08.150317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.151041: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.151438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 1417 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-12-06 16:03:08.180104: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.180392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.0975
pciBusID: 0000:01:00.0
2019-12-06 16:03:08.180426: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-06 16:03:08.180442: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-06 16:03:08.180463: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-12-06 16:03:08.180482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-12-06 16:03:08.180503: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-12-06 16:03:08.180523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-12-06 16:03:08.180544: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-06 16:03:08.180605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.180864: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.181106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-06 16:03:08.181141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 16:03:08.181151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-12-06 16:03:08.181162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-12-06 16:03:08.181253: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.181524: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-06 16:03:08.181778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1417 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-12-06 16:09:07.708405: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-06 16:09:08.121483: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
EXCEPTION
Traceback (most recent call last):
  File "/home/returnn/TFEngine.py", line 605, in run
    line: raise Exception("Inf/nan score in step %i." % step)
    locals:
      Exception = <builtin> <class 'Exception'>
      step = <local> 1
Exception: Inf/nan score in step 1.
E
======================================================================
ERROR: test_TFEngine.test_rec_subnet_eval_init_out_apply0
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/returnn/env/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    line: self.test(*self.arg)
    locals:
      self = <local> test_TFEngine.test_rec_subnet_eval_init_out_apply0
      self.test = <local> <function test_rec_subnet_eval_init_out_apply0 at 0x7fa6f87f4d08>
      self.arg = <local> ()
  File "/home/returnn/tests/test_TFEngine.py", line 1881, in test_rec_subnet_eval_init_out_apply0
    line: engine.train()
    locals:
      engine = <local> <TFEngine.Engine object at 0x7fa6f87b7a90>
      engine.train = <local> <bound method Engine.train of <TFEngine.Engine object at 0x7fa6f87b7a90>>
  File "/home/returnn/TFEngine.py", line 1220, in train
    line: self.train_epoch()
    locals:
      self = <local> <TFEngine.Engine object at 0x7fa6f87b7a90>
      self.train_epoch = <local> <bound method Engine.train_epoch of <TFEngine.Engine object at 0x7fa6f87b7a90>>
  File "/home/returnn/TFEngine.py", line 1335, in train_epoch
    line: sys.exit(1)
    locals:
      sys = <global> <module 'sys' (built-in)>
      sys.exit = <global> <built-in function exit>
SystemExit: 1
-------------------- >> begin captured stdout << ---------------------
Create engine.
setup_tf_thread_pools() not yet called (via func get_tf_list_local_devices), calling it now.
Setup TF inter and intra global thread pools, num_threads None, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}}.
Collecting TensorFlow device list...
Init for train.
layer root/'data' output: Data(name='data', shape=(None, 2), batch_shape_meta=[B,T|'time:var:extern_data:data',F|2])
layer root/'lstm0_fw' output: Data(name='lstm0_fw_output', shape=(None, 2), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])
OpCodeCompiler call: /usr/local/cuda-10.0/bin/nvcc -shared -O2 -std=c++11 -I /home/returnn/env/lib/python3.6/site-packages/tensorflow_core/include -I /home/returnn/env/lib/python3.6/site-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda-10.0/include -L /usr/local/cuda-10.0/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -arch compute_50 -D_GLIBCXX_USE_CXX11_ABI=0 -g /tmp/pzelasko/returnn_tf_cache/ops/NativeLstm2/d3517f1b2d/NativeLstm2.cc -o /tmp/pzelasko/returnn_tf_cache/ops/NativeLstm2/d3517f1b2d/NativeLstm2.so -L/home/returnn/env/lib/python3.6/site-packages/numpy/.libs -l:libopenblasp-r0-34a18dc3.3.7.so -L/home/returnn/env/lib/python3.6/site-packages/scipy/.libs -l:libopenblasp-r0-2ecf47d5.3.7.dev.so -L/home/returnn/env/lib/python3.6/site-packages/tensorflow_core -l:libtensorflow_framework.so.1
OpCodeCompiler call: /usr/local/cuda-10.0/bin/nvcc -shared -O2 -std=c++11 -I /home/returnn/env/lib/python3.6/site-packages/tensorflow_core/include -I /home/returnn/env/lib/python3.6/site-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda-10.0/include -L /usr/local/cuda-10.0/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -arch compute_50 -D_GLIBCXX_USE_CXX11_ABI=0 -g /tmp/pzelasko/returnn_tf_cache/ops/GradOfNativeLstm2/02d2f71824/GradOfNativeLstm2.cc -o /tmp/pzelasko/returnn_tf_cache/ops/GradOfNativeLstm2/02d2f71824/GradOfNativeLstm2.so -L/home/returnn/env/lib/python3.6/site-packages/numpy/.libs -l:libopenblasp-r0-34a18dc3.3.7.so -L/home/returnn/env/lib/python3.6/site-packages/scipy/.libs -l:libopenblasp-r0-2ecf47d5.3.7.dev.so -L/home/returnn/env/lib/python3.6/site-packages/tensorflow_core -l:libtensorflow_framework.so.1
layer root/'lstm0_bw' output: Data(name='lstm0_bw_output', shape=(None, 2), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])
layer root/'lstm0_pool' output: Data(name='lstm0_pool_output', shape=(None, 4), batch_shape_meta=[B,T|?,F|4])
layer root/'encoder' output: Data(name='encoder_output', shape=(None, 4), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',F|4])
layer root/'enc_ctx0' output: Data(name='enc_ctx0_output', shape=(None, 4), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',F|4])
layer root/'enc_ctx' output: Data(name='enc_ctx_output', shape=(None, 2, 2), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',2,F|2])
layer root/'inv_fertility' output: Data(name='inv_fertility_output', shape=(None, 1), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',F|1])
layer root/'enc_value0' output: Data(name='enc_value0_output', shape=(None, 4), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',F|4])
layer root/'enc_value' output: Data(name='enc_value_output', shape=(None, 2, 2), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',2,F|2])
layer root/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=3, batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B])
layer root/output:rec-subnet-input/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=3, batch_shape_meta=[B,T|'time:var:extern_data:classes'])
layer root/output:rec-subnet-input/'target_embed' output: Data(name='target_embed_output', shape=(None, 6), batch_shape_meta=[B,T|'time:var:extern_data:classes',F|6])
layer root/output:rec-subnet/'weight_feedback' output: Data(name='weight_feedback_output', shape=(None, 2), batch_shape_meta=[B,T|?,F|2])
layer root/output:rec-subnet/'prev_s_state' output: Data(name='prev_s_state_output', shape=(4,), time_dim_axis=None, batch_shape_meta=[B,F|4])
layer root/output:rec-subnet/'prev_s_feedback' output: Data(name='prev_s_feedback_output', shape=(2,), time_dim_axis=None, batch_shape_meta=[B,F|2])
layer root/output:rec-subnet/'att_query0' output: Data(name='att_query0_output', shape=(None, 2), batch_shape_meta=[B,T|?,F|2])
layer root/output:rec-subnet/'att_query1' output: Data(name='att_query1_output', shape=(None, 2), batch_shape_meta=[B,T|?,F|2])
layer root/output:rec-subnet/'att_query2' output: Data(name='att_query2_output', shape=(None, 4), batch_shape_meta=[B,T|?,F|4])
layer root/output:rec-subnet/'att_query' output: Data(name='att_query_output', shape=(None, 2, 2), batch_shape_meta=[B,T|?,2,F|2])
layer root/output:rec-subnet/'energy' output: Data(name='energy_output', shape=(None, 2, 1), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',2,F|1])
layer root/output:rec-subnet/'att_weights' output: Data(name='att_weights_output', shape=(2, 1, None), time_dim_axis=3, feature_dim_axis=2, batch_shape_meta=[B,2,F|1,T|'spatial:0:lstm0_pool'])
layer root/output:rec-subnet/'att0' output: Data(name='att0_output', shape=(2, 2), time_dim_axis=None, batch_shape_meta=[B,2,F|2])
layer root/output:rec-subnet/'att' output: Data(name='att_output', shape=(4,), time_dim_axis=None, batch_shape_meta=[B,F|4])
layer root/output:rec-subnet/'target_embed' output: Data(name='target_embed_output', shape=(6,), time_dim_axis=None, batch_shape_meta=[B,F|6])
layer root/output:rec-subnet/'s' output: Data(name='s_output', shape=(2,), time_dim_axis=None, batch_shape_meta=[B,F|2])
layer root/output:rec-subnet/'att_weights_avg' output: Data(name='att_weights_avg_output', shape=(1, None), time_dim_axis=2, feature_dim_axis=1, batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool'])
layer root/output:rec-subnet/'accum_att_weights' output: Data(name='accum_att_weights_output', shape=(None, 1), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',F|1])
layer root/output:rec-subnet-output/'s' output: Data(name='s_output', shape=(None, 2), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|2])
layer root/output:rec-subnet-output/'prev:s' output: Data(name='s_output', shape=(None, 2), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|2])
layer root/output:rec-subnet-output/'prev:target_embed' output: Data(name='target_embed_output', shape=(None, 6), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|6])
layer root/output:rec-subnet-output/'att' output: Data(name='att_output', shape=(None, 4), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|4])
layer root/output:rec-subnet-output/'readout_in' output: Data(name='readout_in_output', shape=(None, 2), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|2])
layer root/output:rec-subnet-output/'readout' output: Data(name='readout_output', shape=(None, 1), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|1])
layer root/output:rec-subnet-output/'output_prob' output: Data(name='output_prob_output', shape=(None, 3), batch_dim_axis=1, batch_shape_meta=[T|'time:var:extern_data:classes',B,F|3])
Train.

--------------------- >> end captured stdout << ----------------------
-------------------- >> begin captured logging << --------------------
v2: INFO: Device not set explicitly, and we found a GPU, which we will use.
v4: DEBUG: Learning-rate-control: no file specified, not saving history (no proper restart possible)
v2: INFO: Setup tf.Session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
v3: DEBUG: Rec layer 'output' (search False, train 'globals/train_flag:0') sub net:
v3: DEBUG:   Input layers moved out of loop: (#: 2)
v3: DEBUG:     output
v3: DEBUG:     target_embed
v3: DEBUG:   Output layers moved out of loop: (#: 3)
v3: DEBUG:     output_prob
v3: DEBUG:     readout
v3: DEBUG:     readout_in
v3: DEBUG:   Layers in loop: (#: 14)
v3: DEBUG:     s
v3: DEBUG:     att
v3: DEBUG:     att0
v3: DEBUG:     att_weights
v3: DEBUG:     energy
v3: DEBUG:     att_query
v3: DEBUG:     att_query2
v3: DEBUG:     att_query1
v3: DEBUG:     att_query0
v3: DEBUG:     weight_feedback
v3: DEBUG:     accum_att_weights
v3: DEBUG:     att_weights_avg
v3: DEBUG:     prev_s_feedback
v3: DEBUG:     prev_s_state
v3: DEBUG:   Unused layers: (#: 1)
v3: DEBUG:     end
v3: DEBUG: <DotLayer 'output/energy' out_type=Data(shape=(None, 2, 1), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',2,F|1])>, red1=-1, red2=-1, var1=None, var2=None:
v3: DEBUG:   a: Data(name='enc_ctx_output', shape=(None, 2, 2), batch_shape_meta=[B,T|'spatial:0:lstm0_pool',2,F|2]) Tensor("enc_ctx/Reshape:0", shape=(?, ?, 2, 2), dtype=float32)
v3: DEBUG:   a_rem_axes: [0, 1, 2] dims: [<tf.Tensor 'output/rec/energy/strided_slice:0' shape=() dtype=int32>, <tf.Tensor 'output/rec/energy/strided_slice_1:0' shape=() dtype=int32>, 2] a_var_axes: [] dims: [] 1 a_reduce_axes: [3] dims: [2] 2 transpose_a: False
v3: DEBUG:   b: Data(name='att_query_output', shape=(None, 2, 2), batch_shape_meta=[B,T|?,2,F|2]) Tensor("output/rec/att_query/Reshape:0", shape=(?, ?, 2, 2), dtype=float32)
v3: DEBUG:   b_rem_axes: [0, 1, 2] dims: [<tf.Tensor 'output/rec/energy/strided_slice_2:0' shape=() dtype=int32>, <tf.Tensor 'output/rec/energy/strided_slice_3:0' shape=() dtype=int32>, 2] b_var_axes: [] dims: [] 1 b_reduce_axes: [3] dims: [2] 2 transpose_b: False
v2: INFO: Network layer topology:
v2: INFO:   extern data: classes: Data(shape=(None,), dtype='int32', sparse=True, dim=3, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:classes']), data: Data(shape=(None, 2), batch_shape_meta=[B,T|'time:var:extern_data:data',F|2])
v2: INFO:   used data keys: ['classes', 'data']
v2: INFO:   layers:
v2: INFO:     layer source 'data' #: 2
v2: INFO:     layer split_dims 'enc_ctx' #: 2
v2: INFO:     layer linear 'enc_ctx0' #: 4
v2: INFO:     layer split_dims 'enc_value' #: 2
v2: INFO:     layer linear 'enc_value0' #: 4
v2: INFO:     layer copy 'encoder' #: 4
v2: INFO:     layer linear 'inv_fertility' #: 1
v2: INFO:     layer rec 'lstm0_bw' #: 2
v2: INFO:     layer rec 'lstm0_fw' #: 2
v2: INFO:     layer pool 'lstm0_pool' #: 4
v2: INFO:     layer rec 'output' #: 3
v2: INFO: net params #: 288
v2: INFO: net trainable params: [<tf.Variable 'enc_ctx0/W:0' shape=(4, 4) dtype=float32_ref>, <tf.Variable 'enc_value0/W:0' shape=(4, 4) dtype=float32_ref>, <tf.Variable 'inv_fertility/W:0' shape=(4, 1) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/W:0' shape=(2, 8) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/W_re:0' shape=(2, 8) dtype=float32_ref>, <tf.Variable 'lstm0_bw/rec/b:0' shape=(8,) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/W:0' shape=(2, 8) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/W_re:0' shape=(2, 8) dtype=float32_ref>, <tf.Variable 'lstm0_fw/rec/b:0' shape=(8,) dtype=float32_ref>, <tf.Variable 'output/rec/att_query2/W:0' shape=(2, 4) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/W:0' shape=(1, 3) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/b:0' shape=(3,) dtype=float32_ref>, <tf.Variable 'output/rec/prev_s_feedback/W:0' shape=(4, 2) dtype=float32_ref>, <tf.Variable 'output/rec/readout_in/W:0' shape=(12, 2) dtype=float32_ref>, <tf.Variable 'output/rec/readout_in/b:0' shape=(2,) dtype=float32_ref>, <tf.Variable 'output/rec/s/rec/lstm_cell/bias:0' shape=(8,) dtype=float32_ref>, <tf.Variable 'output/rec/s/rec/lstm_cell/kernel:0' shape=(12, 8) dtype=float32_ref>, <tf.Variable 'output/rec/target_embed/W:0' shape=(3, 6) dtype=float32_ref>, <tf.Variable 'output/rec/weight_feedback/W:0' shape=(1, 2) dtype=float32_ref>]
v3: DEBUG: start training at epoch 1 and step 0
v4: DEBUG: using batch size: 10, max seqs: -1
v4: DEBUG: learning rate control: ConstantLearningRate(default_learning_rate=0.01, min_learning_rate=0.0, default_learning_rates=[
], error_measure_key=None, relative_error_also_relative_to_learning_rate=False, min_num_epochs_per_new_learning_rate=0, relative_error_div_by_old=False, filename=None), epoch data: , error key: None
v4: DEBUG: pretrain: None
v4: DEBUG: start epoch 1 with learning rate 0.01 ...
v5: DEBUG: TF: log_dir: /tmp/tmp8oqnzp_9/dataset_id140353380097552-2019-12-06-15-03-07
v2: INFO: Create NAdam optimizer.
v3: DEBUG: Initialize optimizer (default) with slots ['m', 'v'].
v3: DEBUG: These additional variable were created by the optimizer: [<tf.Variable 'optimize/beta1_power:0' shape=() dtype=float32_ref>, <tf.Variable 'optimize/beta2_power:0' shape=() dtype=float32_ref>].
v5: DEBUG: train epoch 1, step 0, cost:output/output_prob 1.116946332409178, error:output/output_prob 0.7000000104308128, loss 11.169463, 1.711 sec/step, elapsed 0:00:02, exp. remaining 0:00:00, complete 75.00%
v5: DEBUG: train epoch 1, step 1, cost:output/output_prob 1.1040765926849332, error:output/output_prob 0.9000000134110451, loss 11.040766, 0.013 sec/step, elapsed 0:00:02, exp. remaining 0:00:00, complete 75.00%
v3: DEBUG: train epoch 1, finished after 2 steps, 0:00:02 elapsed (59.9% computing time)
v4: DEBUG: Save model under /tmp/tmp8oqnzp_9/model.001
v4: DEBUG: Learning-rate-control: error key 'train_score' from {'train_score': 1.1105114625470556, 'train_error': 0.800000011920929}
v1: INFO: epoch 1 score: 1.1105114625470556 error: 0.800000011920929 elapsed: 0:00:02
v5: DEBUG: TF: log_dir: /tmp/tmp8oqnzp_9/dataset_id140355110496464-1-2019-12-06-15-03-07
v5: DEBUG: epoch 1 'dev' eval, step 0, cost:output/output_prob 1.0868968171908975, error:output/output_prob 0.6000000089406967, loss 10.868968, 0.103 sec/step, elapsed 0:00:00, exp. remaining 0:00:00, complete 50.00%
v3: DEBUG: epoch 1 'dev' eval, finished after 1 steps, 0:00:00 elapsed (53.5% computing time)
v4: DEBUG: Learning-rate-control: error key 'dev_score' from {'dev_score': 1.0868968171908975, 'dev_error': 0.6000000089406967}
v1: INFO: dev: score 1.0868968171908975 error 0.6000000089406967
v4: DEBUG: start epoch 2 with learning rate 0.01 ...
v5: DEBUG: TF: log_dir: /tmp/tmp8oqnzp_9/dataset_id140353380097552-2019-12-06-15-03-07
v5: DEBUG: train epoch 2, step 0, cost:output/output_prob 1.0868968171908975, error:output/output_prob 0.6000000089406967, loss 10.868968, 0.018 sec/step, elapsed 0:00:00, exp. remaining 0:00:00, complete 75.00%
v5: DEBUG: train epoch 2, step 1, cost:output/output_prob nan, error:output/output_prob 0.7000000104308128, loss nan, 0.012 sec/step, elapsed 0:00:00, exp. remaining 0:00:00, complete 75.00%
v1: INFO: Model seems broken, got inf or nan score.
v1: INFO: Accumulated scores: NumbersDict({'cost:output/output_prob': nan, 'loss': nan, 'error:output/output_prob': 13.0})
v1: INFO: Exception Exception('Inf/nan score in step 1.',) in step 1.
v2: INFO: Step meta information:
v2: INFO: {'seq_idx': [2, 3], 'seq_tag': ['seq-2', 'seq-3']}
v2: INFO: Feed dict:
v2: INFO:   <tf.Tensor 'extern_data/placeholders/classes/classes:0' shape=(?, ?) dtype=int32>: shape (2, 5), dtype int32, min/max 0/2, Data(name='classes', shape=(None,), dtype='int32', sparse=True, dim=3, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:classes'])
v2: INFO:   <tf.Tensor 'extern_data/placeholders/classes/classes_dim0_size:0' shape=(?,) dtype=int32>: shape (2,), dtype int32, min/max 5/5, ([5 5])
v2: INFO:   <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 2) dtype=float32>: shape (2, 5, 2), dtype float32, min/max -0.5/0.4, mean/stddev -0.05/0.28722814, Data(name='data', shape=(None, 2), batch_shape_meta=[B,T|'time:var:extern_data:data',F|2])
v2: INFO:   <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (2,), dtype int32, min/max 5/5, ([5 5])
v2: INFO:   <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
v4: DEBUG: Save model under /tmp/tmp8oqnzp_9/model.002.crash_1
v1: INFO: Trainer not finalized, quitting.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 361.751s

FAILED (errors=1)

rwth-i6 / returnn

test_TFEngine.py:test_rec_subnet_eval_init_out_apply0 fails with TF 1.15 + GPU #253