rwth-i6 / returnn

The RWTH extensible training framework for universal recurrent neural networks
http://returnn.readthedocs.io/

Manage the initialization of a layer with the `reuse_params` flag from a layer in a different loop #555

Closed aleksglushko closed 3 years ago

aleksglushko commented 3 years ago

There is no problem when using the `reuse_params` layer flag with only one softmax layer with a loss, like here:

network = {
    "output": {"class": "rec", "from": "data", "unit": {
        "input": {"class": "copy", "from": ["prev:output", "data:source"]},

        "FF_0": {"activation": "tanh", "class": "linear", "from": ["input"], "n_out": 10},
        "FF_1": {"activation": "tanh", "class": "linear", "from": ["input"], "n_out": 10,
                 "reuse_params": {"map": {"W": {"reuse_layer": "FF_0"},
                                          "b": {"reuse_layer": "FF_0"}}}},

        "output": {"class": "softmax", "loss": "ce", "from": ["FF_0", "FF_1"]},
    }},
}
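For intuition, the parameter sharing this config requests can be sketched outside RETURNN (a minimal numpy sketch; `ff`, `W`, `b` are made-up names for illustration, not RETURNN internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# One set of parameters used by two "layers" -- the analogue of
# FF_1 reusing the W and b of FF_0 via reuse_params.
W = rng.standard_normal((5, 10)).astype(np.float32)  # shared kernel
b = np.zeros(10, dtype=np.float32)                   # shared bias

def ff(x, W, b):
    # Linear layer with tanh activation, like the config above.
    return np.tanh(x @ W + b)

x = np.ones((2, 5), dtype=np.float32)
out_0 = ff(x, W, b)  # "FF_0"
out_1 = ff(x, W, b)  # "FF_1": same parameter arrays, so identical outputs
assert np.allclose(out_0, out_1)
```

Since both layers hold the same underlying parameters, any gradient update through one of them changes the other as well.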

but I can't manage the initialization of the layers in a case like this:

network = {
    "output": {"class": "rec", "from": "data", "unit": {
        "input": {"class": "copy", "from": ["prev:output", "data:source"]},

        "FF_0": {"activation": "tanh", "class": "linear", "from": ["input"], "n_out": 10},
        "FF_1": {"activation": "tanh", "class": "linear", "from": ["input"], "n_out": 10,
                 "reuse_params": {"map": {"W": {"reuse_layer": "FF_0"},
                                          "b": {"reuse_layer": "FF_0"}}}},

        "output": {"class": "softmax", "loss": "ce", "from": ["FF_0"]},
        "output1": {"class": "softmax", "loss": "ce", "from": ["FF_1"]}
    }},
}

I need the second output 'output1' to train a sub-decoder part. This is the config I'm using; the second decoder has the prefix "iLMT_": /work/asr3/zeineldeen/hiwis/glushko/setups-data/switchboard/2021-06-21--ilmt-att-sis/work/crnn/training/CRNNTrainingJob.ACreMrgOrckx/output/crnn.config

albertz commented 3 years ago

... i can't manage initialization of layers in a such case ...

Why? What is the problem? Please post the full error including stack trace and log (with `debug_print_layer_output_template` enabled).

aleksglushko commented 3 years ago

This is the log file with the backtrace and `debug_print_layer_output_template` enabled: backtrace_with_shapes.log

As I understand it, the problem is that the initializer somehow doesn't connect the reuse_layer with its parameters when we have more than one 'output' layer with a loss. But I might be wrong: as far as I can see, the ReuseParams object initialization seems to be correct, since it has both the reuse_layer and the map.
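A minimal sketch of what checking for actual sharing could look like (hypothetical; `params_shared` is not a RETURNN function, just an illustration that sharing means object identity, not merely equal values):

```python
import numpy as np

# One parameter array, reused by a second layer; a third layer has
# values that are equal but stored in a separate copy.
shared_W = np.zeros((4, 4), dtype=np.float32)

ff_0 = {"W": shared_W}                            # owner of the params
ff_1 = {"W": shared_W}                            # reuses them
ff_2 = {"W": np.zeros((4, 4), dtype=np.float32)}  # equal values, but a copy

def params_shared(a, b):
    # Shared iff both param dicts hold the very same array objects.
    return all(a[k] is b[k] for k in a)

assert params_shared(ff_0, ff_1)      # truly shared
assert not params_shared(ff_0, ff_2)  # only equal, not shared
```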

albertz commented 3 years ago

Where?

aleksglushko commented 3 years ago

Is it readable? Or should I just copy-paste the log output here?

albertz commented 3 years ago

It's readable, but next time, please copy the relevant parts directly here.

For reference, here:

RETURNN starting up, version 1.20210701.004738+git.cb87521, date/time 2021-07-01-23-25-20 (UTC+0200), pid 2876, cwd /work/asr3/zeineldeen/hiwis/glushko/setups-data/switchboard/2021-06-21--ilmt-att-sis/work/crnn/training/CRNNTrainingJob.ACreMrgOrckx/work, Python /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/bin/python3
RETURNN command line options: ['/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/work/crnn/training/CRNNTrainingJob.ACreMrgOrckx/output/crnn.config']
Hostname: cluster-cn-253
2021-07-01 23:25:22.931917: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
[2021-07-01 23:25:24,437] INFO: Run time: 0:00:05 CPU: 0.40% RSS: 258MB VMS: 2.18GB
TensorFlow: 2.3.0 (v2.3.0-2-gee598066c4) (<site-package> in /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
2021-07-01 23:25:28.807619: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-01 23:25:28.837037: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2099815000 Hz
2021-07-01 23:25:28.837557: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x461d890 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-01 23:25:28.837596: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-01 23:25:28.842414: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
[2021-07-01 23:25:29,456] INFO: Run time: 0:00:10 CPU: 0.40% RSS: 347MB VMS: 2.99GB
2021-07-01 23:25:29.531428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-01 23:25:29.531479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2021-07-01 23:25:29.739672: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x46bbb40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-01 23:25:29.739779: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-07-01 23:25:29.745034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-07-01 23:25:29.745150: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-01 23:25:29.749937: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-01 23:25:29.753337: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-01 23:25:29.754998: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-01 23:25:29.761654: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-01 23:25:29.788539: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-01 23:25:29.819354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-07-01 23:25:29.832494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-01 23:25:29.832626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
[2021-07-01 23:25:34,474] INFO: Run time: 0:00:15 CPU: 0.20% RSS: 705MB VMS: 17.18GB
2021-07-01 23:25:37.085265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-01 23:25:37.085323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-07-01 23:25:37.085332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-07-01 23:25:37.088809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 10266 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Local devices available to TensorFlow:
  1/4: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 17821591032498612287
  2/4: name: "/device:XLA_CPU:0"
       device_type: "XLA_CPU"
       memory_limit: 17179869184
       locality {
       }
       incarnation: 7550349090206536510
       physical_device_desc: "device: XLA_CPU device"
  3/4: name: "/device:XLA_GPU:0"
       device_type: "XLA_GPU"
       memory_limit: 17179869184
       locality {
       }
       incarnation: 823384536220778428
       physical_device_desc: "device: XLA_GPU device"
  4/4: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10764901440
       locality {
         bus_id: 1
         links {
         }
       }
       incarnation: 17495936249025657957
       physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1"
Using gpu device 0: GeForce GTX 1080 Ti
<ExternSprintDataset 'dev' epoch=None>: epoch None exec ['/u/zhou/rasr-dev/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--config=/u/zeineldeen/setups/switchboard/2020-07-16--phon-att-sis/dependencies/rasr_configs/training.config', '--*.corpus.file=`cf /work/asr3/irie/data/switchboard/corpora/train.corpus.gz`', '--*.corpus.segments.file=`cf /u/zeineldeen/setups/switchboard/2020-01-21--att-phon/dependencies/seg_cv_head3000`', '--*.corpus.segment-order-shuffle=true', '--*.segment-order-sort-by-time-length=true', '--*.segment-order-sort-by-time-length-chunk-size=-1', '--*.feature-cache-path=`cf /u/tuske/work/ASR/switchboard/feature.extraction/gt40_40/data/gt.train.bundle`', '--*.log-channel.file=cv.sprint.log', '--*.window-size=1', '--*.seed=0', '--*.python-segment-order=true', '--*.python-segment-order-pymod-path=/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn', '--*.python-segment-order-pymod-name=returnn.sprint.extern_interface', '--*.use-data-source=false', '--*.trainer=python-trainer', '--*.pymod-path=/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn', '--*.pymod-name=returnn.sprint.extern_interface', '--*.pymod-config=action:ExternSprintDataset,c2p_fd:25,p2c_fd:26']
...
<ExternSprintDataset 'dev' epoch=None>: interrupt child proc 3526
<ExternSprintDataset 'dev' epoch=1>: epoch 1 exec ['/u/zhou/rasr-dev/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--config=/u/zeineldeen/setups/switchboard/2020-07-16--phon-att-sis/dependencies/rasr_configs/training.config', '--*.corpus.file=`cf /work/asr3/irie/data/switchboard/corpora/train.corpus.gz`', '--*.corpus.segments.file=`cf /u/zeineldeen/setups/switchboard/2020-01-21--att-phon/dependencies/seg_cv_head3000`', '--*.corpus.segment-order-shuffle=true', '--*.segment-order-sort-by-time-length=true', '--*.segment-order-sort-by-time-length-chunk-size=-1', '--*.feature-cache-path=`cf /u/tuske/work/ASR/switchboard/feature.extraction/gt40_40/data/gt.train.bundle`', '--*.log-channel.file=cv.sprint.log', '--*.window-size=1', '--*.seed=0', '--*.python-segment-order=true', '--*.python-segment-order-pymod-path=/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn', '--*.python-segment-order-pymod-name=returnn.sprint.extern_interface', '--*.use-data-source=false', '--*.trainer=python-trainer', '--*.pymod-path=/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn', '--*.pymod-name=returnn.sprint.extern_interface', '--*.pymod-config=action:ExternSprintDataset,c2p_fd:25,p2c_fd:26']

<ExternSprintDataset 'train' epoch=None>: epoch None exec ['/u/zhou/rasr-dev/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--config=/u/zeineldeen/setups/switchboard/2020-07-16--phon-att-sis/dependencies/rasr_configs/training.config', '--*.corpus.file=`cf /work/asr3/irie/data/switchboard/corpora/train.corpus.gz`', '--*.corpus.segments.file=`cf /u/tuske/work/ASR/switchboard/corpus/train.segments`', '--*.corpus.segment-order-shuffle=true', '--*.segment-order-sort-by-time-length=true', '--*.segment-order-sort-by-time-length-chunk-size=6000', '--*.feature-cache-path=`cf /u/tuske/work/ASR/switchboard/feature.extraction/gt40_40/data/gt.train.bundle`', '--*.log-channel.file=train.sprint.log', '--*.window-size=1', '--*.seed=0', '--*.corpus.partition=6', '--*.corpus.select-partition=0', '--*.python-segment-order=true', '--*.python-segment-order-pymod-path=/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn', '--*.python-segment-order-pymod-name=returnn.sprint.extern_interface', '--*.use-data-source=false', '--*.trainer=python-trainer', '--*.pymod-path=/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn', '--*.pymod-name=returnn.sprint.extern_interface', '--*.pymod-config=action:ExternSprintDataset,c2p_fd:26,p2c_fd:28']
...

Learning-rate-control: file learning_rates does not exist yet
Update config key 'batch_size' for epoch 1: 10000 -> 15000
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2021-07-01 23:27:06.373029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-07-01 23:27:06.373115: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-01 23:27:06.373156: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-01 23:27:06.373175: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-01 23:27:06.373192: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-01 23:27:06.373209: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-01 23:27:06.373225: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-01 23:27:06.373242: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-07-01 23:27:06.377129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-01 23:27:06.377177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-01 23:27:06.377186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-07-01 23:27:06.377192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-07-01 23:27:06.381401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10266 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
WARNING:tensorflow:From /u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py:435: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/util/basic.py:1285: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer root/'data' output: Data(name='data', batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'source' output: Data(name='source_output', batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])
layer root/'source0' output: Data(name='source0_output', batch_shape_meta=[B,T|'time:var:extern_data:data',40,F|1])
layer root/'conv0' output: Data(name='conv0_output', batch_shape_meta=[B,T|'time:var:extern_data:data',40,F|32])
layer root/'conv0p' output: Data(name='conv0p_output', batch_shape_meta=[B,T|?,20,F|32])
layer root/'conv1' output: Data(name='conv1_output', batch_shape_meta=[B,T|'time:var:extern_data:data',20,F|32])
layer root/'conv1p' output: Data(name='conv1p_output', batch_shape_meta=[B,T|?,10,F|32])
layer root/'conv_merged' output: Data(name='conv_merged_output', batch_shape_meta=[B,T|'time:var:extern_data:data',F|320])
layer root/'lstm0_fw' output: Data(name='lstm0_fw_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|512])
<ExternSprintDataset 'devtrain' epoch=1> add_new_data: seq=200, len=128. Cache filled, waiting to get loaded...
OpCodeCompiler call: /usr/local/cuda-10.1/bin/nvcc -shared -O2 -std=c++11 -I /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/include -I /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/include/external/nsync/public -ccbin /usr/bin/gcc-5 -I /usr/local/cuda-10.1/include -L /usr/local/cuda-10.1/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -arch compute_61 -D_GLIBCXX_USE_CXX11_ABI=1 -DNDEBUG=1 -g /var/tmp/3426359.1.4-GPU-1080/glushko/returnn_tf_cache/ops/NativeLstm2/dbd9d53df5/NativeLstm2.cc -o /var/tmp/3426359.1.4-GPU-1080/glushko/returnn_tf_cache/ops/NativeLstm2/dbd9d53df5/NativeLstm2.so -L/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/numpy.libs -l:libopenblasp-r0-34a18dc3.3.7.so -L/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow -l:libtensorflow_framework.so.2
[2021-07-01 23:27:25,116] INFO: Run time: 0:02:05 CPU: 0.20% RSS: 3.12GB VMS: 21.66GB
[2021-07-01 23:27:30,137] INFO: Run time: 0:02:10 CPU: 0.60% RSS: 2.55GB VMS: 21.07GB
[2021-07-01 23:27:35,160] INFO: Run time: 0:02:15 CPU: 0.40% RSS: 3.11GB VMS: 21.63GB
OpCodeCompiler call: /usr/local/cuda-10.1/bin/nvcc -shared -O2 -std=c++11 -I /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/include -I /work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/include/external/nsync/public -ccbin /usr/bin/gcc-5 -I /usr/local/cuda-10.1/include -L /usr/local/cuda-10.1/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -arch compute_61 -D_GLIBCXX_USE_CXX11_ABI=1 -DNDEBUG=1 -g /var/tmp/3426359.1.4-GPU-1080/glushko/returnn_tf_cache/ops/GradOfNativeLstm2/e1228a5e61/GradOfNativeLstm2.cc -o /var/tmp/3426359.1.4-GPU-1080/glushko/returnn_tf_cache/ops/GradOfNativeLstm2/e1228a5e61/GradOfNativeLstm2.so -L/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/numpy.libs -l:libopenblasp-r0-34a18dc3.3.7.so -L/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow -l:libtensorflow_framework.so.2
[2021-07-01 23:27:55,231] INFO: Run time: 0:02:35 CPU: 0.20% RSS: 2.46GB VMS: 21.04GB
[2021-07-01 23:28:00,251] INFO: Run time: 0:02:40 CPU: 0.40% RSS: 3.06GB VMS: 21.58GB
[2021-07-01 23:28:10,287] INFO: Run time: 0:02:50 CPU: 0.40% RSS: 2.47GB VMS: 21.01GB
layer root/'lstm0_bw' output: Data(name='lstm0_bw_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|512])
layer root/'lstm0_pool' output: Data(name='lstm0_pool_output', batch_shape_meta=[B,T|?,F|1024])
layer root/'lstm1_fw' output: Data(name='lstm1_fw_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|512])
layer root/'lstm1_bw' output: Data(name='lstm1_bw_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|512])
layer root/'encoder' output: Data(name='encoder_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'data:bpe' output: Data(name='bpe', dtype='int32', sparse=True, dim=534, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:bpe'])
layer root/'ctc' output: Data(name='ctc_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|535])
layer root/'enc_value' output: Data(name='enc_value_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,1,F|1024])
layer root/'enc_ctx' output: Data(name='enc_ctx_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/'inv_fertility' output: Data(name='inv_fertility_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/'output' output: Data(name='output_output', dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])
Rec layer 'output' (search False, train 'globals/train_flag:0') sub net:
  Input layers moved out of loop: (#: 4)
    output
    target_embed
    prev_1_target_embed
    prev_2_target_embed
  Output layers moved out of loop: (#: 8)
    output_prob
    readout
    readout_in
    iLMT_0_output_prob
    iLMT_0_readout
    iLMT_0_readout_in
    iLMT_0_FF_0
    zero_att
  Layers in loop: (#: 9)
    FF_0
    att
    att0
    att_weights
    energy
    energy_tanh
    energy_in
    weight_feedback
    accum_att_weights
  Unused layers: (#: 1)
    end
WARNING:tensorflow:From /u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/util/basic.py:5317: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
layer root/output(rec-subnet-input)/'data:bpe' output: Data(name='bpe', dtype='int32', sparse=True, dim=534, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:bpe'])
layer root/output(rec-subnet-input)/'output' output: Data(name='output_output', dtype='int32', sparse=True, dim=534, batch_shape_meta=[B,T|'time:var:extern_data:bpe'])
layer root/output(rec-subnet-input)/'target_embed' output: Data(name='target_embed_output', batch_shape_meta=[B,T|'time:var:extern_data:bpe',F|621])
layer root/output(rec-subnet-input)/'prev:target_embed' output: Data(name='target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet-input)/'prev_1_target_embed' output: Data(name='prev_1_target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet-input)/'prev:prev_1_target_embed' output: Data(name='prev_1_target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet-input)/'prev_2_target_embed' output: Data(name='prev_2_target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet)/'prev:target_embed' output: Data(name='target_embed_output', batch_shape_meta=[B,F|621])
layer root/output(rec-subnet)/'prev:prev_1_target_embed' output: Data(name='prev_1_target_embed_output', batch_shape_meta=[B,F|621])
layer root/output(rec-subnet)/'prev:prev_2_target_embed' output: Data(name='prev_2_target_embed_output', batch_shape_meta=[B,F|621])
layer root/output(rec-subnet)/'FF_0' output: Data(name='FF_0_output', batch_shape_meta=[B,F|1024])
layer root/output(rec-subnet)/'weight_feedback' output: Data(name='weight_feedback_output', batch_shape_meta=[T|?,B,F|1024])
layer root/output(rec-subnet)/'energy_in' output: Data(name='energy_in_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/output(rec-subnet)/'energy_tanh' output: Data(name='energy_tanh_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024])
layer root/output(rec-subnet)/'energy' output: Data(name='energy_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/output(rec-subnet)/'att_weights' output: Data(name='att_weights_output', batch_shape_meta=[B,F|1,T|'spatial:0:lstm0_pool'])
layer root/output(rec-subnet)/'att0' output: Data(name='att0_output', batch_shape_meta=[B,1,F|1024])
layer root/output(rec-subnet)/'att' output: Data(name='att_output', batch_shape_meta=[B,F|1024])
layer root/output(rec-subnet)/'accum_att_weights' output: Data(name='accum_att_weights_output', batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1])
layer root/output(rec-subnet-output)/'FF_0' output: Data(name='FF_0_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])
layer root/output(rec-subnet-output)/'prev:target_embed' output: Data(name='target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet-output)/'att' output: Data(name='att_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])
layer root/output(rec-subnet-output)/'readout_in' output: Data(name='readout_in_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1000])
layer root/output(rec-subnet-output)/'readout' output: Data(name='readout_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|500])
layer root/output(rec-subnet-output)/'data:bpe' output: Data(name='bpe', dtype='int32', sparse=True, dim=534, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:bpe'])
layer root/output(rec-subnet-output)/'output_prob' output: Data(name='output_prob_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|534])
layer root/output(rec-subnet-output)/'prev:prev_1_target_embed' output: Data(name='prev_1_target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet-output)/'prev:prev_2_target_embed' output: Data(name='prev_2_target_embed_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])
layer root/output(rec-subnet-output)/'zero_att' output: Data(name='zero_att_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])
layer root/output(rec-subnet-output)/'iLMT_0_FF_0' output: Data(name='iLMT_0_FF_0_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])
Exception creating layer root/output(rec-subnet-output)/'iLMT_0_FF_0' of class LinearLayer with opts:
{'L2': 0.0005,
 '_name': 'iLMT_0_FF_0',
 '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'activation': 'tanh',
 'n_out': 1024,
 'name': 'iLMT_0_FF_0',
 'network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='iLMT_0_FF_0_output', batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024]),
 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>}>,
 'sources': [<InternalLayer output/'prev:target_embed' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])>,
             <InternalLayer output/'prev:prev_1_target_embed' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])>,
             <InternalLayer output/'prev:prev_2_target_embed' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|621])>,
             <EvalLayer output/'zero_att' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>],
 'with_bias': True}
Exception occurred during output-net construction of layer 'iLMT_0_FF_0'.
We had previous exceptions at template construction, which got resolved, but maybe sth is wrong.
Template network (check out types / shapes):
{'FF_0': <_TemplateLayer(LinearLayer)(:template:linear) output/'FF_0' out_type=Data(batch_shape_meta=[B?,F|1024]) (construction stack 'energy_in')>,
 'accum_att_weights': <_TemplateLayer(EvalLayer)(:template:eval) output/'accum_att_weights' out_type=Data(batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1]) (construction stack 'weight_feedback')>,
 'att': <_TemplateLayer(MergeDimsLayer)(:template:merge_dims) output/'att' out_type=Data(batch_shape_meta=[B?,F|1024]) (construction stack 'zero_att')>,
 'att0': <_TemplateLayer(GenericAttentionLayer)(:template:generic_attention) output/'att0' out_type=Data(batch_shape_meta=[B?,1,F|1024]) (construction stack 'att')>,
 'att_weights': <_TemplateLayer(SoftmaxOverSpatialLayer)(:template:softmax_over_spatial) output/'att_weights' out_type=Data(batch_shape_meta=[B?,F|1,T|'spatial:0:lstm0_pool']) (construction stack 'att0')>,
 'data:bpe': <_TemplateLayer(SourceLayer)(:template:source) output/'data:bpe' out_type=Data(dtype='int32', sparse=True, dim=534, available_for_inference=False, batch_shape_meta=[B]) (construction stack 'output')>,
 'end': <_TemplateLayer(CompareLayer)(:template:compare) output/'end' out_type=Data(dtype='bool', sparse=True, dim=2, batch_shape_meta=[B]) (construction stack None)>,
 'energy': <_TemplateLayer(LinearLayer)(:template:linear) output/'energy' out_type=Data(batch_shape_meta=[T|'spatial:0:lstm0_pool',B?,F|1]) (construction stack 'att_weights')>,
 'energy_in': <_TemplateLayer(CombineLayer)(:template:combine) output/'energy_in' out_type=Data(batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024]) (construction stack 'energy_tanh')>,
 'energy_tanh': <_TemplateLayer(ActivationLayer)(:template:activation) output/'energy_tanh' out_type=Data(batch_shape_meta=[T|'spatial:0:lstm0_pool',B,F|1024]) (construction stack 'energy')>,
 'iLMT_0_FF_0': <_TemplateLayer(LinearLayer)(:template:linear) output/'iLMT_0_FF_0' out_type=Data(batch_shape_meta=[B?,F|1024]) (construction stack 'iLMT_0_readout_in')>,
 'iLMT_0_output_prob': <_TemplateLayer(SoftmaxLayer)(:template:softmax) output/'iLMT_0_output_prob' out_type=Data(batch_shape_meta=[B?,F|534]) (construction stack None)>,
 'iLMT_0_readout': <_TemplateLayer(ReduceOutLayer)(:template:reduce_out) output/'iLMT_0_readout' out_type=Data(batch_shape_meta=[B?,F|500]) (construction stack 'iLMT_0_output_prob')>,
 'iLMT_0_readout_in': <_TemplateLayer(LinearLayer)(:template:linear) output/'iLMT_0_readout_in' out_type=Data(batch_shape_meta=[B?,F|1000]) (construction stack 'iLMT_0_readout')>,
 'output': <_TemplateLayer(ChoiceLayer)(:template:choice) output/'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[B]) (construction stack None)>,
 'output_prob': <_TemplateLayer(SoftmaxLayer)(:template:softmax) output/'output_prob' out_type=Data(batch_shape_meta=[B?,F|534]) (construction stack None)>,
 'prev_1_target_embed': <_TemplateLayer(CopyLayer)(:template:copy) output/'prev_1_target_embed' out_type=Data(batch_shape_meta=[B?,F|621]) (construction stack 'iLMT_0_FF_0')>,
 'prev_2_target_embed': <_TemplateLayer(CopyLayer)(:template:copy) output/'prev_2_target_embed' out_type=Data(batch_shape_meta=[B?,F|621]) (construction stack 'iLMT_0_FF_0')>,
 'readout': <_TemplateLayer(ReduceOutLayer)(:template:reduce_out) output/'readout' out_type=Data(batch_shape_meta=[B?,F|500]) (construction stack 'output_prob')>,
 'readout_in': <_TemplateLayer(LinearLayer)(:template:linear) output/'readout_in' out_type=Data(batch_shape_meta=[B?,F|1000]) (construction stack 'readout')>,
 'target_embed': <_TemplateLayer(LinearLayer)(:template:linear) output/'target_embed' out_type=Data(batch_shape_meta=[B?,F|621]) (construction stack 'iLMT_0_FF_0')>,
 'weight_feedback': <_TemplateLayer(LinearLayer)(:template:linear) output/'weight_feedback' out_type=Data(batch_shape_meta=[T|'spatial:0:lstm0_pool',B?,F|1024]) (construction stack 'energy_in')>,
 'zero_att': <_TemplateLayer(EvalLayer)(:template:eval) output/'zero_att' out_type=Data(batch_shape_meta=[B?,F|1024]) (construction stack 'iLMT_0_FF_0')>}
Collected (unique) exceptions during template construction:
(Note that many of these can be ignored, or are expected.)
EXCEPTION while constructing layer 'accum_att_weights'
NetworkConstructionDependencyLoopException: <TFNetwork 'root/output(rec-subnet)' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>: Error: There is a dependency loop on layer 'accum_att_weights'.
Construction stack (most recent first):
  accum_att_weights
  weight_feedback
  energy_in
  energy_tanh
  energy
  att_weights
  att0
  att
  zero_att
  iLMT_0_FF_0
  iLMT_0_readout_in
  iLMT_0_readout
  iLMT_0_output_prob

Exception occurred during output-net construction of layer 'iLMT_0_readout_in'.
Exception occurred during output-net construction of layer 'iLMT_0_readout'.
Exception occurred during output-net construction of layer 'iLMT_0_output_prob'.
Exception creating layer root/'output' of class RecLayer with opts:
{'_name': 'output',
 '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 '_target_layers': {'bpe': <SourceLayer 'data:bpe' out_type=Data(dtype='int32', sparse=True, dim=534, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:bpe'])>},
 '_time_dim_tag': DimensionTag(kind='spatial', description='time:var:extern_data:bpe', id=23324237015024),
 'max_seq_len': <tf.Tensor 'max_seq_len_encoder:0' shape=() dtype=int32>,
 'n_out': <class 'returnn.util.basic.NotSpecified'>,
 'name': 'output',
 'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='output_output', dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B]),
 'sources': [],
 'target': 'bpe',
 'unit': <_SubnetworkRecCell 'root/output(rec-subnet)'>}
Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 23325610391296)>, proc 2876.

...

EXCEPTION
Traceback (most recent call last):
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/rnn.py", line 11, in <module>
    line: main()
    locals:
      main = <local> <function main at 0x1536dcbee310>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/__main__.py", line 659, in main
    line: execute_main_task()
    locals:
      execute_main_task = <global> <function execute_main_task at 0x1536dcbee1f0>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/__main__.py", line 457, in execute_main_task
    line: engine.init_train_from_config(config, train_data, dev_data, eval_data)
    locals:
      engine = <global> <returnn.tf.engine.Engine object at 0x153698aa2ac0>
      engine.init_train_from_config = <global> <bound method Engine.init_train_from_config of <returnn.tf.engine.Engine object at 0x153698aa2ac0>>
      config = <global> <returnn.config.Config object at 0x1536ea9d3280>
      train_data = <global> <ExternSprintDataset 'train' epoch=1>
      dev_data = <global> <ExternSprintDataset 'dev' epoch=1>
      eval_data = <global> None
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1031, in Engine.init_train_from_config
    line: self.init_network_from_config(config)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x153698aa2ac0>
      self.init_network_from_config = <local> <bound method Engine.init_network_from_config of <returnn.tf.engine.Engine object at 0x153698aa2ac0>>
      config = <local> <returnn.config.Config object at 0x1536ea9d3280>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1096, in Engine.init_network_from_config
    line: self._init_network(net_desc=net_dict, epoch=self.epoch)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x153698aa2ac0>
      self._init_network = <local> <bound method Engine._init_network of <returnn.tf.engine.Engine object at 0x153698aa2ac0>>
      net_desc = <not found>
      net_dict = <local> {'conv0': {'L2': 0.0005, 'activation': None, 'class': 'conv', 'filter_size': (3, 3), 'from': 'source0', 'n_out': 32, 'padding': 'same', 'with_bias': True}, 'conv0p': {'class': 'pool', 'from': 'conv0', 'mode': 'max', 'padding': 'same', 'pool_size': (1, 2), 'trainable': False}, 'conv1': {'L2': 0.00..., len = 20
      epoch = <local> None
      self.epoch = <local> 1
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1275, in Engine._init_network
    line: self.network, self.updater = self.create_network(
            config=self.config,
            extern_data=extern_data,
            rnd_seed=net_random_seed,
            train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
            initial_learning_rate=getattr(self, "initial_learning_rate", None),
            net_dict=net_desc)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x153698aa2ac0>
      self.network = <local> None
      self.updater = <local> None
      self.create_network = <local> <bound method Engine.create_network of <class 'returnn.tf.engine.Engine'>>
      config = <not found>
      self.config = <local> <returnn.config.Config object at 0x1536ea9d3280>
      extern_data = <local> <ExternData data={'bpe': Data(name='bpe', dtype='int32', sparse=True, dim=534, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:bpe']), 'data': Data(name='data', batch_shape_meta=[B,T|'time:var:extern_data:data',F|40])}>
      rnd_seed = <not found>
      net_random_seed = <local> 1
      train_flag = <local> <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>
      eval_flag = <not found>
      self.use_eval_flag = <local> True
      search_flag = <not found>
      self.use_search_flag = <local> False
      initial_learning_rate = <not found>
      getattr = <builtin> <built-in function getattr>
      net_dict = <not found>
      net_desc = <local> {'conv0': {'L2': 0.0005, 'activation': None, 'class': 'conv', 'filter_size': (3, 3), 'from': 'source0', 'n_out': 32, 'padding': 'same', 'with_bias': True}, 'conv0p': {'class': 'pool', 'from': 'conv0', 'mode': 'max', 'padding': 'same', 'pool_size': (1, 2), 'trainable': False}, 'conv1': {'L2': 0.00..., len = 20
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1316, in Engine.create_network
    line: network.construct_from_dict(net_dict)
    locals:
      network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'conv0': {'L2': 0.0005, 'activation': None, 'class': 'conv', 'filter_size': (3, 3), 'from': 'source0', 'n_out': 32, 'padding': 'same', 'with_bias': True}, 'conv0p': {'class': 'pool', 'from': 'conv0', 'mode': 'max', 'padding': 'same', 'pool_size': (1, 2), 'trainable': False}, 'conv1': {'L2': 0.00..., len = 20
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 564, in TFNetwork.construct_from_dict
    line: self.construct_layer(net_dict, name, get_layer=get_layer)
    locals:
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'conv0': {'L2': 0.0005, 'activation': None, 'class': 'conv', 'filter_size': (3, 3), 'from': 'source0', 'n_out': 32, 'padding': 'same', 'with_bias': True}, 'conv0p': {'class': 'pool', 'from': 'conv0', 'mode': 'max', 'padding': 'same', 'pool_size': (1, 2), 'trainable': False}, 'conv1': {'L2': 0.00..., len = 20
      name = <local> 'decision', len = 8
      get_layer = <local> None
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 883, in TFNetwork.construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'returnn.tf.layers.rec.DecideLayer'>
      layer_class.transform_config_dict = <local> <bound method BaseChoiceLayer.transform_config_dict of <class 'returnn.tf.layers.rec.DecideLayer'>>
      layer_desc = <local> {'loss': 'edit_distance', 'target': 'bpe', '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'decision'}
      network = <not found>
      net = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x153698bfaf70>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 4375, in BaseChoiceLayer.transform_config_dict
    line: super(BaseChoiceLayer, cls).transform_config_dict(d, network=network, get_layer=get_layer)
    locals:
      super = <builtin> <class 'super'>
      BaseChoiceLayer = <global> <class 'returnn.tf.layers.rec.BaseChoiceLayer'>
      cls = <local> <class 'returnn.tf.layers.rec.DecideLayer'>
      transform_config_dict = <not found>
      d = <local> {'loss': 'edit_distance', 'target': 'bpe', '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'decision'}
      network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x153698bfaf70>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 466, in LayerBase.transform_config_dict
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <local> {'loss': 'edit_distance', 'target': 'bpe', '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'decision'}
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x153698bfaf70>
      src_name = <not found>
      src_names = <local> ['output'], _[0]: {len = 6}
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 467, in <listcomp>
    line: get_layer(src_name)
    locals:
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x153698bfaf70>
      src_name = <local> 'output', len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 770, in TFNetwork.construct_layer.<locals>.get_layer
    line: return self.construct_layer(net_dict=net_dict, name=src_name, get_layer=get_layer, add_layer=add_layer)
    locals:
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'conv0': {'L2': 0.0005, 'activation': None, 'class': 'conv', 'filter_size': (3, 3), 'from': 'source0', 'n_out': 32, 'padding': 'same', 'with_bias': True}, 'conv0p': {'class': 'pool', 'from': 'conv0', 'mode': 'max', 'padding': 'same', 'pool_size': (1, 2), 'trainable': False}, 'conv1': {'L2': 0.00..., len = 20
      name = <not found>
      src_name = <local> 'output', len = 6
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x153698bfaf70>
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 890, in TFNetwork.construct_layer
    line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'output', len = 6
      name_with_prefix = <local> 'output', len = 6
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'max_seq_len': <tf.Tensor 'max_seq_len_encoder:0' shape=() dtype=int32>, 'target': 'bpe', '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified'>, 'sources': [], '_target_layers': {'bpe': <..., len = 9
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 1045, in TFNetwork.add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'output', len = 6
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'max_seq_len': <tf.Tensor 'max_seq_len_encoder:0' shape=() dtype=int32>, 'target': 'bpe', '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified'>, 'sources': [], '_target_layers': {'bpe': <..., len = 9
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 967, in TFNetwork._create_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'max_seq_len': <tf.Tensor 'max_seq_len_encoder:0' shape=() dtype=int32>, 'target': 'bpe', '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified'>, 'sources': [], '_target_layers': {'bpe': <..., len = 12
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 236, in RecLayer.__init__
    line: y = self._get_output_subnet_unit(self.cell)
    locals:
      y = <not found>
      self = <local> <RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])>
      self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])>>
      self.cell = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 916, in RecLayer._get_output_subnet_unit
    line: output = cell.get_output()
    locals:
      output = <not found>
      cell = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      cell.get_output = <local> <bound method _SubnetworkRecCell.get_output of <_SubnetworkRecCell 'root/output(rec-subnet)'>>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 2488, in _SubnetworkRecCell.get_output
    line: self._construct_output_layers_moved_out(
            loop_accumulated=self.final_acc_tas_dict, seq_len=seq_len,
            extra_output_layers=extra_output_layers, final_net_vars=final_net_vars)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self._construct_output_layers_moved_out = <local> <bound method _SubnetworkRecCell._construct_output_layers_moved_out of <_SubnetworkRecCell 'root/output(rec-subnet)'>>
      loop_accumulated = <not found>
      self.final_acc_tas_dict = <local> {'output_FF_0': <tf.TensorArray 'output/rec/subnet_base/acc_ta_output_FF_0'>, 'output_att': <tf.TensorArray 'output/rec/subnet_base/acc_ta_output_att'>}
      seq_len = <local> <tf.Tensor 'output/rec/subnet_base/check_seq_len_batch_size/check_input_dim/identity_with_dim_check:0' shape=(?,) dtype=int32>
      extra_output_layers = <local> {'output'}, len = 1
      final_net_vars = <local> ([<tf.Tensor 'output/rec/while/Exit_1:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'output/rec/while/Exit_2:0' shape=(?, 1024) dtype=float32>], [])
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3266, in _SubnetworkRecCell._construct_output_layers_moved_out
    line: get_layer(layer_name)
    locals:
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      layer_name = <local> 'iLMT_0_output_prob', len = 18
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'FF_0': {'L2': 0.0005, 'activation': 'tanh', 'class': 'linear', 'from': ['prev:target_embed', 'prev:prev_1_target_embed', 'prev:prev_2_target_embed', 'prev:att'], 'n_out': 1024, 'with_bias': True}, 'accum_att_weights': {'class': 'eval', 'eval': 'source(0) + source(1) * source(2) * 0.5', 'from': ..., len = 22
      name = <local> 'iLMT_0_output_prob', len = 18
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 883, in TFNetwork.construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
      layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'returnn.tf.layers.basic.SoftmaxLayer'>>
      layer_desc = <local> {'L2': 0.0005, 'dropout': 0.3, 'loss': 'ce', 'loss_opts': {'label_smoothing': 0.1, 'scale': 1.0}, 'target': 'bpe', '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:b..., len = 7
      network = <not found>
      net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 466, in LayerBase.transform_config_dict
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <local> {'L2': 0.0005, 'dropout': 0.3, 'loss': 'ce', 'loss_opts': {'label_smoothing': 0.1, 'scale': 1.0}, 'target': 'bpe', '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:b..., len = 7
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      src_name = <not found>
      src_names = <local> ['iLMT_0_readout'], _[0]: {len = 14}
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 467, in <listcomp>
    line: get_layer(src_name)
    locals:
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      src_name = <local> 'iLMT_0_readout', len = 14
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'FF_0': {'L2': 0.0005, 'activation': 'tanh', 'class': 'linear', 'from': ['prev:target_embed', 'prev:prev_1_target_embed', 'prev:prev_2_target_embed', 'prev:att'], 'n_out': 1024, 'with_bias': True}, 'accum_att_weights': {'class': 'eval', 'eval': 'source(0) + source(1) * source(2) * 0.5', 'from': ..., len = 22
      name = <local> 'iLMT_0_readout', len = 14
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 883, in TFNetwork.construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'returnn.tf.layers.basic.ReduceOutLayer'>
      layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'returnn.tf.layers.basic.ReduceOutLayer'>>
      layer_desc = <local> {'mode': 'max', 'num_pieces': 2, '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': '...
      network = <not found>
      net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 466, in LayerBase.transform_config_dict
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <local> {'mode': 'max', 'num_pieces': 2, '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': '...
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      src_name = <not found>
      src_names = <local> ['iLMT_0_readout_in'], _[0]: {len = 17}
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 467, in <listcomp>
    line: get_layer(src_name)
    locals:
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      src_name = <local> 'iLMT_0_readout_in', len = 17
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'FF_0': {'L2': 0.0005, 'activation': 'tanh', 'class': 'linear', 'from': ['prev:target_embed', 'prev:prev_1_target_embed', 'prev:prev_2_target_embed', 'prev:att'], 'n_out': 1024, 'with_bias': True}, 'accum_att_weights': {'class': 'eval', 'eval': 'source(0) + source(1) * source(2) * 0.5', 'from': ..., len = 22
      name = <local> 'iLMT_0_readout_in', len = 17
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 883, in TFNetwork.construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'returnn.tf.layers.basic.LinearLayer'>>
      layer_desc = <local> {'activation': None, 'n_out': 1000, 'with_bias': True, '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dt...
      network = <not found>
      net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 466, in LayerBase.transform_config_dict
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <local> {'activation': None, 'n_out': 1000, 'with_bias': True, '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dt...
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      src_name = <not found>
      src_names = <local> ['iLMT_0_FF_0', 'prev:target_embed', 'zero_att'], _[0]: {len = 11}
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 467, in <listcomp>
    line: get_layer(src_name)
    locals:
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
      src_name = <local> 'iLMT_0_FF_0', len = 11
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'FF_0': {'L2': 0.0005, 'activation': 'tanh', 'class': 'linear', 'from': ['prev:target_embed', 'prev:prev_1_target_embed', 'prev:prev_2_target_embed', 'prev:att'], 'n_out': 1024, 'with_bias': True}, 'accum_att_weights': {'class': 'eval', 'eval': 'source(0) + source(1) * source(2) * 0.5', 'from': ..., len = 22
      name = <local> 'iLMT_0_FF_0', len = 11
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x1536980fe430>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 890, in TFNetwork.construct_layer
    line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'iLMT_0_FF_0', len = 11
      name_with_prefix = <local> 'iLMT_0_FF_0', len = 11
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'L2': 0.0005, 'activation': 'tanh', 'n_out': 1024, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer outp..., len = 8
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 1045, in TFNetwork.add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(dtype='int32', sparse=True, dim=534, batch_shape_meta=[T|'time:var:extern_data:bpe',B])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'iLMT_0_FF_0', len = 11
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'L2': 0.0005, 'activation': 'tanh', 'n_out': 1024, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer outp..., len = 8
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 967, in TFNetwork._create_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'L2': 0.0005, 'activation': 'tanh', 'n_out': 1024, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer outp..., len = 11
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/basic.py", line 1456, in LinearLayer.__init__
    line: weights = self.add_param(tf_compat.v1.get_variable(
            name="W", shape=weights_shape, dtype=tf.float32, initializer=fwd_weights_initializer))
    locals:
      weights = <not found>
      self = <local> <LinearLayer output/'iLMT_0_FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>
      self.add_param = <local> <bound method LayerBase.add_param of <LinearLayer output/'iLMT_0_FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>>
      tf_compat = <global> <module 'returnn.tf.compat' from '/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/compat.py'>
      tf_compat.v1 = <global> <module 'tensorflow._api.v2.compat.v1' from '/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/_api/v2/compat/v1/__init__.py'>
      tf_compat.v1.get_variable = <global> <function get_variable at 0x1536ba483550>
      name = <not found>
      shape = <not found>
      weights_shape = <local> (2887, 1024)
      dtype = <not found>
      tf = <global> <module 'tensorflow' from '/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/__init__.py'>
      tf.float32 = <global> tf.float32
      initializer = <not found>
      fwd_weights_initializer = <local> <tensorflow.python.ops.init_ops.GlorotUniform object at 0x153698086070>
  File "/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 1556, in get_variable
    line: return get_variable_scope().get_variable(
              _get_default_variable_store(),
              name,
              shape=shape,
              dtype=dtype,
              initializer=initializer,
              regularizer=regularizer,
              trainable=trainable,
              collections=collections,
              caching_device=caching_device,
              partitioner=partitioner,
              validate_shape=validate_shape,
              use_resource=use_resource,
              custom_getter=custom_getter,
              constraint=constraint,
              synchronization=synchronization,
              aggregation=aggregation)
    locals:
      get_variable_scope = <global> <function get_variable_scope at 0x1536ba483310>
      get_variable = <global> <function get_variable at 0x1536ba483550>
      _get_default_variable_store = <global> <function _get_default_variable_store at 0x1536ba4833a0>
      name = <local> 'W'
      shape = <local> (2887, 1024)
      dtype = <local> tf.float32
      initializer = <local> <tensorflow.python.ops.init_ops.GlorotUniform object at 0x153698086070>
      regularizer = <local> None
      trainable = <local> None
      collections = <local> None
      caching_device = <local> None
      partitioner = <local> None
      validate_shape = <local> True
      use_resource = <local> None
      custom_getter = <local> None
      constraint = <local> None
      synchronization = <local> <VariableSynchronization.AUTO: 0>
      aggregation = <local> <VariableAggregation.NONE: 0>
  File "/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 1299, in VariableScope.get_variable
    line: return var_store.get_variable(
              full_name,
              shape=shape,
              dtype=dtype,
              initializer=initializer,
              regularizer=regularizer,
              reuse=reuse,
              trainable=trainable,
              collections=collections,
              caching_device=caching_device,
              partitioner=partitioner,
              validate_shape=validate_shape,
              use_resource=use_resource,
              custom_getter=custom_getter,
              constraint=constraint,
              synchronization=synchronization,
              aggregation=aggregation)
    locals:
      var_store = <local> <tensorflow.python.ops.variable_scope._VariableStore object at 0x153698dcaa60>
      var_store.get_variable = <local> <bound method _VariableStore.get_variable of <tensorflow.python.ops.variable_scope._VariableStore object at 0x153698dcaa60>>
      full_name = <local> 'output/rec/iLMT_0_FF_0/W', len = 24
      shape = <local> (2887, 1024)
      dtype = <local> tf.float32
      initializer = <local> <tensorflow.python.ops.init_ops.GlorotUniform object at 0x153698086070>
      regularizer = <local> None
      reuse = <local> <_ReuseMode.AUTO_REUSE: 1>
      trainable = <local> None
      collections = <local> None
      caching_device = <local> None
      partitioner = <local> None
      validate_shape = <local> True
      use_resource = <local> None
      custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x15369807c5e0>
      constraint = <local> None
      synchronization = <local> <VariableSynchronization.AUTO: 0>
      aggregation = <local> <VariableAggregation.NONE: 0>
  File "/work/tools/asr/python/3.8.0_tf_2.3-v1-generic+cuda10.1/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 552, in _VariableStore.get_variable
    line: return custom_getter(**custom_getter_kwargs)
    locals:
      custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x15369807c5e0>
      custom_getter_kwargs = <local> {'getter': <function _VariableStore.get_variable.<locals>._true_getter at 0x15369807c670>, 'name': 'output/rec/iLMT_0_FF_0/W', 'shape': (2887, 1024), 'dtype': tf.float32, 'initializer': <tensorflow.python.ops.init_ops.GlorotUniform object at 0x153698086070>, 'regularizer': None, 'reuse': <_ReuseM..., len = 16
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 1801, in ReuseParams.get_variable_scope.<locals>._variable_custom_getter
    line: return self.variable_custom_getter(base_layer=base_layer, **kwargs_)
    locals:
      self = <local> <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bp...
      self.variable_custom_getter = <local> <bound method ReuseParams.variable_custom_getter of <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_ty...
      base_layer = <local> <LinearLayer output/'iLMT_0_FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>
      kwargs_ = <local> {'getter': <function _VariableStore.get_variable.<locals>._true_getter at 0x15369807c670>, 'name': 'output/rec/iLMT_0_FF_0/W', 'shape': (2887, 1024), 'dtype': tf.float32, 'initializer': <tensorflow.python.ops.init_ops.GlorotUniform object at 0x153698086070>, 'regularizer': None, 'reuse': <_ReuseM..., len = 16
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 1839, in ReuseParams.variable_custom_getter
    line: return self.param_map[param_name].variable_custom_getter(
            getter=getter, name=name, base_layer=base_layer, **kwargs)
    locals:
      self = <local> <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bp...
      self.param_map = <local> {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>}
      param_name = <local> 'W'
      variable_custom_getter = <not found>
      getter = <local> <function _VariableStore.get_variable.<locals>._true_getter at 0x15369807c670>
      name = <local> 'output/rec/iLMT_0_FF_0/W', len = 24
      base_layer = <local> <LinearLayer output/'iLMT_0_FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>
      kwargs = <local> {'shape': (2887, 1024), 'dtype': tf.float32, 'initializer': <tensorflow.python.ops.init_ops.GlorotUniform object at 0x153698086070>, 'regularizer': None, 'reuse': <_ReuseMode.AUTO_REUSE: 1>, 'trainable': True, 'collections': None, 'caching_device': None, 'partitioner': None, 'validate_shape': Tru..., len = 14
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 1843, in ReuseParams.variable_custom_getter
    line: assert param_name in self.reuse_layer.params
    locals:
      param_name = <local> 'W'
      self = <local> <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>, map None>
      self.reuse_layer = <local> <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:bpe',B,F|1024])>
      self.reuse_layer.params = <local> {}
AssertionError
albertz commented 3 years ago

Your error/log does not directly correspond to the example net you posted initially. Can you post the exact error you get for your example?

albertz commented 3 years ago

Regarding the error itself: at first glance, this might be because the layer FF_0 is inside the loop, while the reuse-params logic tries to access it from outside the loop.

Actually, the whole ReuseParams code is quite fragile and partly hacky. I'm not sure we really need to access the other layer (reuse_layer) here at all. We should be able to just use the name scope, i.e. infer the name scope and access the variable that way.

This would need some cleanup and reimplementation. Unfortunately I don't have much time to do that at the moment. PRs are welcome. But I'm a bit afraid this needs some in-depth knowledge of RETURNN, and someone should help here, or at least review it. @patrick-wilken maybe?

Maybe a workaround is fine for you for now: you could just set a custom getter function. Something like this (untested, but I hope you get the idea):

from returnn.tf.util.basic import reuse_name_scope
import returnn.tf.compat as tf_compat

def get_var(name):
  # Fetch the already created variable via its absolute name.
  with reuse_name_scope("", absolute=True):
    return tf_compat.v1.get_variable(name)
...
                 'reuse_params': {'map': {'W': {'custom': lambda **_kwargs: get_var("output/FF_0/W")},
                                          'b': {'custom': lambda **_kwargs: get_var("output/FF_0/b")}}}},
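For context, the mechanism this relies on (and which ReuseParams also uses, as visible in the traceback via `custom_getter` in `_VariableStore.get_variable`) can be illustrated with a small standalone sketch. This is not RETURNN or TensorFlow code; all names here are illustrative:

```python
# Illustrative sketch: a variable store whose custom getter redirects
# variable creation to an already existing variable, which is what
# reuse_params effectively does for 'W' and 'b' of FF_1.

class VariableStore:
  """Maps full variable names to variable objects, like TF's _VariableStore."""

  def __init__(self):
    self.vars = {}  # name -> variable (here just a list as a placeholder)

  def get_variable(self, name, shape, custom_getter=None):
    def true_getter(name, shape):
      if name not in self.vars:
        self.vars[name] = [0.0] * shape  # placeholder "tensor"
      return self.vars[name]

    if custom_getter:
      # The custom getter may ignore `name` and hand back another variable.
      return custom_getter(getter=true_getter, name=name, shape=shape)
    return true_getter(name, shape)


store = VariableStore()
w_ff0 = store.get_variable("output/rec/FF_0/W", shape=4)

def reuse_ff0(getter, name, shape):
  # Instead of creating "output/rec/FF_1/W", return FF_0's variable.
  return getter("output/rec/FF_0/W", shape)

w_ff1 = store.get_variable("output/rec/FF_1/W", shape=4, custom_getter=reuse_ff0)
assert w_ff1 is w_ff0  # both layers now share the same parameter
```

The assertion in the traceback (`param_name in self.reuse_layer.params`) fires inside such a custom getter, because the FF_0 layer object seen from the output-net construction has an empty `params` dict.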
aleksglushko commented 3 years ago

I will post the logs for the examples above later, since I'm getting some unrelated error for the second case that didn't appear before. For the first case, reuse_params works as in the demos in test_TFNetworkLayer.py. For the second one, I got the same error as in the log above, but I still need to reproduce it.

Thank you for the idea with an explicit function. I have also seen in Slack that one can use the prefix extra.any_name:crf to share the weights, but I didn't find any documentation about its usage on the RETURNN docs website.

aleksglushko commented 3 years ago

This is the log for the second case with two outputs:

Train data:
  input: 9 x 1
  output: {'classes': (2, 1), 'data': (9, 2)}
  Task12AXDataset, sequences: 1000, frames: unknown
Dev data:
  Task12AXDataset, sequences: 100, frames: unknown
Device not set explicitly, and we found a GPU, which we will use.
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer root/'data' output: Data(name='data', batch_shape_meta=[B,T|'time:var:extern_data:data',F|9])
layer root/'output' output: Data(name='output_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])
Rec layer 'output' (search False, train 'globals/train_flag:0') sub net:
  Input layers moved out of loop: (#: 0)
    None
  Output layers moved out of loop: (#: 2)
    output1
    FF_1
  Layers in loop: (#: 3)
    output
    FF_0
    input
  Unused layers: (#: 0)
    None
layer root/output(rec-subnet)/'data:source' output: Data(name='data', batch_shape_meta=[B,F|9])
layer root/output(rec-subnet)/'input' output: Data(name='input_output', batch_shape_meta=[B,F|11])
layer root/output(rec-subnet)/'FF_0' output: Data(name='FF_0_output', batch_shape_meta=[B,F|10])
Exception occurred during in-loop construction of layer 'classes'.
We had previous exceptions at template construction, which got resolved, but maybe sth is wrong.
Template network (check out types / shapes):
{'FF_0': <_TemplateLayer(LinearLayer)(:template:linear) output/'FF_0' out_type=Data(batch_shape_meta=[B?,F|10]) (construction stack 'output')>,
 'FF_1': <_TemplateLayer(LinearLayer)(:template:linear) output/'FF_1' out_type=Data(batch_shape_meta=[B?,F|10]) (construction stack 'output1')>,
 'data:classes': <_TemplateLayer(SourceLayer)(:template:source) output/'data:classes' out_type=Data(dtype='int32', sparse=True, dim=2, available_for_inference=False, batch_shape_meta=[B]) (construction stack 'output')>,
 'data:source': <_TemplateLayer(SourceLayer)(:template:source) output/'data:source' out_type=Data(batch_shape_meta=[B,F|9]) (construction stack 'input')>,
 'input': <_TemplateLayer(CopyLayer)(:template:copy) output/'input' out_type=Data(batch_shape_meta=[B,F|11]) (construction stack 'FF_0')>,
 'output': <_TemplateLayer(SoftmaxLayer)(:template:softmax) output/'output' out_type=Data(batch_shape_meta=[B?,F|2]) (construction stack None)>,
 'output1': <_TemplateLayer(SoftmaxLayer)(:template:softmax) output/'output1' out_type=Data(batch_shape_meta=[B?,F|2]) (construction stack None)>}
Collected (unique) exceptions during template construction:
(Note that many of these can be ignored, or are expected.)
EXCEPTION while constructing layer 'input'
NetworkConstructionDependencyLoopException: <TFNetwork 'root/output(rec-subnet)' parent_net=<TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>: Error: There is a dependency loop on layer 'output'.
Construction stack (most recent first):
  input
  FF_0
  output

layer root/output(rec-subnet)/'data:classes' output: Data(name='classes', dtype='int32', sparse=True, dim=2, available_for_inference=False, batch_shape_meta=[B])
layer root/output(rec-subnet)/'output' output: Data(name='output_output', batch_shape_meta=[B,F|2])
layer root/output(rec-subnet-output)/'input' output: Data(name='input_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|11])
layer root/output(rec-subnet-output)/'FF_0' output: Data(name='FF_0_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])
layer root/output(rec-subnet-output)/'FF_1' output: Data(name='FF_1_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])
Exception creating layer root/output(rec-subnet-output)/'FF_1' of class LinearLayer with opts:
{'_name': 'FF_1',
 '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'activation': 'tanh',
 'n_out': 10,
 'name': 'FF_1',
 'network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='FF_1_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|10]),
 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>}>,
 'sources': [<InternalLayer output/'input' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|11])>]}
Exception occurred during output-net construction of layer 'FF_1'.
Exception occurred during output-net construction of layer 'output1'.
Exception creating layer root/'output' of class RecLayer with opts:
{'_name': 'output',
 '_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 '_time_dim_tag': DimensionTag(kind='spatial', description='time:var:extern_data:data', id=140167542826432),
 'n_out': <class 'returnn.util.basic.NotSpecified'>,
 'name': 'output',
 'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data(name='output_output', batch_shape_meta=[T|'time:var:extern_data:data',B,F|2]),
 'sources': [<SourceLayer 'data' out_type=Data(batch_shape_meta=[B,T|'time:var:extern_data:data',F|9])>],
 'unit': <_SubnetworkRecCell 'root/output(rec-subnet)'>}
Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 140169104250624)>, proc 28292.

Thread current, main, <_MainThread(MainThread, started 140169104250624)>:
(Excluded thread.)

That were all threads.
EXCEPTION
Traceback (most recent call last):
  File "returnn/rnn.py", line 11, in <module>
    line: main()
    locals:
      main = <local> <function main at 0x7f7b9bf03ae8>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/__main__.py", line 659, in main
    line: execute_main_task()
    locals:
      execute_main_task = <global> <function execute_main_task at 0x7f7b9bf039d8>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/__main__.py", line 457, in execute_main_task
    line: engine.init_train_from_config(config, train_data, dev_data, eval_data)
    locals:
      engine = <global> <returnn.tf.engine.Engine object at 0x7f7ba0c28978>
      engine.init_train_from_config = <global> <bound method Engine.init_train_from_config of <returnn.tf.engine.Engine object at 0x7f7ba0c28978>>
      config = <global> <returnn.config.Config object at 0x7f7ba9940358>
      train_data = <global> <Task12AXDataset 'train' epoch=None>
      dev_data = <global> <Task12AXDataset 'dev' epoch=None>
      eval_data = <global> None
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1031, in Engine.init_train_from_config
    line: self.init_network_from_config(config)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x7f7ba0c28978>
      self.init_network_from_config = <local> <bound method Engine.init_network_from_config of <returnn.tf.engine.Engine object at 0x7f7ba0c28978>>
      config = <local> <returnn.config.Config object at 0x7f7ba9940358>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1096, in Engine.init_network_from_config
    line: self._init_network(net_desc=net_dict, epoch=self.epoch)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x7f7ba0c28978>
      self._init_network = <local> <bound method Engine._init_network of <returnn.tf.engine.Engine object at 0x7f7ba0c28978>>
      net_desc = <not found>
      net_dict = <local> {'output': {'class': 'rec', 'from': 'data', 'unit': {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_para...
      epoch = <local> None
      self.epoch = <local> 1
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1281, in Engine._init_network
    line: self.network, self.updater = self.create_network(
            config=self.config,
            extern_data=extern_data,
            rnd_seed=net_random_seed,
            train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
            initial_learning_rate=getattr(self, "initial_learning_rate", None),
            net_dict=net_desc)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x7f7ba0c28978>
      self.network = <local> None
      self.updater = <local> None
      self.create_network = <local> <bound method Engine.create_network of <class 'returnn.tf.engine.Engine'>>
      config = <not found>
      self.config = <local> <returnn.config.Config object at 0x7f7ba9940358>
      extern_data = <local> <ExternData data={'classes': Data(name='classes', dtype='int32', sparse=True, dim=2, available_for_inference=False, batch_shape_meta=[B,T|'time:var:extern_data:classes']), 'data': Data(name='data', batch_shape_meta=[B,T|'time:var:extern_data:data',F|9])}>
      rnd_seed = <not found>
      net_random_seed = <local> 1
      train_flag = <local> <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>
      eval_flag = <not found>
      self.use_eval_flag = <local> True
      search_flag = <not found>
      self.use_search_flag = <local> False
      initial_learning_rate = <not found>
      getattr = <builtin> <built-in function getattr>
      net_dict = <not found>
      net_desc = <local> {'output': {'class': 'rec', 'from': 'data', 'unit': {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_para...
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/engine.py", line 1316, in Engine.create_network
    line: network.construct_from_dict(net_dict)
    locals:
      network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'output': {'class': 'rec', 'from': 'data', 'unit': {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_para...
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 564, in TFNetwork.construct_from_dict
    line: self.construct_layer(net_dict, name, get_layer=get_layer)
    locals:
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      net_dict = <local> {'output': {'class': 'rec', 'from': 'data', 'unit': {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_para...
      name = <local> 'output', len = 6
      get_layer = <local> None
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 890, in TFNetwork.construct_layer
    line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'output', len = 6
      name_with_prefix = <local> 'output', len = 6
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified'>, 'sources': [<SourceLayer 'data' out_type=Data(batch_shape_meta=[B,T|'time:var:extern_data:data',F|9])>], '_time_dim_tag': DimensionT..., len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 1045, in TFNetwork.add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'output', len = 6
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified'>, 'sources': [<SourceLayer 'data' out_type=Data(batch_shape_meta=[B,T|'time:var:extern_data:data',F|9])>], '_time_dim_tag': DimensionT..., len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 967, in TFNetwork._create_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified'>, 'sources': [<SourceLayer 'data' out_type=Data(batch_shape_meta=[B,T|'time:var:extern_data:data',F|9])>], '_time_dim_tag': DimensionT..., len = 9
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 236, in RecLayer.__init__
    line: y = self._get_output_subnet_unit(self.cell)
    locals:
      y = <not found>
      self = <local> <RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])>
      self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])>>
      self.cell = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 2490, in _SubnetworkRecCell.get_output
    line: self._construct_output_layers_moved_out(
            loop_accumulated=self.final_acc_tas_dict, seq_len=seq_len,
            extra_output_layers=extra_output_layers, final_net_vars=final_net_vars)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self._construct_output_layers_moved_out = <local> <bound method _SubnetworkRecCell._construct_output_layers_moved_out of <_SubnetworkRecCell 'root/output(rec-subnet)'>>
      loop_accumulated = <not found>
      self.final_acc_tas_dict = <local> {'loss_output': <tf.TensorArray 'output/rec/subnet_base/acc_ta_loss_output'>, 'error_output': <tf.TensorArray 'output/rec/subnet_base/acc_ta_error_output'>, 'output_output': <tf.TensorArray 'output/rec/subnet_base/acc_ta_output_output'>, 'output_input': <tf.TensorArray 'output/rec/subnet_base/acc...
      seq_len = <local> <tf.Tensor 'output/rec/subnet_base/check_seq_len_batch_size/check_input_dim/identity_with_dim_check:0' shape=(?,) dtype=int32>
      extra_output_layers = <local> {'output'}, len = 1
      final_net_vars = <local> ([<tf.Tensor 'output/rec/while/Exit_1:0' shape=(?, 2) dtype=float32>], [])
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3266, in _SubnetworkRecCell._construct_output_layers_moved_out
    line: get_layer(layer_name)
    locals:
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
      layer_name = <local> 'output1', len = 7
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_params': {'map': {'W': {'reuse_layer': 'FF_0'}, 'b': {'r...
      name = <local> 'output1', len = 7
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 883, in TFNetwork.construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'returnn.tf.layers.basic.SoftmaxLayer'>
      layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'returnn.tf.layers.basic.SoftmaxLayer'>>
      layer_desc = <local> {'loss': 'ce', '_network': <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output1'}
      network = <not found>
      net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 468, in LayerBase.transform_config_dict
    line: for src_name in src_names
    locals:
      src_name = <not found>
      src_names = <local> ['FF_1']
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 469, in <listcomp>
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <not found>
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
      src_name = <local> 'FF_1'
      src_names = <not found>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_params': {'map': {'W': {'reuse_layer': 'FF_0'}, 'b': {'r...
      name = <local> 'FF_1'
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 890, in TFNetwork.construct_layer
    line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'FF_1'
      name_with_prefix = <local> 'FF_1'
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'activation': 'tanh', 'n_out': 10, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ..., len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 1045, in TFNetwork.add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'FF_1'
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'activation': 'tanh', 'n_out': 10, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ..., len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 967, in TFNetwork._create_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'activation': 'tanh', 'n_out': 10, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ..., len = 9
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 469, in <listcomp>
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <not found>
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
      src_name = <local> 'FF_1'
      src_names = <not found>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/rec.py", line 3254, in _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer
    line: return self.output_layers_net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
    locals:
      self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
      self.output_layers_net = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self.output_layers_net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      self.net_dict = <local> {'input': {'class': 'copy', 'from': ['prev:output', 'data:source']}, 'FF_0': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10}, 'FF_1': {'activation': 'tanh', 'class': 'linear', 'from': ['input'], 'n_out': 10, 'reuse_params': {'map': {'W': {'reuse_layer': 'FF_0'}, 'b': {'r...
      name = <local> 'FF_1'
      get_layer = <local> <function _SubnetworkRecCell._construct_output_layers_moved_out.<locals>.get_layer at 0x7f7b20204f28>
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 890, in TFNetwork.construct_layer
    line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'FF_1'
      name_with_prefix = <local> 'FF_1'
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'activation': 'tanh', 'n_out': 10, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ..., len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 1045, in TFNetwork.add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root/output(rec-subnet-output)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|2])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
      name = <local> 'FF_1'
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'activation': 'tanh', 'n_out': 10, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ..., len = 6
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/network.py", line 967, in TFNetwork._create_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
      layer_desc = <local> {'activation': 'tanh', 'n_out': 10, 'reuse_params': <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ..., len = 9
File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/basic.py", line 1457, in LinearLayer.__init__
    line: weights = self.add_param(tf_compat.v1.get_variable(
            name="W", shape=weights_shape, dtype=tf.float32, initializer=fwd_weights_initializer))
    locals:
      weights = <not found>
      self = <local> <LinearLayer output/'FF_1' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>
      self.add_param = <local> <bound method LayerBase.add_param of <LinearLayer output/'FF_1' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>>
      tf_compat = <global> <module 'returnn.tf.compat' from '/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/compat.py'>
      tf_compat.v1 = <global> <module 'tensorflow' from '/u/beck/programs/python/3.6.1/lib/python3.6/site-packages/tensorflow/__init__.py'>
      tf_compat.v1.get_variable = <global> <function get_variable at 0x7f7b4c24b730>
      name = <not found>
      shape = <not found>
      weights_shape = <local> (11, 10)
      dtype = <not found>
      tf = <global> <module 'tensorflow' from '/u/beck/programs/python/3.6.1/lib/python3.6/site-packages/tensorflow/__init__.py'>
      tf.float32 = <global> tf.float32
      initializer = <not found>
      fwd_weights_initializer = <local> <tensorflow.python.ops.init_ops.VarianceScaling object at 0x7f7b20221fd0>
  File "/u/beck/programs/python/3.6.1/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
    line: return get_variable_scope().get_variable(
              _get_default_variable_store(), name, shape=shape, dtype=dtype,
              initializer=initializer, regularizer=regularizer, trainable=trainable,
              collections=collections, caching_device=caching_device,
              partitioner=partitioner, validate_shape=validate_shape,
              use_resource=use_resource, custom_getter=custom_getter,
              constraint=constraint)
    locals:
      get_variable_scope = <global> <function get_variable_scope at 0x7f7b4c24b510>
      get_variable = <global> <function get_variable at 0x7f7b4c24b730>
      _get_default_variable_store = <global> <function _get_default_variable_store at 0x7f7b4c24b598>
      name = <local> 'W'
      shape = <local> (11, 10)
      dtype = <local> tf.float32
      initializer = <local> <tensorflow.python.ops.init_ops.VarianceScaling object at 0x7f7b20221fd0>
      regularizer = <local> None
      trainable = <local> True
      collections = <local> None
      caching_device = <local> None
      partitioner = <local> None
      validate_shape = <local> True
      use_resource = <local> None
      custom_getter = <local> None
      constraint = <local> None
File "/u/beck/programs/python/3.6.1/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in VariableScope.get_variable
    line: return var_store.get_variable(
              full_name, shape=shape, dtype=dtype, initializer=initializer,
              regularizer=regularizer, reuse=reuse, trainable=trainable,
              collections=collections, caching_device=caching_device,
              partitioner=partitioner, validate_shape=validate_shape,
              use_resource=use_resource, custom_getter=custom_getter,
              constraint=constraint)
    locals:
      var_store = <local> <tensorflow.python.ops.variable_scope._VariableStore object at 0x7f7b4c9969e8>
      var_store.get_variable = <local> <bound method _VariableStore.get_variable of <tensorflow.python.ops.variable_scope._VariableStore object at 0x7f7b4c9969e8>>
      full_name = <local> 'output/rec/FF_1/W', len = 17
      shape = <local> (11, 10)
      dtype = <local> tf.float32
      initializer = <local> <tensorflow.python.ops.init_ops.VarianceScaling object at 0x7f7b20221fd0>
      regularizer = <local> None
      reuse = <local> <_ReuseMode.AUTO_REUSE: 1>
      trainable = <local> True
      collections = <local> None
      caching_device = <local> None
      partitioner = <local> None
      validate_shape = <local> True
      use_resource = <local> None
      custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x7f7b20217268>
      constraint = <local> None
  File "/u/beck/programs/python/3.6.1/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 417, in _VariableStore.get_variable
    line: return custom_getter(**custom_getter_kwargs)
    locals:
      custom_getter = <local> <function ReuseParams.get_variable_scope.<locals>._variable_custom_getter at 0x7f7b20217268>
      custom_getter_kwargs = <local> {'getter': <function _VariableStore.get_variable.<locals>._true_getter at 0x7f7b202172f0>, 'name': 'output/rec/FF_1/W', 'shape': (11, 10), 'dtype': tf.float32, 'initializer': <tensorflow.python.ops.init_ops.VarianceScaling object at 0x7f7b20221fd0>, 'regularizer': None, 'reuse': <_ReuseMode.AUTO_..., len = 13
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 1801, in ReuseParams.get_variable_scope.<locals>._variable_custom_getter
    line: return self.variable_custom_getter(base_layer=base_layer, **kwargs_)
    locals:
      self = <local> <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:dat...
      self.variable_custom_getter = <local> <bound method ReuseParams.variable_custom_getter of <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_typ...
      base_layer = <local> <LinearLayer output/'FF_1' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>
      kwargs_ = <local> {'getter': <function _VariableStore.get_variable.<locals>._true_getter at 0x7f7b202172f0>, 'name': 'output/rec/FF_1/W', 'shape': (11, 10), 'dtype': tf.float32, 'initializer': <tensorflow.python.ops.init_ops.VarianceScaling object at 0x7f7b20221fd0>, 'regularizer': None, 'reuse': <_ReuseMode.AUTO_..., len = 13
File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 1840, in ReuseParams.variable_custom_getter
    line: return self.param_map[param_name].variable_custom_getter(
            getter=getter, name=name, base_layer=base_layer, **kwargs)
    locals:
      self = <local> <ReuseParams reuse_layer None, map {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:dat...
      self.param_map = <local> {'W': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>, 'b': <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>}
      param_name = <local> 'W'
      variable_custom_getter = <not found>
      getter = <local> <function _VariableStore.get_variable.<locals>._true_getter at 0x7f7b202172f0>
      name = <local> 'output/rec/FF_1/W', len = 17
      base_layer = <local> <LinearLayer output/'FF_1' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>
      kwargs = <local> {'shape': (11, 10), 'dtype': tf.float32, 'initializer': <tensorflow.python.ops.init_ops.VarianceScaling object at 0x7f7b20221fd0>, 'regularizer': None, 'reuse': <_ReuseMode.AUTO_REUSE: 1>, 'trainable': True, 'collections': None, 'caching_device': None, 'partitioner': None, 'validate_shape': True,..., len = 11
  File "/u/glushko/setups/switchboard/2021-06-21--ilmt-att-sis/returnn/returnn/tf/layers/base.py", line 1843, in ReuseParams.variable_custom_getter
    line: assert param_name in self.reuse_layer.params
    locals:
      param_name = <local> 'W'
      self = <local> <ReuseParams reuse_layer <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>, map None>
      self.reuse_layer = <local> <InternalLayer output/'FF_0' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:data',B,F|10])>
      self.reuse_layer.params = <local> {}
AssertionError
aleksglushko commented 3 years ago

@albertz, you were right about: "From a first glance, maybe this is because the layer FF_0 is inside the loop and the reuse params logic when it accesses the layer tries to get it from outside." A possible workaround for sharing in my case is to set the flag `optimize_move_layers_out = 0`, but without the optimization the computation will be slower.
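To illustrate the failure mode, here is a simplified, self-contained model of the check in `ReuseParams.variable_custom_getter` (this is not RETURNN's actual code, just a sketch): when the rec-subnet optimization moves layers out of the loop, the layer referenced by `reuse_params` can resolve to an `InternalLayer` placeholder whose `params` dict is empty, so the lookup fails.

```python
class Layer:
    """Minimal stand-in for a RETURNN layer with a params dict."""
    def __init__(self, name, params=None):
        self.name = name
        self.params = params or {}

def reuse_param(reuse_layer, param_name):
    # Mirrors the assertion at the bottom of the traceback:
    # assert param_name in self.reuse_layer.params
    assert param_name in reuse_layer.params, (
        "%r not in params of layer %r" % (param_name, reuse_layer.name))
    return reuse_layer.params[param_name]

# Inside the loop, FF_0 has real params, so sharing works:
ff0_in_loop = Layer("FF_0", {"W": "W-var", "b": "b-var"})
assert reuse_param(ff0_in_loop, "W") == "W-var"

# After layers are moved out, the reference can be an InternalLayer
# placeholder with params == {}, which triggers the AssertionError:
ff0_moved_out = Layer("FF_0 (InternalLayer placeholder)")
try:
    reuse_param(ff0_moved_out, "W")
except AssertionError as exc:
    print("AssertionError:", exc)
```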

albertz commented 3 years ago

> @albertz you were right about the: "From a first glance, maybe this is because the layer FF_0 is inside the loop and the reuse params logic when it accesses the layer tries to get it from outside." Possible solution for sharing in my case is to use the flag optimize_move_layers_out=0. But without optimization calculation will be slower.

`optimize_move_layers_out=0` is much slower and consumes more memory, so I would never recommend using it, also not in your case. If you want a workaround, try the other workaround I suggested above (using a custom function).

aleksglushko commented 3 years ago

Using a custom function that fetches the variable via an absolute name scope works well for linear layers, but somehow it is not usable for LSTMBlock.
I use the following function, as you mentioned above:

```python
def get_var(name, shape):
    from returnn.tf.util.basic import reuse_name_scope
    from returnn.tf.compat import v1 as tf
    with reuse_name_scope('', absolute=True):
        # Fetch the variable once; calling tf.get_variable twice with the
        # same name would fail without reuse enabled.
        var = tf.get_variable(name, shape)
        print('Reused variable:', var)
        return var
```

and I am trying to share the weights in the following way:

```python
's_shared': {
    'class': 'rnn_cell', 'from': ['prev:target_embed'], 'n_out': 1000, 'unit': 'LSTMBlock',
    'reuse_params': {'map': {
        'lstm_cell/bias': {'custom': lambda **_kwargs: get_var('output/rec/s/rec/lstm_cell/bias', _kwargs['shape'])},
        'lstm_cell/kernel': {'custom': lambda **_kwargs: get_var('output/rec/s/rec/lstm_cell/kernel', _kwargs['shape'])}}}},
```

where

```python
's': {'class': 'rnn_cell', 'from': ['prev:target_embed'], 'n_out': 1000, 'unit': 'LSTMBlock'},
```

For linear layers it works in this way:

```python
'readout_in': {'activation': None, 'class': 'linear', 'from': ['s', 'prev:target_embed', 'att'], 'n_out': 1000, 'with_bias': True},
'readout_in_shared': {
    'activation': None, 'class': 'linear', 'from': ['iLMT_s', 'prev:target_embed', 'zero_att'], 'n_out': 1000, 'with_bias': True,
    'reuse_params': {'map': {
        'W': {'custom': lambda **_kwargs: get_var('output/rec/readout_in/W', _kwargs['shape'])},
        'b': {'custom': lambda **_kwargs: get_var('output/rec/readout_in/b', _kwargs['shape'])}}}},
```
albertz commented 3 years ago

(Please properly format using Markdown. I fixed that for you.)

> but somehow it is not usable for LSTMBlock

What do you mean by that?

Btw, instead of using LSTMBlock, you should rather use NativeLstm2. Also, instead of using rnn_cell, better use rec. I.e.:

```python
's': {'class': 'rec', 'from': 'prev:target_embed', 'n_out': 1000, 'unit': 'NativeLstm2'},
```

Maybe that already fixes your problems.

aleksglushko commented 3 years ago

I should have mentioned that RETURNN could not find the parameters for sharing. But changing the class from rnn_cell to rec helped. With that, the parameters to share are 'rnn/lstm_cell/bias' and 'rnn/lstm_cell/kernel'.
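For reference, a sketch of what the working shared layer might look like with class rec (this is a guess based on the parameter names above; the absolute scope path `output/rec/s/rec/rnn/lstm_cell/...` is an assumption and depends on the actual layer names in the setup):

```
's_shared': {
    'class': 'rec', 'from': ['prev:target_embed'], 'n_out': 1000, 'unit': 'LSTMBlock',
    'reuse_params': {'map': {
        'rnn/lstm_cell/bias': {'custom': lambda **_kwargs: get_var('output/rec/s/rec/rnn/lstm_cell/bias', _kwargs['shape'])},
        'rnn/lstm_cell/kernel': {'custom': lambda **_kwargs: get_var('output/rec/s/rec/rnn/lstm_cell/kernel', _kwargs['shape'])}}}},
```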

albertz commented 3 years ago

> but somehow it is not usable for LSTMBlock

What is meant by that? Please be more specific. Not usable in what way? What happens? You get the error you described here? Or sth else?

> But changing the class from rnn_cell to rec helped.

What does that mean? Helped how? It works then? Or what? Please be more specific.

albertz commented 3 years ago

In any case, the original bug here is fixed now, via #695.

It would be nice if you could clarify the other things I asked about.

If there are other further problems, please open a new issue.