rwth-i6 / returnn

The RWTH extensible training framework for universal recurrent neural networks
http://returnn.readthedocs.io/

Error when running demos/demo-hyper-param-tuning.config #345

Closed · mattiadg closed 4 years ago

mattiadg commented 4 years ago

Running demos/demo-hyper-param-tuning.config crashes with the following error.

Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 140030923687680)>, proc 13433.

Thread current, main, <_MainThread(MainThread, started 140030923687680)>:
(Excluded thread.)

That were all threads.
EXCEPTION
Traceback (most recent call last):
  File "rnn.py", line 11, in <module>
    line: main()
    locals:
      main = <local> <function main at 0x7f5b717b5048>
  File "/home/mdigangi/bin/returnn/returnn/__main__.py", line 642, in main
    line: execute_main_task()
    locals:
      execute_main_task = <global> <function execute_main_task at 0x7f5b717abea0>
  File "/home/mdigangi/bin/returnn/returnn/__main__.py", line 535, in execute_main_task
    line: tuner.work()
    locals:
      tuner = <local> <returnn.tf.hyper_param_tuning.Optimization object at 0x7f5a700f4390>
      tuner.work = <local> <bound method Optimization.work of <returnn.tf.hyper_param_tuning.Optimization object at 0x7f5a700f4390>>
  File "/home/mdigangi/bin/returnn/returnn/tf/hyper_param_tuning.py", line 553, in work
    line: _IndividualTrainer(optim=self, individual=population[0], gpu_ids={0}).run()
    locals:
      _IndividualTrainer = <global> <class 'returnn.tf.hyper_param_tuning._IndividualTrainer'>
      optim = <not found>
      self = <local> <returnn.tf.hyper_param_tuning.Optimization object at 0x7f5a700f4390>
      individual = <not found>
      population = <local> [<returnn.tf.hyper_param_tuning.Individual object at 0x7f5a70119ac8>, <returnn.tf.hyper_param_tuning.Individual object at 0x7f5a701199b0>, <returnn.tf.hyper_param_tuning.Individual object at 0x7f5a70119978>, <returnn.tf.hyper_param_tuning.Individual object at 0x7f5a701195f8>, <returnn.tf.hyper_pa..., len = 30
      gpu_ids = <not found>
      run = <not found>
  File "/home/mdigangi/bin/returnn/returnn/tf/hyper_param_tuning.py", line 640, in run
    line: engine.init_train_from_config(config=config, train_data=train_data)
    locals:
      engine = <local> <returnn.tf.engine.Engine object at 0x7f5b5f3b74e0>
      engine.init_train_from_config = <local> <bound method Engine.init_train_from_config of <returnn.tf.engine.Engine object at 0x7f5b5f3b74e0>>
      config = <local> <returnn.config.Config object at 0x7f5b5f3b1780>
      train_data = <local> <StaticDataset 'dataset_id140030416483608' epoch=None>
  File "/home/mdigangi/bin/returnn/returnn/tf/engine.py", line 1036, in init_train_from_config
    line: self.init_network_from_config(config)
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x7f5b5f3b74e0>
      self.init_network_from_config = <local> <bound method Engine.init_network_from_config of <returnn.tf.engine.Engine object at 0x7f5b5f3b74e0>>
      config = <local> <returnn.config.Config object at 0x7f5b5f3b1780>
  File "/home/mdigangi/bin/returnn/returnn/tf/engine.py", line 1094, in init_network_from_config
    line: assert self.epoch, "task %r" % config.value("task", "train")
    locals:
      self = <local> <returnn.tf.engine.Engine object at 0x7f5b5f3b74e0>
      self.epoch = <local> None
      config = <local> <returnn.config.Config object at 0x7f5b5f3b1780>
      config.value = <local> <bound method Config.value of <returnn.config.Config object at 0x7f5b5f3b1780>>
AssertionError: task 'hyper_param_tuning'

I think the problem is that the config sets task = "hyper_param_tuning", so the epoch variable is never initialized in init_network_from_config: https://github.com/rwth-i6/returnn/blob/021171b7fa97a6b32e9a28d3919b74de1c6d46ef/returnn/tf/engine.py#L1062-L1094
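
For illustration, here is a minimal self-contained sketch of that failure mode (the class and config dict below are made up, this is not the actual RETURNN code): `self.epoch` is never assigned on this code path, so the final assert fires with the configured task name, matching the traceback above.

```python
# Hypothetical sketch of the failing code path (not RETURNN itself).

class EngineSketch:
    def __init__(self, config):
        self.config = config
        self.epoch = None  # stays None on the hyper-param-tuning path

    def init_network_from_config(self):
        # The real init_network_from_config would set self.epoch when loading
        # a checkpoint or deriving a start epoch; neither happens in the call
        # chain triggered by tuner.work() in the traceback.
        assert self.epoch, "task %r" % self.config.get("task", "train")


EngineSketch({"task": "hyper_param_tuning"}).init_network_from_config()
# -> AssertionError: task 'hyper_param_tuning'
```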

albertz commented 4 years ago

You could have just posted this error in the PR. There is no need to make this a separate issue. This just makes it more complicated to follow and understand.