tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

RetinaNet - distributed training with CPU/GPU fails on model export #562

Open David-LiCause opened 4 years ago

David-LiCause commented 4 years ago

When training the RetinaNet model on CPU or GPU with '--distribution_strategy=mirrored', training completes but an error is raised while exporting the trained model.

The error is raised here (https://github.com/tensorflow/tpu/blob/master/models/official/retinanet/retinanet_main.py#L704), where a TPUEstimator instance is created with 'config' set to an instance of tf.estimator.RunConfig instead of tf.contrib.tpu.RunConfig.

Error message:

INFO:tensorflow:Exporting saved model.
I1022 20:36:52.486371 139714595177920 retinanet_main.py:695] Exporting saved model.
Traceback (most recent call last):
  File "tpu/models/official/retinanet/retinanet_main.py", line 727, in <module>
    app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "tpu/models/official/retinanet/retinanet_main.py", line 710, in main
    params=eval_params)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2551, in __init__
    '`config` must be provided with type `tpu_config.RunConfig`')
ValueError: `config` must be provided with type `tpu_config.RunConfig`
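
The failure mode can be illustrated without TensorFlow. The sketch below uses stand-in classes (RunConfig, TPURunConfig, Estimator, TPUEstimator are local imitations, not the real TensorFlow classes) that reproduce only the type check seen in the traceback. The helper build_export_estimator is hypothetical and is not part of retinanet_main.py; it shows one possible workaround, unconfirmed in this thread, of choosing the estimator class that matches the config type.

```python
# Stand-in classes: these only imitate the constructor type check.
# They are NOT the TensorFlow implementations.


class RunConfig(object):
    """Stand-in for tf.estimator.RunConfig."""


class TPURunConfig(RunConfig):
    """Stand-in for tf.contrib.tpu.RunConfig (a RunConfig subclass)."""


class Estimator(object):
    """Stand-in for a plain estimator that accepts any RunConfig."""

    def __init__(self, model_fn, config=None, params=None):
        self.model_fn = model_fn
        self.config = config
        self.params = params


class TPUEstimator(Estimator):
    """Stand-in that rejects configs that are not TPU RunConfigs."""

    def __init__(self, model_fn, config=None, params=None):
        if config is None or not isinstance(config, TPURunConfig):
            raise ValueError(
                '`config` must be provided with type `tpu_config.RunConfig`')
        super(TPUEstimator, self).__init__(model_fn, config, params)


def build_export_estimator(model_fn, config, params):
    """Hypothetical helper: pick the estimator class matching the config."""
    if isinstance(config, TPURunConfig):
        return TPUEstimator(model_fn, config=config, params=params)
    # Assumption: a plain estimator is sufficient for export on CPU/GPU.
    return Estimator(model_fn, config=config, params=params)
```

With a plain RunConfig, constructing the stand-in TPUEstimator directly raises the same ValueError as above, while build_export_estimator falls back to the plain Estimator. Whether the real export path works with a plain estimator would still need to be verified against the actual script.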

Byronnar commented 4 years ago

Have you solved this problem? @David-LiCause

David-LiCause commented 4 years ago

@Byronnar No, I haven't created a fix for this.