TensorFlow version in retrain.py

jkfran commented 1 year ago

Hello, I am entirely new to TensorFlow. I am trying to run the retrain script, and it looks like this file was not written for TensorFlow 2.7.2, which is the version specified in requirements.txt.

This is the error that I am getting:

Traceback (most recent call last):
  File "retrain.py", line 1062, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
AttributeError: module 'tensorflow' has no attribute 'app'

I was able to update the script using tf_upgrade_v2 (https://www.tensorflow.org/guide/migrate/upgrade) and run it, but this is the error that I am getting after doing that:

Looking for images in 'moon'
Looking for images in 'mars'
Looking for images in 'jupiter'
Looking for images in 'spiral'
Looking for images in 'uranus'
Looking for images in 'venus'
Looking for images in 'neptune'
Looking for images in 'earth'
Looking for images in 'mercury'
Looking for images in 'saturn'
Looking for images in 'asteroids'
Looking for images in 'elliptical'
100 bottleneck files created.
200 bottleneck files created.
300 bottleneck files created.
400 bottleneck files created.
500 bottleneck files created.
600 bottleneck files created.
700 bottleneck files created.
800 bottleneck files created.
900 bottleneck files created.
Traceback (most recent call last):
  File "retrain.py", line 1062, in <module>
    tf.compat.v1.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "retrain.py", line 812, in main
    final_tensor) = add_final_training_ops(len(image_lists.keys()),
  File "retrain.py", line 708, in add_final_training_ops
    bottleneck_input = tf.compat.v1.placeholder_with_default(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/array_ops.py", line 3341, in placeholder_with_default
    return gen_array_ops.placeholder_with_default(input, shape, name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 7013, in placeholder_with_default
    return placeholder_with_default_eager_fallback(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 7039, in placeholder_with_default_eager_fallback
    _result = _execute.execute(b"PlaceholderWithDefault", 1,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 72, in quick_execute
    raise e
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 58, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
TypeError: Originated from a graph execution error.

The graph execution error is detected at a node built at (most recent call last):
>>>  File retrain.py, line 1062, in <module>
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py, line 40, in run
>>>  File /usr/local/lib/python3.8/dist-packages/absl/app.py, line 312, in run
>>>  File /usr/local/lib/python3.8/dist-packages/absl/app.py, line 258, in _run_main
>>>  File retrain.py, line 779, in main
>>>  File retrain.py, line 254, in create_inception_graph
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py, line 552, in new_func
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/importer.py, line 407, in import_graph_def
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/importer.py, line 520, in _import_graph_def_internal
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/importer.py, line 251, in _ProcessNewOps
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py, line 3847, in _add_new_tf_operations
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py, line 3848, in <listcomp>
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py, line 3730, in _create_op_from_tf_operation
>>>  File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py, line 2101, in __init__

Error detected in node 'pool_3/_reshape' defined at: File "retrain.py", line 254, in create_inception_graph

TypeError: tf.Graph captured an external symbolic tensor. The symbolic tensor 'pool_3/_reshape:0' created by node 'pool_3/_reshape' is captured by the tf.Graph being executed as an input. But a tf.Graph is not allowed to take symbolic tensors from another graph as its inputs. Make sure all captured inputs of the executing tf.Graph are not symbolic tensors. Use return values, explicit Python locals or TensorFlow collections to access it. Please see https://www.tensorflow.org/guide/function#all_outputs_of_a_tffunction_must_be_return_values for more information.

SSahas commented 1 year ago

@jkfran
My TensorFlow version is 2.6.2, and I am facing the same issue you mentioned at the beginning.

You said you updated the script using tf_upgrade_v2. Can you tell me what that is and how to use it? I visited the link, but I didn't understand it properly.

Rohan-Datta commented 1 year ago

@SSahas I'm not the OP, but here's how I managed to upgrade retrain.py to TF 2:

1. Installed requirements through requirements.txt.
2. Ran tf_upgrade_v2 --infile hub/examples/image_retraining/retrain.py --inplace. This gave an error which I was able to fix by following Solution 1 in this post.
3. Made changes in the requirements.txt file according to the linked text for a more permanent fix, and installed the correct protobuf version.
4. Ran the full tf_upgrade_v2 command again.
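
For anyone who would rather script the tf_upgrade_v2 step than type it by hand, here is a minimal Python sketch. It only wraps the same command and flags quoted above (--infile and --inplace). The helper names build_upgrade_command and upgrade_in_place are made up for this illustration, and the sketch assumes tf_upgrade_v2 is on your PATH, which is normally the case once TensorFlow 2.x is installed with pip.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_upgrade_command(script_path):
    """Return the tf_upgrade_v2 argument list for an in-place upgrade.

    Hypothetical helper: it mirrors the command quoted in this thread,
    nothing more.
    """
    return ["tf_upgrade_v2", "--infile", str(script_path), "--inplace"]


def upgrade_in_place(script_path):
    """Run tf_upgrade_v2 on script_path.

    Returns True only if the tool was found, the file exists, and the
    command exited with status 0.
    """
    if shutil.which("tf_upgrade_v2") is None:
        print("tf_upgrade_v2 not found on PATH; is TensorFlow 2.x installed?",
              file=sys.stderr)
        return False
    if not Path(script_path).is_file():
        print(f"{script_path} does not exist", file=sys.stderr)
        return False
    # --inplace overwrites the file, so keep a copy or commit it first.
    result = subprocess.run(build_upgrade_command(script_path), check=False)
    return result.returncode == 0
```

Calling upgrade_in_place("hub/examples/image_retraining/retrain.py") from the repository root should be equivalent to step 2. Because --inplace rewrites the file, it is worth committing or copying retrain.py before running it.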

I validated the change by looking at the py file and the report.txt that gets generated. They look good to me. Running the retrain script, however, gives a different error which I have not been able to resolve. You can find the latest code on my fork of this repo.

ritwik12 commented 1 year ago

@jkfran @SSahas @Rohan-Datta This looks like an issue with the @dependabot updates for TensorFlow. I have not tried the latest versions. Can you please check this with older versions as mentioned in this? Let me know which latest version works and we can open a PR for it.

Rohan-Datta commented 1 year ago

@ritwik12 As of now, 2.7.2 is working fine. I converted the code in retrain.py to TensorFlow 2.x, and I was then able to resolve the error that @jkfran mentioned in the original comment by disabling eager execution, as suggested in this post. I was also able to retrain and test the model successfully. I've pushed the updated files to my fork of this repo.
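
For reference, here is a minimal sketch of the eager-execution workaround, not the exact change in the fork. In the traceback above, placeholder_with_default goes through placeholder_with_default_eager_fallback while its input is a symbolic tensor from the imported graph. Switching TensorFlow 2.x back to graph mode before any ops are built avoids that eager path. The function name demo_graph_mode_placeholder, the tensor names, and the bottleneck size of 2048 are assumptions made for this illustration. In retrain.py the input would be the tensor loaded from the Inception graph, and the disable call would go near the top of the script.

```python
def demo_graph_mode_placeholder(bottleneck_size=2048):
    """Build and run a placeholder_with_default in TF1-style graph mode.

    Hypothetical demo function. TensorFlow is imported here so the file
    can be read or imported on a machine without TensorFlow installed.
    """
    import tensorflow as tf

    # Switch TF 2.x to graph mode. In a real script, call this once at
    # startup, before any graphs, ops, or sessions are created.
    tf.compat.v1.disable_eager_execution()

    graph = tf.Graph()
    with graph.as_default():
        # Stand-in for the tensor retrain.py gets from the imported
        # Inception graph ('pool_3/_reshape:0' in the traceback).
        bottleneck_tensor = tf.zeros([1, bottleneck_size],
                                     name="bottleneck_stand_in")
        # Same call that failed in the traceback, now built in graph mode.
        bottleneck_input = tf.compat.v1.placeholder_with_default(
            bottleneck_tensor,
            shape=[None, bottleneck_size],
            name="BottleneckInputPlaceholder")

    with tf.compat.v1.Session(graph=graph) as sess:
        value = sess.run(bottleneck_input)
    return tuple(value.shape)
```

With TensorFlow 2.x installed, demo_graph_mode_placeholder() returns (1, 2048). Note that disabling eager execution is a compatibility switch: the script keeps running as TF1-style graph code through tf.compat.v1 rather than becoming idiomatic TF2 code.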

SSahas commented 1 year ago

Thank you @Rohan-Datta for explaining how to upgrade retrain.py and for the solution to the error @jkfran mentioned.

Now I am able to train the model successfully.

ritwik12 / Celestial-bodies-detection

TensorFlow version in retrain.py #89