nobutoba commented 4 years ago

Prerequisites

[x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[x] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/slim/README.md
https://github.com/tensorflow/models/blob/master/research/slim/slim_walkthrough.ipynb

2. Describe the bug

The sample code in README.md works with TensorFlow 1.15.3 but not with TensorFlow 2.2.0.
The notebook slim_walkthrough.ipynb works with neither TensorFlow 1.15.3 nor TensorFlow 2.2.0. In particular, the two statements `from tensorflow.contrib import slim` and `import tf_slim as slim` both appear in the same notebook, resulting in a name conflict.
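
Python raises no error when two imports bind the same name; whichever statement runs last silently wins, so in a notebook the meaning of `slim` depends on which cell was executed most recently. A stdlib-only sketch of that behavior, where `json` and `pickle` are stand-ins chosen for illustration (not the real modules involved) for `tensorflow.contrib.slim` and `tf_slim`:

```python
import json as codec    # first binding: `codec` refers to the json module
import pickle as codec  # second import rebinds `codec`; json is now shadowed

# No warning is raised; only the most recent import is visible.
print(codec.__name__)  # pickle
```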

3. Steps to reproduce

The following sample commands in README.md throw exceptions with TensorFlow 2.2.0.

```shell
$ python train_image_classifier.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --model_name=inception_v3 \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
    --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits

$ python eval_image_classifier.py \
    --alsologtostderr \
    --checkpoint_path=${CHECKPOINT_FILE} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=imagenet \
    --dataset_split_name=validation \
    --model_name=inception_v3
```

The reason is that both train_image_classifier.py and eval_image_classifier.py contain the statement `from tensorflow.contrib import quantize as contrib_quantize`, which raises `ModuleNotFoundError: No module named 'tensorflow.contrib'`.
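
To find every such dependency before running a script under TensorFlow 2, one option is to scan the source statically instead of importing it. A minimal sketch using Python's standard `ast` module; the helper name `contrib_imports` is hypothetical and not part of the repository:

```python
import ast


def _is_contrib(name):
    """True for tensorflow.contrib itself or any of its submodules."""
    return name == "tensorflow.contrib" or name.startswith("tensorflow.contrib.")


def contrib_imports(source):
    """Return sorted (line number, module) pairs for imports from tensorflow.contrib."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and _is_contrib(node.module):
            # e.g. `from tensorflow.contrib import quantize as contrib_quantize`
            found.append((node.lineno, node.module))
        elif isinstance(node, ast.Import):
            # e.g. `import tensorflow.contrib.slim as slim`
            found.extend((node.lineno, alias.name) for alias in node.names
                         if _is_contrib(alias.name))
    return sorted(found)


sample = (
    "import os\n"
    "from tensorflow.contrib import quantize as contrib_quantize\n"
    "import tf_slim as slim\n"
)
print(contrib_imports(sample))  # [(2, 'tensorflow.contrib')]
```

To check a real file, pass it `pathlib.Path("train_image_classifier.py").read_text()`. Because nothing is imported, the scan works even where `tensorflow.contrib` does not exist.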

If I run slim_walkthrough.ipynb with TensorFlow 1.15.3, the 6th cell (starting with `# The following snippet trains the regression model`) emits `UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.`, and the 9th cell, right after the sentence "Finally, we print the final value of each metric", fails with:

```
TypeError    Traceback (most recent call last) <ipython-input-9-0b551dafa1af> in <module>
     16             num_evals=1, # Single pass over data
     17             eval_op=names_to_update_nodes.values(),
---> 18             final_op=names_to_value_nodes.values())
     19 
     20     names_to_values = dict(zip(names_to_value_nodes.keys(), metric_values))

TypeError: 'module' object is not callable
```
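
The traceback alone does not show which name in that call resolved to a module, so the following only illustrates what this error class means; it is not a diagnosis of the notebook (the `slim` name conflict above is a plausible but unconfirmed suspect). The error typically appears when a name that should refer to a function refers to a module instead. A stdlib sketch using `pprint`, whose module and main function share a name:

```python
import pprint  # binds the *module* named pprint

try:
    pprint({"a": 1})  # calling the module, not the function inside it
except TypeError as exc:
    # "'module' object is not callable" (newer Pythons may append a hint)
    message = str(exc)

from pprint import pprint  # rebind the name to the function
pprint({"a": 1})  # now works
```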

On the other hand, since slim_walkthrough.ipynb contains the statement `from tensorflow.contrib import slim`, the notebook as it stands is clearly not compatible with TensorFlow 2. However, when I run the notebook with TensorFlow 2.2.0, it fails even earlier, in the 6th cell (starting with `# The following snippet trains the regression model`), before that import statement is reached:

```
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: global_step:0
```

4. Expected behavior

Apparently, a commit from a couple of days ago aims to make the TensorFlow-Slim Image Classification Model Library compatible with TensorFlow 2. For example, it removed the ![TensorFlow 2 Not Supported] badge from the READMEs and replaced as many uses of `tf.contrib.slim` with `tf-slim` as possible. Since the sample code in README.md works with TensorFlow 1.15.3 anyway, this might not count as a bug, but it is at least confusing for an inexperienced TensorFlow user like me 😭
I hope the sample notebook slim_walkthrough.ipynb can be made to work with TensorFlow 1, TensorFlow 2, or both. Also, for TensorFlow 1, the notebook should not contain both of the conflicting statements `from tensorflow.contrib import slim` and `import tf_slim as slim`.
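
One way to satisfy the last point is to bind `slim` exactly once, preferring the standalone `tf_slim` package (the logs below show it in use under both 1.15.3 and 2.2.0) and falling back to `tensorflow.contrib.slim` only if it is unavailable. A sketch; `import_first` is a hypothetical helper, not part of either library:

```python
import importlib


def import_first(*names):
    """Import and return the first importable module among `names`.

    Raises ImportError listing every failure if none can be imported.
    """
    failures = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            failures.append(f"{name}: {exc}")
    raise ImportError("no candidate could be imported; " + "; ".join(failures))
```

In the notebook, both statements would then become a single `slim = import_first("tf_slim", "tensorflow.contrib.slim")`. The fallback only helps TensorFlow 1.x installations that lack `tf_slim`; choosing an import does not change the execution mode, so it does nothing about the TensorFlow 2 errors.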

5. Additional context

Full log for the 6th cell in the notebook, run with TensorFlow 1.15.3

```
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow_core/python/ops/losses/losses_impl.py:121: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From :16: get_total_loss (from tf_slim.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_total_loss instead.
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/losses/loss_ops.py:236: get_losses (from tf_slim.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_losses instead.
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/losses/loss_ops.py:238: get_regularization_losses (from tf_slim.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/learning.py:734: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /tmp/regression_model/model.ckpt
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global step 499: loss = 0.4413 (0.001 sec/step)
INFO:tensorflow:global step 999: loss = 0.2760 (0.001 sec/step)
INFO:tensorflow:global step 1499: loss = 0.2333 (0.001 sec/step)
INFO:tensorflow:global step 1999: loss = 0.2453 (0.001 sec/step)
INFO:tensorflow:global step 2499: loss = 0.1999 (0.001 sec/step)
INFO:tensorflow:global_step/sec: 547.203
INFO:tensorflow:global step 2999: loss = 0.1675 (0.001 sec/step)
INFO:tensorflow:global step 3499: loss = 0.1778 (0.001 sec/step)
INFO:tensorflow:global step 3999: loss = 0.2127 (0.001 sec/step)
INFO:tensorflow:global step 4499: loss = 0.1784 (0.001 sec/step)
INFO:tensorflow:global step 4999: loss = 0.1660 (0.001 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
Finished training. Last batch loss: 0.16604608
Checkpoint saved in /tmp/regression_model/
/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow_core/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
```

Full log for the 9th cell in the notebook, run with TensorFlow 1.15.3

```
WARNING:tensorflow:From :7: streaming_mean_squared_error (from tf_slim.metrics.metric_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.metrics.mean_squared_error. Note that the order of the labels and predictions arguments has been switched.
WARNING:tensorflow:From :8: streaming_mean_absolute_error (from tf_slim.metrics.metric_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.metrics.mean_absolute_error. Note that the order of the labels and predictions arguments has been switched.
INFO:tensorflow:Restoring parameters from /tmp/regression_model/model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path /tmp/regression_model/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:Error reported to Coordinator: , 'module' object is not callable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
     16             num_evals=1, # Single pass over data
     17             eval_op=names_to_update_nodes.values(),
---> 18             final_op=names_to_value_nodes.values())
     19 
     20     names_to_values = dict(zip(names_to_value_nodes.keys(), metric_values))

TypeError: 'module' object is not callable
```

Full log for the 6th cell in the notebook, run with TensorFlow 2.2.0

```
WARNING:tensorflow:From :16: get_total_loss (from tf_slim.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_total_loss instead.
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/losses/loss_ops.py:236: get_losses (from tf_slim.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_losses instead.
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/losses/loss_ops.py:238: get_regularization_losses (from tf_slim.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
WARNING:tensorflow:From /home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/learning.py:734: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /tmp/regression_model/model.ckpt
INFO:tensorflow:Error reported to Coordinator: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: global_step:0
Traceback (most recent call last):
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 470, in read_variable_op
    tld.op_callbacks, resource, "dtype", dtype)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py", line 485, in run
    self.start_loop()
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/supervisor.py", line 1077, in start_loop
    self._last_step = training_util.global_step(self._sess, self._step_counter)
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/training_util.py", line 67, in global_step
    return int(global_step_tensor.numpy())
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 603, in numpy
    return self.read_value().numpy()
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 666, in read_value
    value = self._read_variable_op()
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 645, in _read_variable_op
    result = read_and_set_handle()
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 636, in read_and_set_handle
    self._dtype)
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 475, in read_variable_op
    resource, dtype=dtype, name=name, ctx=_ctx)
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 502, in read_variable_op_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 75, in quick_execute
    raise e
  File "/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: global_step:0
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Finished training! Saving model to disk.
/home/username/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/summary/writer/writer.py:388: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
---------------------------------------------------------------------------
_FallbackException                        Traceback (most recent call last)
~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py in read_variable_op(resource, dtype, name)
    469 _ctx._context_handle, tld.device_name, "ReadVariableOp", name,
--> 470 tld.op_callbacks, resource, "dtype", dtype)
    471 return _result

_FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
in
     26 number_of_steps=5000,
     27 save_summaries_secs=5,
---> 28 log_every_n_steps=500)
     29 
     30 print("Finished training. Last batch loss:", final_loss)

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tf_slim/learning.py in train(train_op, logdir, train_step_fn, train_step_kwargs, log_every_n_steps, graph, master, is_chief, global_step, number_of_steps, init_op, init_feed_dict, local_init_op, init_fn, ready_op, summary_op, save_summaries_secs, summary_writer, startup_delay_steps, saver, save_interval_secs, sync_optimizer, session_config, session_wrapper, trace_every_n_steps, ignore_live_threads)
    780 threads,
    781 close_summary_writer=True,
--> 782 ignore_live_threads=ignore_live_threads)
    783 
    784 except errors.AbortedError:

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/supervisor.py in stop(self, threads, close_summary_writer, ignore_live_threads)
    837 threads,
    838 stop_grace_period_secs=self._stop_grace_secs,
--> 839 ignore_live_threads=ignore_live_threads)
    840 finally:
    841 # Close the writer last, in case one of the running threads was using it.

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
    387 self._registered_threads = set()
    388 if self._exc_info_to_raise:
--> 389 six.reraise(*self._exc_info_to_raise)
    390 elif stragglers:
    391 if ignore_live_threads:

~/tensorflow-slim/.venv/lib/python3.7/site-packages/six.py in reraise(tp, value, tb)
    701 if value.__traceback__ is not tb:
    702 raise value.with_traceback(tb)
--> 703 raise value
    704 finally:
    705 value = None

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py in stop_on_exception(self)
    295 """
    296 try:
--> 297 yield
    298 except: # pylint: disable=bare-except
    299 self.request_stop(ex=sys.exc_info())

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py in run(self)
    483 def run(self):
    484 with self._coord.stop_on_exception():
--> 485 self.start_loop()
    486 if self._timer_interval_secs is None:
    487 # Call back-to-back.

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/supervisor.py in start_loop(self)
   1075 def start_loop(self):
   1076 self._last_time = time.time()
-> 1077 self._last_step = training_util.global_step(self._sess, self._step_counter)
   1078 
   1079 def run_loop(self):

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/training/training_util.py in global_step(sess, global_step_tensor)
     65 """
     66 if context.executing_eagerly():
---> 67 return int(global_step_tensor.numpy())
     68 return int(sess.run(global_step_tensor))
     69 

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py in numpy(self)
    601 def numpy(self):
    602 if context.executing_eagerly():
--> 603 return self.read_value().numpy()
    604 raise NotImplementedError(
    605 "numpy() is only available when eager execution is enabled.")

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py in read_value(self)
    664 """
    665 with ops.name_scope("Read"):
--> 666 value = self._read_variable_op()
    667 # Return an identity so it can get placed on whatever device the context
    668 # specifies instead of the device where the variable is.

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py in _read_variable_op(self)
    643 result = read_and_set_handle()
    644 else:
--> 645 result = read_and_set_handle()
    646 
    647 if not context.executing_eagerly():

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py in read_and_set_handle()
    634 def read_and_set_handle():
    635 result = gen_resource_variable_ops.read_variable_op(self._handle,
--> 636 self._dtype)
    637 _maybe_set_handle_data(self._dtype, self._handle, result)
    638 return result

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py in read_variable_op(resource, dtype, name)
    473 try:
    474 return read_variable_op_eager_fallback(
--> 475 resource, dtype=dtype, name=name, ctx=_ctx)
    476 except _core._SymbolicException:
    477 pass # Add nodes to the TensorFlow graph.

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py in read_variable_op_eager_fallback(resource, dtype, name, ctx)
    500 _attrs = ("dtype", dtype)
    501 _result = _execute.execute(b"ReadVariableOp", 1, inputs=_inputs_flat,
--> 502 attrs=_attrs, ctx=ctx, name=name)
    503 if _execute.must_record_gradient():
    504 _execute.record_gradient(

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     73 "Inputs to eager execution function cannot be Keras symbolic "
     74 "tensors, but found {}".format(keras_symbolic_tensors))
---> 75 raise e
     76 # pylint: enable=protected-access
     77 return tensors

~/tensorflow-slim/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58 ctx.ensure_initialized()
     59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
     61 except core._NotOkStatusException as e:
     62 if name is not None:

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: global_step:0
```

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian GNU/Linux 9 (stretch)
Mobile device name if the issue happens on a mobile device: N/A
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): TensorFlow 2.2.0 (also tested with 1.15.3)
Python version: Python 3.7.6
Bazel version (if compiling from source): N/A
GCC/Compiler version (if compiling from source): N/A
CUDA/cuDNN version: CUDA V10.1.243 (CUDA V10.0.130 for TensorFlow 1.15.3) / cuDNN 7.6.5
GPU model and memory: Tesla T4 with 15079MiB memory

marksandler2 commented 4 years ago

Thanks for the report. The notebook does need to be fixed. Slim will probably never work in TensorFlow 2.0 eager mode, only in graph mode, so the command-line examples won't work as is. We should restore that header in README.md with some caveats.
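
For anyone who wants to try the graph-mode route under TensorFlow 2 before the README and notebook are updated, a commonly used switch is `tf.compat.v1.disable_eager_execution()`, called once at program start (in a notebook: the first cell, after a kernel restart) before any tensors or variables are created. This is a sketch under that assumption only; it has not been verified to fix slim_walkthrough.ipynb, and it does not help the command-line scripts, which also import `tensorflow.contrib`. The helper `enable_graph_mode` is hypothetical:

```python
import importlib.util


def enable_graph_mode():
    """Disable TF2 eager execution so TF1-style graph code (such as Slim) runs in graph mode.

    Returns True if TensorFlow is installed and is no longer executing eagerly,
    False if TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return False
    import tensorflow as tf

    # Documented to be called before any graphs, ops, or tensors are created.
    tf.compat.v1.disable_eager_execution()
    return not tf.executing_eagerly()
```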

kyscg commented 4 years ago

What is the best way to fix this? Should we mention in the README that [TensorFlow 2 might not be supported], or change all instances of `from tensorflow.contrib import slim` to `import tf_slim as slim`? Or maybe both?
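
The mechanical half of that change can be scripted. A minimal sketch, assuming the import appears on its own line exactly as `from tensorflow.contrib import slim` (aliased or multi-name forms are left untouched); `migrate_slim_import` is a hypothetical helper, and for a notebook it would have to be applied to each code cell's source rather than to the raw `.ipynb` JSON:

```python
import re

# Matches the exact statement on a line of its own, keeping its indentation.
_CONTRIB_SLIM = re.compile(
    r"^([ \t]*)from tensorflow\.contrib import slim[ \t]*$", re.MULTILINE
)


def migrate_slim_import(source):
    """Rewrite `from tensorflow.contrib import slim` to `import tf_slim as slim`."""
    return _CONTRIB_SLIM.sub(r"\1import tf_slim as slim", source)


before = "import os\nfrom tensorflow.contrib import slim\n"
print(migrate_slim_import(before), end="")
```

As the maintainer's comment points out, this only removes the import conflict; it does not by itself make Slim work in TensorFlow 2 eager mode.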

> Slim will probably never work in tensorflow 2.0 eager mode, only in graph mode

Is Slim being deprecated, or is there another reason it doesn't work?

marksandler2 commented 4 years ago

We will update README.md and the notebook shortly. Slim is mostly in maintenance mode, and full TF2 support would basically require a very thorough rewrite; not all Slim concepts map nicely onto TF2.

> On Wed, Jun 3, 2020, 5:13 PM Kilaru Yasaswi Sri Chandra Gandhi wrote:
>
> What is the best way to fix this? Should we mention that [TensorFlow 2 might not be supported] in the README or change all instances of from tensorflow.contrib import slim to import tf_slim as slim. Or maybe both?
>
> > Slim will probably never work in tensorflow 2.0 eager mode, only in graph mode
>
> Is this being deprecated or is there any other reason it doesn't work?



Name conflict: tensorflow.contrib.slim vs tf_slim #8594