tensorflow / models

Models and examples built with TensorFlow
Other
77.18k stars 45.75k forks source link

Feature Request: Freeze capability NasNet-A-Large Checkpoint on a CPU #3143

Closed rickhg12hs closed 4 years ago

rickhg12hs commented 6 years ago

System information

$ python3 -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" 2018-01-10 22:56:18.576633: I tensorflow/core/platform/s3/aws_logging.cc:53] Initializing Curl library /usr/lib64/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type. from ._conv import register_converters as _register_converters v1.3.0-rc1-6922-ga77096897f 1.5.0-rc0

Describe the problem

The NasNet-A-Large trained checkpoint advertised here and downloadable from here appears to be only freezeable for a GPU. Attempts to freeze it with a CPU generates errors. There is an apparent incapatibility here between NHWC and NCHW data formats.

Using pretrained models on a CPU is expected and should be a usable feature.

Source code / logs

$ ipython3
Python 3.6.3 (default, Oct  9 2017, 12:11:29) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf
   ...: import tensorflow.contrib.slim as slim
   ...: import numpy as np
   ...: 
   ...: import sys
   ...: sys.path.append("/usr/local/src/tensorflow/models/research/slim")
   ...: from nets.nasnet.nasnet import build_nasnet_large, nasnet_large_arg_scop
   ...: e
   ...: height = 299
   ...: width = 299
   ...: channels = 3
   ...: FREEZE_DIR = "./models/build_freeze_graphs"
   ...: # Create graph
   ...: X = tf.placeholder(tf.float32, shape=[None, height, width, channels])
   ...: with slim.arg_scope(nasnet_large_arg_scope()):
   ...:     logits, end_points = build_nasnet_large(X, num_classes=1001,is_train
   ...: ing=False)
   ...: softmax = end_points["Predictions"]
   ...: saver = tf.train.Saver()
   ...: X_test = np.ones((1,299,299,3))  # a fake image
   ...: 
   ...: # Execute graph
   ...: with tf.Session() as sess:
   ...:     saver.restore(sess, FREEZE_DIR + "/nasnet-a_large_04_10_2017-model.c
   ...: kpt")
   ...:     tf.train.write_graph(sess.graph_def, FREEZE_DIR, 'nasnet-a_large.pbt
   ...: xt')
   ...:     
2018-01-10 23:15:42.496988: I tensorflow/core/platform/s3/aws_logging.cc:53] Initializing Curl library
/usr/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Restoring parameters from ./models/build_freeze_graphs/nasnet-a_large_04_10_2017-model.ckpt

In [2]: 
$ python3 /usr/local/src/tensorflow/tensorflow/tensorflow/python/tools/freeze_graph.py --input_graph ./nasnet-a_large.pbtxt --input_checkpoint ./nasnet-a_large_04_10_2017-model.ckpt --output_node_names final_layer/predictions --output_graph ./frozen_nasnet-a_large.pb
2018-01-11 03:48:11.933436: I tensorflow/core/platform/s3/aws_logging.cc:53] Initializing Curl library
/usr/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Converted 1546 variables to const ops.

$ ipython3
Python 3.6.3 (default, Oct  9 2017, 12:11:29) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: TF_SRC_DIR = "/usr/local/src/tensorflow"
   ...: TF_TF_SRC_DIR = TF_SRC_DIR + "/tensorflow"
   ...: TF_MODELS_SRC_DIR = TF_SRC_DIR + "/models"
   ...: TF_EXAMPLES_DIR = TF_TF_SRC_DIR + "/tensorflow/examples"
   ...: 

In [2]: import importlib.util
   ...: import sys
   ...: 
   ...: # Load TF retrain module
   ...: spec = importlib.util.spec_from_file_location("retrain", TF_EXAMPLES_DIR
   ...:  + "/image_retraining/retrain.py")
   ...: retrain = importlib.util.module_from_spec(spec)
   ...: spec.loader.exec_module(retrain)
   ...: 
2018-01-11 04:11:22.152580: I tensorflow/core/platform/s3/aws_logging.cc:53] Initializing Curl library
/usr/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

In [3]: DATA_DIR = "./data/processed"
   ...: IMAGE_DIR = DATA_DIR + "/JPG/Train"
   ...: ARCHITECTURE = "nasnet-a_large"
   ...: 

In [4]: class C:
   ...:     pass
   ...: 
   ...: retrain.FLAGS = C()
   ...: 
   ...: retrain.FLAGS.architecture = ARCHITECTURE
   ...: retrain.FLAGS.bottleneck_dir = DATA_DIR + '/' + ARCHITECTURE + "/bottlen
   ...: ecks"
   ...: retrain.FLAGS.eval_step_interval = 200
   ...: retrain.FLAGS.final_tensor_name = "final_result"
   ...: retrain.FLAGS.flip_left_right = False
   ...: retrain.FLAGS.how_many_training_steps = 20000
   ...: retrain.FLAGS.image_dir = IMAGE_DIR
   ...: retrain.FLAGS.intermediate_output_graphs_dir = DATA_DIR + '/' + ARCHITEC
   ...: TURE + "/intermed_output_graphs"
   ...: retrain.FLAGS.intermediate_store_frequency = 0
   ...: retrain.FLAGS.learning_rate = 0.005
   ...: retrain.FLAGS.model_dir = DATA_DIR + "/models"
   ...: retrain.FLAGS.output_graph = DATA_DIR + '/' + ARCHITECTURE + "/output_gr
   ...: aph.pb"
   ...: retrain.FLAGS.output_labels = DATA_DIR + '/' + ARCHITECTURE + "/output_l
   ...: abels.txt"
   ...: retrain.FLAGS.print_misclassified_test_images = False
   ...: retrain.FLAGS.random_brightness = 0
   ...: retrain.FLAGS.random_crop = 0
   ...: retrain.FLAGS.random_scale = 0
   ...: retrain.FLAGS.summaries_dir = DATA_DIR + '/' + ARCHITECTURE + "/retrain_
   ...: logs"
   ...: retrain.FLAGS.test_batch_size = -1
   ...: retrain.FLAGS.testing_percentage = 10
   ...: retrain.FLAGS.train_batch_size = 100
   ...: retrain.FLAGS.validation_batch_size = 100
   ...: retrain.FLAGS.validation_percentage = 10
   ...: 
   ...: 

In [5]: def create_model_info(architecture):
   ...:     return {
   ...:       'data_url': "localhost:12345/fake/path/frozen_nasnet-a_large.pb",
   ...:       'bottleneck_tensor_name': "final_layer/dropout/Identity:0",
   ...:       'bottleneck_tensor_size': 4032,
   ...:       'input_width': 299,
   ...:       'input_height': 299,
   ...:       'input_depth': 3,
   ...:       'resized_input_tensor_name': "Placeholder:0",
   ...:       'model_file_name': "frozen_nasnet-a_large.pb",
   ...:       'input_mean': 128,
   ...:       'input_std': 128,
   ...:       'quantize_layer': False,
   ...:   }
   ...: 
   ...: retrain.create_model_info = create_model_info
   ...: 

In [6]: retrain.main(None)
Not extracting or downloading files, model already present in disk
Model path:  ./data/processed/models/frozen_nasnet-a_large.pb
INFO:tensorflow:Looking for images in 'Iceberg'
INFO:tensorflow:Looking for images in 'NotIceberg'
INFO:tensorflow:Creating bottleneck at ./data/processed/nasnet-a_large/bottlenecks/Iceberg/1475-e8b76fb7.jpg_nasnet-a_large.txt
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1349     try:
-> 1350       return fn(*args)
   1351     except errors.OpError as e:

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1328                                    feed_dict, fetch_list, target_list,
-> 1329                                    status, run_metadata)
   1330 

~/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,75,75,42] vs. shape[2] = [1,42,75,75]
     [[Node: cell_stem_0/cell_output/concat = _MklConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](cell_stem_0/comb_iter_1/combine/add, cell_stem_0/comb_iter_2/combine/add, cell_stem_0/comb_iter_3/combine/add, cell_stem_0/comb_iter_4/combine/add, cell_17/split/split_dim, DMT/_121, DMT/_122, cell_stem_0/comb_iter_3/combine/add:1, DMT/_123, DMT/_124)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py in create_bottleneck_file(bottleneck_path, image_lists, label_name, index, image_dir, category, sess, jpeg_data_tensor, decoded_image_tensor, resized_input_tensor, bottleneck_tensor)
    381         sess, image_data, jpeg_data_tensor, decoded_image_tensor,
--> 382         resized_input_tensor, bottleneck_tensor)
    383   except Exception as e:

/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py in run_bottleneck_on_image(sess, image_data, image_data_tensor, decoded_image_tensor, resized_input_tensor, bottleneck_tensor)
    316   bottleneck_values = sess.run(bottleneck_tensor,
--> 317                                {resized_input_tensor: resized_input_values})
    318   bottleneck_values = np.squeeze(bottleneck_values)

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1127       results = self._do_run(handle, final_targets, final_fetches,
-> 1128                              feed_dict_tensor, options, run_metadata)
   1129     else:

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1343       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344                            options, run_metadata)
   1345     else:

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1362           pass
-> 1363       raise type(e)(node_def, op, message)
   1364 

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,75,75,42] vs. shape[2] = [1,42,75,75]
     [[Node: cell_stem_0/cell_output/concat = _MklConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](cell_stem_0/comb_iter_1/combine/add, cell_stem_0/comb_iter_2/combine/add, cell_stem_0/comb_iter_3/combine/add, cell_stem_0/comb_iter_4/combine/add, cell_17/split/split_dim, DMT/_121, DMT/_122, cell_stem_0/comb_iter_3/combine/add:1, DMT/_123, DMT/_124)]]

Caused by op 'cell_stem_0/cell_output/concat', defined at:
  File "/usr/bin/ipython3", line 11, in <module>
    sys.exit(start_ipython())
  File "/usr/lib/python3.6/site-packages/IPython/__init__.py", line 125, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/usr/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/lib/python3.6/site-packages/IPython/terminal/ipapp.py", line 356, in start
    self.shell.mainloop()
  File "/usr/lib/python3.6/site-packages/IPython/terminal/interactiveshell.py", line 480, in mainloop
    self.interact()
  File "/usr/lib/python3.6/site-packages/IPython/terminal/interactiveshell.py", line 471, in interact
    self.run_cell(code, store_history=True)
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2856, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-c66df78b6d6a>", line 1, in <module>
    retrain.main(None)
  File "/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 1024, in main
    create_model_graph(model_info))
  File "/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 291, in create_model_graph
    model_info['resized_input_tensor_name'],
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 548, in import_graph_def
    op_def=op_def)
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1617, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,75,75,42] vs. shape[2] = [1,42,75,75]
     [[Node: cell_stem_0/cell_output/concat = _MklConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](cell_stem_0/comb_iter_1/combine/add, cell_stem_0/comb_iter_2/combine/add, cell_stem_0/comb_iter_3/combine/add, cell_stem_0/comb_iter_4/combine/add, cell_17/split/split_dim, DMT/_121, DMT/_122, cell_stem_0/comb_iter_3/combine/add:1, DMT/_123, DMT/_124)]]

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-6-c66df78b6d6a> in <module>()
----> 1 retrain.main(None)

/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py in main(_)
   1063                         FLAGS.bottleneck_dir, jpeg_data_tensor,
   1064                         decoded_image_tensor, resized_image_tensor,
-> 1065                         bottleneck_tensor, FLAGS.architecture)
   1066 
   1067     # Add the new layer that we'll be training.

/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py in cache_bottlenecks(sess, image_lists, image_dir, bottleneck_dir, jpeg_data_tensor, decoded_image_tensor, resized_input_tensor, bottleneck_tensor, architecture)
    486             sess, image_lists, label_name, index, image_dir, category,
    487             bottleneck_dir, jpeg_data_tensor, decoded_image_tensor,
--> 488             resized_input_tensor, bottleneck_tensor, architecture)
    489 
    490         how_many_bottlenecks += 1

/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py in get_or_create_bottleneck(sess, image_lists, label_name, index, image_dir, category, bottleneck_dir, jpeg_data_tensor, decoded_image_tensor, resized_input_tensor, bottleneck_tensor, architecture)
    428                            image_dir, category, sess, jpeg_data_tensor,
    429                            decoded_image_tensor, resized_input_tensor,
--> 430                            bottleneck_tensor)
    431   with open(bottleneck_path, 'r') as bottleneck_file:
    432     bottleneck_string = bottleneck_file.read()

/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py in create_bottleneck_file(bottleneck_path, image_lists, label_name, index, image_dir, category, sess, jpeg_data_tensor, decoded_image_tensor, resized_input_tensor, bottleneck_tensor)
    383   except Exception as e:
    384     raise RuntimeError('Error during processing file %s (%s)' % (image_path,
--> 385                                                                  str(e)))
    386   bottleneck_string = ','.join(str(x) for x in bottleneck_values)
    387   with open(bottleneck_path, 'w') as bottleneck_file:

RuntimeError: Error during processing file ./data/processed/JPG/Train/Iceberg/1475-e8b76fb7.jpg (ConcatOp : Dimensions of inputs should match: shape[0] = [1,75,75,42] vs. shape[2] = [1,42,75,75]
     [[Node: cell_stem_0/cell_output/concat = _MklConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](cell_stem_0/comb_iter_1/combine/add, cell_stem_0/comb_iter_2/combine/add, cell_stem_0/comb_iter_3/combine/add, cell_stem_0/comb_iter_4/combine/add, cell_17/split/split_dim, DMT/_121, DMT/_122, cell_stem_0/comb_iter_3/combine/add:1, DMT/_123, DMT/_124)]]

Caused by op 'cell_stem_0/cell_output/concat', defined at:
  File "/usr/bin/ipython3", line 11, in <module>
    sys.exit(start_ipython())
  File "/usr/lib/python3.6/site-packages/IPython/__init__.py", line 125, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/usr/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/lib/python3.6/site-packages/IPython/terminal/ipapp.py", line 356, in start
    self.shell.mainloop()
  File "/usr/lib/python3.6/site-packages/IPython/terminal/interactiveshell.py", line 480, in mainloop
    self.interact()
  File "/usr/lib/python3.6/site-packages/IPython/terminal/interactiveshell.py", line 471, in interact
    self.run_cell(code, store_history=True)
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2856, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-c66df78b6d6a>", line 1, in <module>
    retrain.main(None)
  File "/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 1024, in main
    create_model_graph(model_info))
  File "/usr/local/src/tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 291, in create_model_graph
    model_info['resized_input_tensor_name'],
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 548, in import_graph_def
    op_def=op_def)
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/home/rick/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1617, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,75,75,42] vs. shape[2] = [1,42,75,75]
     [[Node: cell_stem_0/cell_output/concat = _MklConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](cell_stem_0/comb_iter_1/combine/add, cell_stem_0/comb_iter_2/combine/add, cell_stem_0/comb_iter_3/combine/add, cell_stem_0/comb_iter_4/combine/add, cell_17/split/split_dim, DMT/_121, DMT/_122, cell_stem_0/comb_iter_3/combine/add:1, DMT/_123, DMT/_124)]]
)
cy89 commented 6 years ago

@zffchen78 I think @rickhg12hs diagnoses of NCHW versus NWHC looks right:

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,75,75,42] vs. shape[2] = [1,42,75,75]

Should we be able to freeze on the CPU, too?

dwSun commented 6 years ago

I have similar problem when run eval_image_classifier.py on cpu.

tensorflowbutler commented 4 years ago

Hi There, We are checking to see if you still need help on this, as this seems to be considerably old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.

rickhg12hs commented 4 years ago

We are checking to see if you still need help on this, as this seems to be considerably old issue.

It's been 2 years and I haven't done much with TF since then, but I may in the future. What is the current status of TF model "freeze-ability"? Are pre-trained models fully functional on any platform now?

saberkun commented 4 years ago

The "freeze-ability" now is achieved by savedmodel and keras. https://www.tensorflow.org/api_docs/python/tf/saved_model/load. The lower level api on graph is not good for model distribution and serving.