nfbalbontin commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[ Yes] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[ Yes] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[ Yes] I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/inference/infer_detections.py

2. Describe the bug

I've been trying to create an inference detection graph from a frozen graph that I already generated, but each time I run infer_detections.py I get the following error: ValueError: Input 1 of node StatefulPartitionedCall was passed float from stem_conv2d/kernel:0 incompatible with expected resource.

3. Steps to reproduce

To create frozen_graph.pb, I run the following steps:

1. I obtain the output_node_names:

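from tensorflow.core.protobuf import saved_model_pb2
from tensorflow.python.platform import gfile
from tensorflow.python.util import compat

model_filename = '<path-to-saved-model>'

# Parse saved_model.pb and take the node list of the first meta graph.
with gfile.GFile(model_filename, 'rb') as f:
    data = compat.as_bytes(f.read())
    sm = saved_model_pb2.SavedModel()
    sm.ParseFromString(data)
    graph = sm.meta_graphs[0].graph_def.node

# Join every node name into the comma-separated string that freeze_graph takes.
output_node_names = ",".join(node.name for node in graph)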

2. I generate the frozen_graph.pb:

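import os

from tensorflow.python.saved_model import tag_constants
from tensorflow.python.tools import freeze_graph

# Freeze the SavedModel and write the result to frozen_graph.pb.
frozen_graph_def = freeze_graph.freeze_graph(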
    input_graph=None,
    input_saver=None,
    input_binary=None,
    input_checkpoint=None,
    output_node_names=output_node_names,
    restore_op_name=None,
    filename_tensor_name=None,
    output_graph=os.path.join("<output-saving-path>", "frozen_graph.pb"),
    clear_devices=None,
    initializer_nodes=None,
    input_saved_model_dir="<path-to-directory-of-saved_model.pb>",
    saved_model_tags=tag_constants.SERVING
)
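
The error reported above says that StatefulPartitionedCall expects a resource input but receives a float constant. For SavedModels exported with TF2, a commonly used alternative to freeze_graph is convert_variables_to_constants_v2. The following is only a sketch under assumptions: the helper name freeze_saved_model_v2 is hypothetical, the SavedModel is assumed to expose a "serving_default" signature, and nothing in this thread confirms that this resolves the issue. Note also that the traceback shows infer_detections.py mapping an input node named image_tensor, which a graph frozen this way may not contain.

```python
import os


def freeze_saved_model_v2(saved_model_dir, output_dir, signature_key="serving_default"):
    """Freeze one signature of a TF2 SavedModel (hypothetical helper)."""
    # Imported lazily so this snippet can be loaded without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.python.framework.convert_to_constants import (
        convert_variables_to_constants_v2,
    )

    # Load the SavedModel and pick the signature to freeze (assumed to exist).
    model = tf.saved_model.load(saved_model_dir)
    concrete_func = model.signatures[signature_key]

    # Turn the variables captured by the function into constants.
    frozen_func = convert_variables_to_constants_v2(concrete_func)

    # Write the frozen GraphDef in binary form.
    tf.io.write_graph(
        graph_or_graph_def=frozen_func.graph.as_graph_def(),
        logdir=output_dir,
        name="frozen_graph.pb",
        as_text=False,
    )
    return os.path.join(output_dir, "frozen_graph.pb")
```

Usage would look like freeze_saved_model_v2("<path-to-directory-of-saved_model.pb>", "<output-saving-path>"), with the same placeholders as in the steps above.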

3. I run the infer_detections.py module:

echo "===RUNING INFER=="
python source_dir/infer_detections.py \
    --input_tfrecord_paths=<path-to-train.tfrecords>,<path-to-validation.tfrecords>  \
    --output_tfrecord_path=<path-to-infer_detections.records> \
    --inference_graph=files<path-to-frozen_graph.pb>
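
The same invocation can be assembled from Python, which avoids shell line continuations and makes the comma-separated input list explicit. This is a minimal sketch: build_infer_command and run_infer are hypothetical helper names, the flags are the ones used in the command above, and the script path is assumed to be relative to the current directory.

```python
import subprocess
import sys


def build_infer_command(input_tfrecord_paths, output_tfrecord_path, inference_graph,
                        script="source_dir/infer_detections.py"):
    """Build the argument list for infer_detections.py (hypothetical helper)."""
    return [
        sys.executable,
        script,
        # The script takes several input files as one comma-separated value.
        "--input_tfrecord_paths=" + ",".join(input_tfrecord_paths),
        "--output_tfrecord_path=" + output_tfrecord_path,
        "--inference_graph=" + inference_graph,
    ]


def run_infer(input_tfrecord_paths, output_tfrecord_path, inference_graph):
    """Run the script and raise CalledProcessError if it exits with an error."""
    cmd = build_infer_command(input_tfrecord_paths, output_tfrecord_path, inference_graph)
    return subprocess.run(cmd, check=True)
```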

4. Expected behavior

Obtain the inference detection graph

5. Additional context

===RUNING INFER==
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/__init__.py:1473: The name tf.estimator.inputs is deprecated. Please use tf.compat.v1.estimator.inputs instead.

2021-03-19 16:22:56.642733: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400060000 Hz
2021-03-19 16:22:56.642925: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b419785820 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-19 16:22:56.642948: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-19 16:22:56.643179: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Reading input from 2 files
I0319 16:22:56.643431 140673042499392 infer_detections.py:68] Reading input from 2 files
['files/tfrecords/train.records', 'files/tfrecords/validation.records']
WARNING:tensorflow:From /home/ec2-user/SageMaker/amazon-sagemaker-tensorflow-object-detection-api/3_predict/source_dir/object_detection/inference/detection_inference.py:36: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
W0319 16:22:56.644418 140673042499392 deprecation.py:323] From /home/ec2-user/SageMaker/amazon-sagemaker-tensorflow-object-detection-api/3_predict/source_dir/object_detection/inference/detection_inference.py:36: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:277: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
W0319 16:22:56.651090 140673042499392 deprecation.py:323] From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:277: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:189: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensors(tensor).repeat(num_epochs)`.
W0319 16:22:56.651372 140673042499392 deprecation.py:323] From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:189: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensors(tensor).repeat(num_epochs)`.
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:112: RefVariable.count_up_to (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Dataset.range instead.
W0319 16:22:56.654751 140673042499392 deprecation.py:323] From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:112: RefVariable.count_up_to (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Dataset.range instead.
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:2522: count_up_to (from tensorflow.python.ops.state_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Dataset.range instead.
W0319 16:22:56.654909 140673042499392 deprecation.py:323] From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/ops/variables.py:2522: count_up_to (from tensorflow.python.ops.state_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Dataset.range instead.
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:198: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
W0319 16:22:56.657660 140673042499392 deprecation.py:323] From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:198: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:198: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
W0319 16:22:56.659007 140673042499392 deprecation.py:323] From /home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/training/input.py:198: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From /home/ec2-user/SageMaker/amazon-sagemaker-tensorflow-object-detection-api/3_predict/source_dir/object_detection/inference/detection_inference.py:38: TFRecordReader.__init__ (from tensorflow.python.ops.io_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.TFRecordDataset`.
W0319 16:22:56.663251 140673042499392 deprecation.py:323] From /home/ec2-user/SageMaker/amazon-sagemaker-tensorflow-object-detection-api/3_predict/source_dir/object_detection/inference/detection_inference.py:38: TFRecordReader.__init__ (from tensorflow.python.ops.io_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.TFRecordDataset`.
INFO:tensorflow:Reading graph and building model...
I0319 16:22:56.715275 140673042499392 infer_detections.py:73] Reading graph and building model...
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 1 of node StatefulPartitionedCall was passed float from stem_conv2d/kernel:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "source_dir/infer_detections.py", line 98, in <module>
    tf.app.run()
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "source_dir/infer_detections.py", line 76, in main
    image_tensor, FLAGS.inference_graph)
  File "/home/ec2-user/SageMaker/amazon-sagemaker-tensorflow-object-detection-api/3_predict/source_dir/object_detection/inference/detection_inference.py", line 77, in build_inference_graph
    graph_def, name='', input_map={'image_tensor': image_tensor})
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/cpu/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 505, in _import_graph_def_internal
    raise ValueError(str(e))
ValueError: Input 1 of node StatefulPartitionedCall was passed float from stem_conv2d/kernel:0 incompatible with expected resource.
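
Read literally, the message says that the node StatefulPartitionedCall expects a resource-typed input (a variable handle) but is fed the float tensor stem_conv2d/kernel:0. One way to check whether the frozen graph still contains such nodes is to count op types in the GraphDef. This is a diagnostic sketch, not a fix: the helper names are hypothetical, and the choice of StatefulPartitionedCall and VarHandleOp as the ops to look for is an assumption based on the error text.

```python
from collections import Counter


def count_ops(nodes):
    """Count how often each op type occurs among NodeDef-like objects."""
    return Counter(node.op for node in nodes)


def report_suspicious_ops(nodes, suspicious=("StatefulPartitionedCall", "VarHandleOp")):
    """Return the suspicious op types that are present, with their counts."""
    counts = count_ops(nodes)
    return {op: counts[op] for op in suspicious if counts[op] > 0}


def load_graph_def(path):
    """Read a binary GraphDef file such as frozen_graph.pb."""
    # Imported lazily so the pure-Python helpers work without TensorFlow.
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def
```

For example, report_suspicious_ops(load_graph_def("<path-to-frozen_graph.pb>").node) returns an empty dict when neither op type is present.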

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Amazon Linux AMI
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 2.4.0
Python version: 3.6
Bazel version (if compiling from source): N/A
GCC/Compiler version (if compiling from source): GCC 5.8.5
CUDA/cuDNN version: 10.0.130
GPU model and memory: Cirrus Logic GD 5446 32M

chandyalex commented 3 years ago

@nfbalbontin One possible reason is that the model you are trying to load is too large for the GPU to handle. Have you checked the GPU utilization? Also, make sure that you are exporting the model library path in your workspace. A similar problem can be found in #1152.

nfbalbontin commented 3 years ago

Hi! I am exporting the library to my actual workspace. I checked whether the GPU is available with:

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

This returned:

Num GPUs Available:  1

After that, while running the process, I monitored the GPU with nvidia-smi -l 1, which showed:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   43C    P0    41W / 300W |    309MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11107      C   python                            307MiB |
+-----------------------------------------------------------------------------+

So apparently the GPU isn't the problem.

chandyalex commented 3 years ago

@nfbalbontin Interesting.

Can you add this line to your code before loading the model and see what happens?

tf.keras.backend.set_learning_phase(0)

nfbalbontin commented 3 years ago

Thanks again for the reply. I added the line before generating the frozen graph (I assumed that is what you meant by "before loading the model"). Still, I get the same error as before. This is the line that I added before the first step shown above:

tf.keras.backend.set_learning_phase(0)

/home/ec2-user/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/keras/backend.py:435: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
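
The deprecation warning above suggests passing the training argument when calling a layer or model instead of using set_learning_phase. A minimal sketch of that pattern follows. The helper name run_inference is hypothetical, and it assumes a Keras-style model that is called directly; whether this has any effect on a graph produced by freeze_graph is not established in this thread.

```python
def run_inference(model, inputs):
    """Call a Keras-style model with training behavior disabled."""
    # training=False is the call-time replacement named in the warning.
    return model(inputs, training=False)
```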

Executing infer_detections.py [ValueError: Input 1 of node StatefulPartitionedCall was passed float from stem_conv2d/kernel:0 incompatible with expected resource] #9816
