Open mankeyboy opened 4 years ago
This is the exact same issue I am seeing here https://github.com/tensorflow/tensorflow/issues/33184#issuecomment-577881589
Looks like this is more widespread than just me. Hopefully this means it will get more attention.
I hit a similar error. My test environment is "tf2.0-trt7.0".
Benchmark arguments:
annotation_path: None
batch_size: 1
calib_data_dir: None
data_dir: .
display_every: 100
gpu_mem_cap: 0
input_saved_model_dir: /home/suhyung/work/git/tf_trt_models/examples/detection/data/faster_rcnn_resnet50_coco_2018_01_28/saved_model/
input_size: 640
max_workspace_size: 1073741824
minimum_segment_size: 2
mode: benchmark
num_calib_inputs: 500
num_iterations: 100
num_warmup_iterations: 50
optimize_offline: False
output_saved_model_dir: trt_engine
precision: FP16
target_duration: None
use_synthetic: True
use_trt: True
TensorRT Conversion Params:
is_dynamic_op: True
max_batch_size: 1
max_workspace_size_bytes: 1073741824
maximum_cached_engines: 1
minimum_segment_size: 2
precision_mode: FP16
rewriter_config_template: None
use_calibration: False
Conversion times:
conversion: 49.2s
Traceback (most recent call last):
File "object_detection.py", line 432, in
I've been able to clear a few of the above errors, and I can now get it working for even batch sizes using the models from the r1.14+ branch of the code. However, the output doesn't give the correct accuracy, and the logs indicate this is because the saved_model.pb in the model, e.g.:
'ssd_inception_v2_coco':
    Model(
        'ssd_inception_v2_coco',
        'http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz',
        'ssd_inception_v2_coco_2018_01_28',
    )
doesn't have variables saved in the variables folder, so I'm basically running an untrained graph.
This function controls how the saved model is loaded onto the graph. The pretrained model ships with a checkpoint file and a frozen_inference_graph,
but TensorRT accepts only a SavedModel in TF2.x, so the only way forward is to load the checkpoint file or the frozen_inference_graph and convert it into a SavedModel.
First, I tried this modification to the function to get a SavedModel from the checkpoint:
with tf.compat.v1.Session() as sess:
    # Rebuild the graph from the checkpoint's meta file and restore the weights
    new_saver = tf.compat.v1.train.import_meta_graph(saved_model_dir+'/model.ckpt.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint(saved_model_dir+'/'))
    graph_func = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess,
        tf.compat.v1.get_default_graph().as_graph_def(),
        output_node_names=['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'])
    # image_tensor and the detection_* values are not defined in this snippet;
    # simple_save expects them to be tensors from the restored graph
    tf.compat.v1.saved_model.simple_save(sess, saved_model_dir+'/test',
        inputs={'image_tensor': image_tensor},
        outputs={'detection_boxes': detection_boxes, 'detection_classes': detection_classes, 'detection_scores': detection_scores, 'num_detections': num_detections})
The code fails on this call because of errors in the placeholders and input_names, and Stack Overflow answers say that I need access to the original function that created this checkpoint in order to convert it.
Hence, the next approach, converting from frozen_inference_graph.pb:
import os
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants, tag_constants

INPUT_NAME = 'image_tensor'
BOXES_NAME = 'detection_boxes'
CLASSES_NAME = 'detection_classes'
SCORES_NAME = 'detection_scores'
NUM_DETECTIONS_NAME = 'num_detections'
FROZEN_GRAPH_NAME = 'frozen_inference_graph.pb'

def get_func_from_saved_model(saved_model_dir):
    builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(saved_model_dir+'/test')
    frozen_graph_path = os.path.join(saved_model_dir, FROZEN_GRAPH_NAME)
    print(frozen_graph_path)
    # Parse the frozen GraphDef from disk
    graph_func = tf.compat.v1.GraphDef()
    with open(frozen_graph_path, 'rb') as f:
        graph_func.ParseFromString(f.read())
    sigs = {}
    with tf.compat.v1.Session(graph=tf.compat.v1.Graph()) as sess:
        # name="" is important to ensure we don't get spurious prefixing
        tf.compat.v1.import_graph_def(graph_func, name="")
        tf_graph = tf.compat.v1.get_default_graph()
        tf_input = tf_graph.get_tensor_by_name(INPUT_NAME+':0')
        tf_boxes = tf_graph.get_tensor_by_name(BOXES_NAME + ':0')
        tf_classes = tf_graph.get_tensor_by_name(CLASSES_NAME + ':0')
        tf_scores = tf_graph.get_tensor_by_name(SCORES_NAME + ':0')
        tf_num_detections = tf_graph.get_tensor_by_name(NUM_DETECTIONS_NAME + ':0')
        # Register a serving signature that maps the input to the four outputs
        sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
            tf.compat.v1.saved_model.signature_def_utils.predict_signature_def(
                {INPUT_NAME: tf_input}, {BOXES_NAME: tf_boxes, CLASSES_NAME: tf_classes, SCORES_NAME: tf_scores, NUM_DETECTIONS_NAME: tf_num_detections})
        builder.add_meta_graph_and_variables(sess,
                                             [tag_constants.SERVING],
                                             signature_def_map=sigs)
    builder.save()
    # Reload the freshly written SavedModel and return its serving function
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir+'/test', tags=[tag_constants.SERVING])
    graph_func = saved_model_loaded.signatures[
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    return graph_func
This works, but it doesn't create a variables folder, so the SavedModel is effectively untrained, which defeats the purpose. I get throughput numbers, but the mAP value shows that this is an untrained graph run.
My run call is:
python object_detection.py --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28 --output_saved_model_dir trt_engine --data_dir coco/val2017 --annotation_path coco/annotations/instances_val2017.json --input_size 640 --batch_size 8 --num_warmup_iterations 10 --minimum_segment_size 3 --num_iterations 50 --use_trt --precision FP16
@pooyadavoodi @vinhngx @aaroey Any tips? I'm looking for a way to get the trained model loaded properly like it was for r1.14+
@tfeher could you help to take a look at this? Also @bixia1
I updated https://github.com/tensorflow/tensorflow/issues/36724 with new comments for the bug I have raised there.
I am having a different issue with this command:
python object_detection.py --input_saved_model_dir $HOME/trt/obj_models/ssd_mobilenet_v2_coco_2018_03_29/saved_model/ --output_saved_model_dir $HOME/trt/obj_out_dir --optimize_offline --data_dir $HOME/trt/coco_data/val2017 --annotation_path $HOME/trt/coco_data/annotations/instances_val2017.json --batch_size 1 --use_trt --mode benchmark --precision FP32 --input_size 640
Environment: CUDA 10.2, cuDNN 7.6.5, TensorRT 7, TF master (after 2.1.0)
Traceback (most recent call last):
File "object_detection.py", line 410, in
@aaroey: is this a known issue? I am facing the segmentation issue even in the latest TensorFlow container from NGC, 20.01-tf1-py3.
@vdevaram You can work around your issue by passing the argument --minimum_segment_size 3 when you make your run. I have already opened a bug about this at tensorflow. The default segment size used by TensorRT for optimisations is 3, and in the code we are trying to use 2, which, even though suboptimal according to the recommendations, shouldn't fail. Discussion of this is ongoing on the other issue :)
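To make the effect of this setting concrete, here is a toy sketch in plain Python. It is not TF-TRT's segmenter: it assumes a purely linear chain of ops (real graphs are not linear), and `plan_segments` and the `chain` list are hypothetical names made up for illustration. It only shows the general idea that a run of TensorRT-compatible ops shorter than `minimum_segment_size` stays in TensorFlow instead of becoming a TRT engine.

```python
from itertools import groupby


def plan_segments(convertible, minimum_segment_size):
    """Group a linear chain of ops into runs and mark which runs would use TRT.

    convertible: list of bools, True when the op at that position is
    TensorRT-compatible (hypothetical input for this toy model).
    Returns a list of (start_index, length, uses_trt) tuples.
    """
    segments = []
    index = 0
    for is_convertible, run in groupby(convertible):
        length = len(list(run))
        # A run is converted only if it is compatible AND long enough.
        uses_trt = is_convertible and length >= minimum_segment_size
        segments.append((index, length, uses_trt))
        index += length
    return segments


# Hypothetical chain: 4 compatible ops, 1 unsupported op, 2 compatible ops.
chain = [True, True, True, True, False, True, True]
for size in (2, 3):
    engines = [s for s in plan_segments(chain, size) if s[2]]
    print(f"minimum_segment_size={size}: {len(engines)} TRT segment(s)")
```

In this toy chain, raising the threshold from 2 to 3 leaves the trailing two-op run in TensorFlow, so fewer (and larger) engines get built.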
minimum_segment_size=3 or larger should help to get around the conversion problem. I have now moved to nvcr.io/nvidia/tensorflow:20.02-tf2-py3 and tried the TF object detection models. Here is the result for Faster R-CNN. Although it works, I am seeing a lot of latency variation, with the temperature rising up to 85C. Is there any other problem?
cmd:
python object_detection.py --input_saved_model_dir /local/obj_models/faster_rcnn_resnet50_coco_2018_01_28/saved_model/ --output_saved_model_dir /local/obj_out_dir --optimize_offline --data_dir /local/coco_data/val2017 --annotation_path /local/coco_data/annotations/instances_val2017.json --batch_size 1 --use_trt --mode benchmark --precision FP32 --input_size 600 --minimum_segment_size 3
benchmark result:
step 101/2048, iter_time(ms)=86
step 201/2048, iter_time(ms)=93
step 301/2048, iter_time(ms)=90
step 401/2048, iter_time(ms)=89
step 501/2048, iter_time(ms)=90
step 601/2048, iter_time(ms)=85
step 701/2048, iter_time(ms)=91
step 801/2048, iter_time(ms)=87
step 901/2048, iter_time(ms)=91
step 1001/2048, iter_time(ms)=92
step 1101/2048, iter_time(ms)=87
step 1201/2048, iter_time(ms)=89
step 1301/2048, iter_time(ms)=100
step 1401/2048, iter_time(ms)=106
step 1501/2048, iter_time(ms)=125
step 1601/2048, iter_time(ms)=204
step 1701/2048, iter_time(ms)=111
step 1801/2048, iter_time(ms)=118
step 1901/2048, iter_time(ms)=108
step 2001/2048, iter_time(ms)=255
Results:
images/sec: 9
99th_percentile(ms): 378.12
total_time(s): 225.9
latency_mean(ms): 115.90
latency_median(ms): 92.96
latency_min(ms): 77.35
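One way to quantify the drift in these numbers is to parse the iter_time values and compare the early and late parts of the run. The sketch below uses only the Python standard library; the helper names and the split point of 12 samples (chosen by eye, where iteration times first reach 100 ms) are assumptions for illustration, not part of the benchmark script.

```python
import re
from statistics import mean, median

# iter_time values copied from the benchmark output quoted above.
LOG = (
    "step 101/2048, iter_time(ms)=86 step 201/2048, iter_time(ms)=93 "
    "step 301/2048, iter_time(ms)=90 step 401/2048, iter_time(ms)=89 "
    "step 501/2048, iter_time(ms)=90 step 601/2048, iter_time(ms)=85 "
    "step 701/2048, iter_time(ms)=91 step 801/2048, iter_time(ms)=87 "
    "step 901/2048, iter_time(ms)=91 step 1001/2048, iter_time(ms)=92 "
    "step 1101/2048, iter_time(ms)=87 step 1201/2048, iter_time(ms)=89 "
    "step 1301/2048, iter_time(ms)=100 step 1401/2048, iter_time(ms)=106 "
    "step 1501/2048, iter_time(ms)=125 step 1601/2048, iter_time(ms)=204 "
    "step 1701/2048, iter_time(ms)=111 step 1801/2048, iter_time(ms)=118 "
    "step 1901/2048, iter_time(ms)=108 step 2001/2048, iter_time(ms)=255"
)


def iter_times(log):
    """Extract every iter_time(ms) value from the log text, in order."""
    return [int(ms) for ms in re.findall(r"iter_time\(ms\)=(\d+)", log)]


def summarize(times, split):
    """Compare the first `split` samples against the remaining ones."""
    early, late = times[:split], times[split:]
    return {
        "early_mean": mean(early),
        "late_mean": mean(late),
        "early_median": median(early),
        "late_median": median(late),
    }


times = iter_times(LOG)
stats = summarize(times, split=12)  # split point is an assumption
for name, value in stats.items():
    print(f"{name}: {value:.1f} ms")
```

A clear gap between the early and late means would be consistent with the clocks throttling as the temperature rises, though these samples alone cannot confirm the cause.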
I'm creating this issue to help collect the issues in the Object Detection example:
To start, I have followed the steps and set up the dependencies. Now, attempting to run a synthetic test:
Gives this error:
On attempting to run a validation test:
This error is observed: