Closed isra60 closed 5 years ago
Also the log with gdb.
`Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff68d60261 in tensorflow::tensorrt::convert::GetDeviceAndAllocator(tensorflow::tensorrt::convert::ConversionParams const&, tensorflow::tensorrt::convert::EngineInfo const&) ()
from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
(gdb) bt
from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
---Type
`
Same here. You can avoid the segfault by setting force_nms_cpu to False. It would be helpful to know which versions of TensorFlow, CUDA, and TensorRT these examples were tested with. Is TensorRT 5.x supported?
Thanks! Do you know what this setting does?
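For context, here is a minimal sketch of what a flag like force_nms_cpu plausibly does: it pins the non-maximum-suppression nodes of the frozen graph to the CPU. This is an assumption about the helper's behavior, not a copy of the repository's code. The function name `pin_nms_to_cpu` and the stand-in node objects are hypothetical; a real `GraphDef` would be passed instead.

```python
from types import SimpleNamespace


def pin_nms_to_cpu(graph_def, device="/device:CPU:0"):
    """Assign every node whose name mentions NonMaxSuppression to `device`.

    Works on any object with a `.node` sequence whose items expose `.name`
    and `.device` attributes (a TensorFlow GraphDef has this shape).
    Returns the names of the nodes that were pinned.
    """
    pinned = []
    for node in graph_def.node:
        if "NonMaxSuppression" in node.name:
            node.device = device
            pinned.append(node.name)
    return pinned


# Stand-in graph for illustration only; the node names are made up.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name="FeatureExtractor/Conv2D", device=""),
    SimpleNamespace(name="Postprocessor/NonMaxSuppressionV3", device=""),
])
pinned_nodes = pin_nms_to_cpu(demo_graph)
print(pinned_nodes)
```

If the flag works this way, setting it to False simply leaves device placement of the NMS ops to TensorFlow.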
I'm also testing on my Jetson TX2 with the latest JetPack, which also uses TensorRT 5 and CUDA 10 (and TensorFlow 1.13.1), and there the ssd_mobilenet_v2 optimization seems to work...
I did some more tests:
1 and 2 both had the same hardware configuration, same tensorflow version, same cuda and tensorrt versions. Some change in this repo must have broken things.
Hmm, interesting... Have you tried with this repo? https://github.com/NVIDIA-AI-IOT/tf_trt_models
This is the commit NVIDIA used in the Docker image: d2c28ffb775f8b550541fbde7061caf3daf14375. I just checked it out manually, did the setup, and it worked fine.
@88madri no, I'm not working with a Jetson platform
minimum_segment_size=50 was changed to minimum_segment_size=2. This is causing the segmentation fault.
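To illustrate why the threshold matters: TF-TRT only tries to build an engine for a candidate segment that has at least minimum_segment_size nodes, and smaller segments stay in TensorFlow. The sketch below applies that rule to the two segment sizes reported in the conversion log in this thread (224 and 2 nodes). The function name is hypothetical and the filter is a simplification of the real segmenter.

```python
def segments_to_convert(segment_sizes, minimum_segment_size):
    """Return the indices of segments large enough to be converted."""
    return [
        index
        for index, size in enumerate(segment_sizes)
        if size >= minimum_segment_size
    ]


# Node counts reported in the log for segment 0 and segment 1.
logged_sizes = [224, 2]

with_previous_value = segments_to_convert(logged_sizes, minimum_segment_size=50)
with_new_value = segments_to_convert(logged_sizes, minimum_segment_size=2)
print(with_previous_value)  # only segment 0 is attempted
print(with_new_value)       # the 2-node segment is attempted as well
```

With the value of 50, the 2-node segment that fails with "Segment has no inputs" would never be attempted, which is consistent with the crash disappearing.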
Yeah, I think it is this commit, right? https://github.com/tensorflow/tensorrt/commit/950811e386a6b82da5609eb045ddc7260bef062e
Is this issue resolved? Looks like 19.03 has worked for you. Which container didn't work?
I think the new default arguments work with TF 1.13. Please let me know if they don't.
No. The new default arguments don't work with TF 1.13 on a GTX 1060, nor on a Jetson TX2. With a minimum segment size of 50 it works.
I just verified that this is working now with the most recent code in the master branch, and TF 1.14.
Here is the tail of the log:
Loading and preparing results...
DONE (t=0.22s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.83s).
Accumulating evaluation results...
DONE (t=1.49s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.248
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.274
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.172
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.569
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.222
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.277
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.278
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.031
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.197
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
{
"avg_latency_ms": 10.634882227359757,
"avg_throughput_fps": 94.03019033227812,
"map": 0.24784700263781276
}
ASSERTION PASSED: statistics['map'] > (0.247 - 0.005)
PASS ssd_mobilenet_v2_coco_trt_fp16.json
DONE testing ssd_mobilenet_v2_coco_trt_fp16.json
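As a sanity check on the two timing figures: at batch size 1, throughput in frames per second should be 1000 divided by the latency in milliseconds. The snippet below assumes that relationship (it is not taken from the benchmark code, and the function name is hypothetical) and applies it to the reported latency.

```python
def throughput_fps(avg_latency_ms, batch_size=1):
    """Frames per second implied by a per-batch latency in milliseconds."""
    return batch_size * 1000.0 / avg_latency_ms


fps = throughput_fps(10.634882227359757)
print(round(fps, 2))  # 94.03, matching the reported avg_throughput_fps
```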
This is the config I used:
{
  "source_model": {
    "model_name": "ssd_mobilenet_v2_coco",
    "input_dir": "/data/tensorflow/object_detection/models"
  },
  "optimization_config": {
    "use_trt": true,
    "precision_mode": "FP16",
    "override_nms_score_threshold": 0.3,
    "max_batch_size": 1
  },
  "benchmark_config": {
    "images_dir": "/data/coco/coco-2017/coco2017/val2017",
    "annotation_path": "/data/coco/coco-2017/coco2017/annotations/instances_val2017.json",
    "batch_size": 1,
    "image_shape": [640, 640],
    "num_images": 4096,
    "output_path": "stats/ssd_mobilenet_v2_coco_trt_fp16.json"
  },
  "assertions": [
    "statistics['map'] > (0.247 - 0.005)"
  ]
}
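The "assertions" entry is a list of Python expressions over a `statistics` dict. A minimal sketch of how such strings could be checked against the reported statistics follows. The helper name `check_assertions` is hypothetical and the test script may do this differently; `eval` is only reasonable here because the config file is trusted input.

```python
def check_assertions(assertions, statistics):
    """Evaluate each assertion string with `statistics` in scope."""
    results = []
    for expression in assertions:
        # Builtins are disabled; only the statistics dict is visible.
        passed = bool(
            eval(expression, {"__builtins__": {}}, {"statistics": statistics})
        )
        results.append((expression, passed))
    return results


# Values copied from the log tail above.
statistics = {
    "avg_latency_ms": 10.634882227359757,
    "avg_throughput_fps": 94.03019033227812,
    "map": 0.24784700263781276,
}
assertions = ["statistics['map'] > (0.247 - 0.005)"]

for expression, passed in check_assertions(assertions, statistics):
    print(("ASSERTION PASSED: " if passed else "ASSERTION FAILED: ") + expression)
```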
Closing. Please reopen in case the issue remains.
Currently I'm trying ssd_mobilenet_v2_coco on an NVIDIA GTX 1060.
I have tensorflow-gpu v1.13, CUDA 10, and TensorRT 5. I downloaded the model with
config_path, checkpoint_path = download_model('ssd_mobilenet_v2_coco', output_dir='models')
I'm trying to optimize the model, but it always ends in a segmentation fault. This is the console log:
2019-03-20 11:26:09.152490: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-20 11:26:09.235499: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-20 11:26:09.235934: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x45ea9f0 executing computations on platform CUDA. Devices:
2019-03-20 11:26:09.235950: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2019-03-20 11:26:09.257165: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-03-20 11:26:09.257710: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3d6e750 executing computations on platform Host. Devices:
2019-03-20 11:26:09.257725: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-03-20 11:26:09.257948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.56GiB
2019-03-20 11:26:09.257964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:09.334021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:09.334056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:09.334062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:09.334197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:327: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:356: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
2019-03-20 11:26:15.423701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:15.423744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:15.423750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:15.423753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:15.423886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt
2019-03-20 11:26:18.529755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:18.529812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:18.529820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:18.529825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:18.529932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:96: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 344 variables.
INFO:tensorflow:Converted 344 variables to const ops.
2019-03-20 11:26:19.787213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:19.787255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:19.787261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:19.787264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:19.787369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:288: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:288: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: .optimize_model_tmp_dir/saved_model/saved_model.pb
INFO:tensorflow:SavedModel written to: .optimize_model_tmp_dir/saved_model/saved_model.pb
INFO:tensorflow:Writing pipeline config file to .optimize_model_tmp_dir/pipeline.config
INFO:tensorflow:Writing pipeline config file to .optimize_model_tmp_dir/pipeline.config
2019-03-20 11:26:21.916570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:21.916607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:21.916613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:21.916617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:21.916717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Running against TensorRT version 5.0.2
INFO:tensorflow:Running against TensorRT version 5.0.2
2019-03-20 11:26:23.734739: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-03-20 11:26:23.735758: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-03-20 11:26:23.738573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:23.738598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:23.738603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:23.738607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:23.738711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-03-20 11:26:24.093794: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.117771: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.219573: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.219672: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.435736: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:25.790160: I tensorflow/contrib/tensorrt/segment/segment.cc:443] There are 2317 ops of 33 different types in the graph that are not converted to TensorRT: Fill, Switch, TopKV2, ConcatV2, Identity, Squeeze, Const, Unpack, ResizeBilinear, Reshape, Mul, Slice, Merge, Split, NonMaxSuppressionV3, GatherV2, Range, Conv2D, Cast, Greater, Minimum, Sub, StridedSlice, NoOp, ZerosLike, Pack, Transpose, ExpandDims, Where, Exp, Placeholder, Add, Shape, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-03-20 11:26:26.231074: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 187
2019-03-20 11:26:35.074128: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 224 nodes succeeded.
2019-03-20 11:26:35.074828: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1021] TensorRT node BoxPredictor_1/ClassPredictor/TRTEngineOp_1 added for segment 1 consisting of 2 nodes failed: Internal: Segment has no inputs (possible constfold failure). Fallback to TF...
Segmentation fault (core dumped)