tensorflow / tensorrt

TensorFlow/TensorRT integration
Apache License 2.0

Segmentation fault when optimizing model. #45

Closed isra60 closed 5 years ago

isra60 commented 5 years ago

I'm currently trying ssd_mobilenet_v2_coco on an NVIDIA GTX 1060.

I have tensorflow-gpu v1.13, CUDA 10, and TensorRT 5. I've downloaded the model with

config_path, checkpoint_path = download_model('ssd_mobilenet_v2_coco', output_dir='models')

I'm trying to optimize the model with


frozen_graph = optimize_model(
    config_path=config_path, 
    checkpoint_path=checkpoint_path,
    use_trt=True,
    precision_mode='FP16'
)

But it always causes a segmentation fault. This is the console log:

2019-03-20 11:26:09.152490: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-20 11:26:09.235499: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-20 11:26:09.235934: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x45ea9f0 executing computations on platform CUDA. Devices:
2019-03-20 11:26:09.235950: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2019-03-20 11:26:09.257165: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-03-20 11:26:09.257710: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3d6e750 executing computations on platform Host. Devices:
2019-03-20 11:26:09.257725: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-03-20 11:26:09.257948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845 pciBusID: 0000:01:00.0 totalMemory: 5.93GiB freeMemory: 5.56GiB
2019-03-20 11:26:09.257964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:09.334021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:09.334056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:09.334062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:09.334197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:327: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:356: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Please switch to tf.train.get_or_create_global_step
2019-03-20 11:26:15.423701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:15.423744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:15.423750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:15.423753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:15.423886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt
2019-03-20 11:26:18.529755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:18.529812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:18.529820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:18.529825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:18.529932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:96: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 344 variables.
INFO:tensorflow:Converted 344 variables to const ops.
2019-03-20 11:26:19.787213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:19.787255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:19.787261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:19.787264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:19.787369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:288: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:288: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: .optimize_model_tmp_dir/saved_model/saved_model.pb
INFO:tensorflow:SavedModel written to: .optimize_model_tmp_dir/saved_model/saved_model.pb
INFO:tensorflow:Writing pipeline config file to .optimize_model_tmp_dir/pipeline.config
INFO:tensorflow:Writing pipeline config file to .optimize_model_tmp_dir/pipeline.config
2019-03-20 11:26:21.916570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:21.916607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:21.916613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:21.916617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:21.916717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Running against TensorRT version 5.0.2
INFO:tensorflow:Running against TensorRT version 5.0.2
2019-03-20 11:26:23.734739: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-03-20 11:26:23.735758: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-03-20 11:26:23.738573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:23.738598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:23.738603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:23.738607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:23.738711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-03-20 11:26:24.093794: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.117771: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.219573: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.219672: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.435736: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:25.790160: I tensorflow/contrib/tensorrt/segment/segment.cc:443] There are 2317 ops of 33 different types in the graph that are not converted to TensorRT: Fill, Switch, TopKV2, ConcatV2, Identity, Squeeze, Const, Unpack, ResizeBilinear, Reshape, Mul, Slice, Merge, Split, NonMaxSuppressionV3, GatherV2, Range, Conv2D, Cast, Greater, Minimum, Sub, StridedSlice, NoOp, ZerosLike, Pack, Transpose, ExpandDims, Where, Exp, Placeholder, Add, Shape, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-03-20 11:26:26.231074: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 187
2019-03-20 11:26:35.074128: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 224 nodes succeeded.
2019-03-20 11:26:35.074828: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1021] TensorRT node BoxPredictor_1/ClassPredictor/TRTEngineOp_1 added for segment 1 consisting of 2 nodes failed: Internal: Segment has no inputs (possible constfold failure). Fallback to TF...
Segmentation fault (core dumped)

isra60 commented 5 years ago

Also the log with gdb.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff68d60261 in tensorflow::tensorrt::convert::GetDeviceAndAllocator(tensorflow::tensorrt::convert::ConversionParams const&, tensorflow::tensorrt::convert::EngineInfo const&) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
(gdb) bt
#0 0x00007fff68d60261 in tensorflow::tensorrt::convert::GetDeviceAndAllocator(tensorflow::tensorrt::convert::ConversionParams const&, tensorflow::tensorrt::convert::EngineInfo const&) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
#1 0x00007fff68d651aa in tensorflow::tensorrt::convert::ConvertAfterShapes(tensorflow::tensorrt::convert::ConversionParams&) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
#2 0x00007fff68d90f56 in tensorflow::tensorrt::convert::TRTOptimizationPass::Optimize(tensorflow::grappler::Cluster, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
#3 0x00007fffb549a8ee in tensorflow::grappler::MetaOptimizer::RunOptimizer(tensorflow::grappler::GraphOptimizer, tensorflow::grappler::Cluster, tensorflow::grappler::GrapplerItem, tensorflow::GraphDef, tensorflow::grappler::MetaOptimizer::GraphOptimizationResult*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4 0x00007fffb549b552 in tensorflow::grappler::MetaOptimizer::OptimizeGraph(tensorflow::grappler::Cluster, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5 0x00007fffb549c8a7 in tensorflow::grappler::MetaOptimizer::Optimize(tensorflow::grappler::Cluster, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6 0x00007fffb028ab9c in TF_OptimizeGraph(GCluster, tensorflow::ConfigProto const&, tensorflow::MetaGraphDef const&, bool, std::string const&, TF_Status*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7 0x00007fffb0293157 in _wrap_TF_OptimizeGraph () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#8 0x0000000000502d6f in ?? ()
#9 0x0000000000506859 in _PyEval_EvalFrameDefault ()
#10 0x0000000000504c28 in ?? ()
#11 0x0000000000502540 in ?? ()
#12 0x0000000000502f3d in ?? ()
#13 0x0000000000507641 in _PyEval_EvalFrameDefault ()
#14 0x0000000000504c28 in ?? ()
#15 0x0000000000502540 in ?? ()
#16 0x0000000000502f3d in ?? ()
#17 0x0000000000507641 in _PyEval_EvalFrameDefault ()
#18 0x0000000000504c28 in ?? ()
---Type <return> to continue, or q <return> to quit---
#19 0x0000000000502540 in ?? ()
#20 0x0000000000502f3d in ?? ()
#21 0x0000000000507641 in _PyEval_EvalFrameDefault ()
#22 0x0000000000504c28 in ?? ()
#23 0x0000000000506393 in PyEval_EvalCode ()
#24 0x0000000000634d52 in ?? ()
#25 0x00000000004a38c5 in ?? ()
#26 0x00000000004a5cd5 in PyRun_InteractiveLoopFlags ()
#27 0x00000000006387b3 in PyRun_AnyFileExFlags ()
#28 0x000000000063915a in Py_Main ()
#29 0x00000000004a6f10 in main ()
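
For anyone reproducing this without gdb, Python's standard faulthandler module can at least show which Python call was active when the process crashed. A minimal sketch, assuming it is placed at the top of the script that calls optimize_model; the log file name segfault_trace.log is a hypothetical choice, not something from this repo:

```python
import faulthandler

# Hypothetical log file name. Keep the handle open for the lifetime of the
# process, because faulthandler writes to its file descriptor at crash time.
trace_log = open("segfault_trace.log", "w")

# On SIGSEGV (and SIGFPE, SIGABRT, SIGBUS, SIGILL), dump the Python traceback
# of all threads to the log file before the process dies.
faulthandler.enable(file=trace_log, all_threads=True)

# ... run the failing step here, e.g. the optimize_model(...) call
# from the first comment.
```

This only reports the Python frames; the native frames inside _wrap_conversion.so still need a debugger such as gdb.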

MirkoArnold1 commented 5 years ago

Same here. You can avoid the segfault by setting force_nms_cpu to False. It would be helpful to know which versions of TensorFlow, CUDA, and TensorRT these examples were tested with. Is TensorRT 5.x supported?

isra60 commented 5 years ago

Thanks! Do you know what this setting does?

I'm also testing on my Jetson TX2 with the latest JetPack, which also uses TensorRT 5 and CUDA 10 (and TensorFlow 1.13.1), and there the ssd_mobilenet_v2 optimization seems to work...

MirkoArnold1 commented 5 years ago

I did some more tests:

  1. Using the official NVIDIA TensorFlow Docker image 19.03-py3, I tried the object detection example with the version of this repo that is included in the Docker image --> It worked fine, no errors
  2. From the same Docker image (but a different container), I cloned this repo and tried the object detection example --> Segmentation fault

1 and 2 both had the same hardware configuration, same tensorflow version, same cuda and tensorrt versions. Some change in this repo must have broken things.

isra60 commented 5 years ago

Hmm, interesting... Have you tried this repo? https://github.com/NVIDIA-AI-IOT/tf_trt_models

MirkoArnold1 commented 5 years ago

This is the commit NVIDIA used in the Docker image: d2c28ffb775f8b550541fbde7061caf3daf14375. I just checked it out manually, did the setup, and it worked fine.

MirkoArnold1 commented 5 years ago

@88madri no, I'm not working with a Jetson platform

MirkoArnold1 commented 5 years ago

minimum_segment_size=50 was changed to minimum_segment_size=2. This is what causes the segmentation fault.

isra60 commented 5 years ago

Yeah, I think it is this commit, right? https://github.com/tensorflow/tensorrt/commit/950811e386a6b82da5609eb045ddc7260bef062e

pooyadavoodi commented 5 years ago

Is this issue resolved? Looks like 19.03 has worked for you. Which container didn't work?

I think the new default arguments work with TF 1.13. Please let me know if they don't.

isra60 commented 5 years ago

No. The new default arguments don't work with TF 1.13, neither on the GTX 1060 nor on a Jetson TX2. With a minimum segment size of 50 it works.
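
The workaround discussed above could be applied roughly as follows. This is only a sketch: it assumes optimize_model accepts a minimum_segment_size keyword (the thread says the default changed from 50 to 2 but does not show the signature). The helper build_optimize_kwargs is hypothetical, and the pipeline.config path is a placeholder for whatever download_model returned as config_path.

```python
def build_optimize_kwargs(config_path, checkpoint_path, minimum_segment_size=50):
    """Collect the arguments from the original report plus the workaround.

    Hypothetical helper, not part of the repo.
    """
    return {
        "config_path": config_path,
        "checkpoint_path": checkpoint_path,
        "use_trt": True,
        "precision_mode": "FP16",
        # Assumption: optimize_model exposes this keyword. A value of 50
        # restores the older default that reportedly avoids the segfault.
        "minimum_segment_size": minimum_segment_size,
    }


kwargs = build_optimize_kwargs(
    # Placeholder path; use the config_path returned by download_model.
    config_path="models/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config",
    # Checkpoint path as it appears in the log above.
    checkpoint_path="models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt",
)

# With the repo installed, the call would then be:
# frozen_graph = optimize_model(**kwargs)
```

A larger minimum segment size means small candidate segments stay in TensorFlow, such as the 2-node segment that failed in the log, at the cost of fewer ops being converted to TensorRT.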

pooyadavoodi commented 5 years ago

I just verified that this is working now with the most recent code in the master branch, and TF 1.14.

Here is the tail of the log:

Loading and preparing results...
DONE (t=0.22s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.83s).
Accumulating evaluation results...
DONE (t=1.49s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.248
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.274
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.172
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.569
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.222
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.277
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.031
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
{
    "avg_latency_ms": 10.634882227359757,
    "avg_throughput_fps": 94.03019033227812,
    "map": 0.24784700263781276
}
ASSERTION PASSED: statistics['map'] > (0.247 - 0.005)
PASS ssd_mobilenet_v2_coco_trt_fp16.json
DONE testing ssd_mobilenet_v2_coco_trt_fp16.json

This is the config I used:

{
  "source_model": {
    "model_name": "ssd_mobilenet_v2_coco",
    "input_dir": "/data/tensorflow/object_detection/models"
  },
  "optimization_config": {
    "use_trt": true,
    "precision_mode": "FP16",
    "override_nms_score_threshold": 0.3,
    "max_batch_size": 1
  },
  "benchmark_config": {
    "images_dir": "/data/coco/coco-2017/coco2017/val2017",
    "annotation_path": "/data/coco/coco-2017/coco2017/annotations/instances_val2017.json",
    "batch_size": 1,
    "image_shape": [640, 640],
    "num_images": 4096,
    "output_path": "stats/ssd_mobilenet_v2_coco_trt_fp16.json"
  },
  "assertions": [
    "statistics['map'] > (0.247 - 0.005)"
  ]
}
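
To illustrate how the "assertions" entry relates to the statistics printed in the log, here is a rough sketch that evaluates each assertion string against a statistics dict. The function check_assertions is hypothetical and is not taken from the repo's test harness; the numbers are copied from the log tail above.

```python
# Statistics as printed at the end of the log above.
statistics = {
    "avg_latency_ms": 10.634882227359757,
    "avg_throughput_fps": 94.03019033227812,
    "map": 0.24784700263781276,
}

# Assertion strings as they appear in the config.
assertions = ["statistics['map'] > (0.247 - 0.005)"]


def check_assertions(assertions, statistics):
    """Evaluate each assertion expression; return {expression: passed}.

    Hypothetical helper. eval() is used only because the config stores
    Python expressions; never evaluate untrusted config files this way.
    """
    results = {}
    for expr in assertions:
        # Expose only the statistics dict to the expression.
        scope = {"statistics": statistics}
        results[expr] = bool(eval(expr, {"__builtins__": {}}, scope))
    return results


results = check_assertions(assertions, statistics)
for expr, passed in results.items():
    print(("ASSERTION PASSED: " if passed else "ASSERTION FAILED: ") + expr)
```

With the mAP from this run (about 0.2478) the threshold of 0.242 is met, which matches the ASSERTION PASSED line in the log.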

pooyadavoodi commented 5 years ago

Closing. Please reopen in case the issue remains.