tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

[tf-serving-r1.7] Failed to compile Tensorflow-serving r1.7 with TensorRT #925

Closed oscarriddle closed 5 years ago

oscarriddle commented 6 years ago

Have I written custom code: No
OS Platform and Distribution: CentOS 7
TensorFlow installed from: source
TensorFlow version: tensorflow-serving branch r1.7
Bazel version: 0.11.1
CUDA/cuDNN version: CUDA 9.0, cuDNN 7.0.5, TensorRT 4.0.4 (actually)


I tried to compile TensorFlow Serving r1.7 with TensorRT 4.0.4, and the compilation completes successfully:

At global scope:
cc1plus: warning: unrecognized command line option '-Wno-self-assign'
INFO: Elapsed time: 1452.421s, Critical Path: 479.68s
INFO: Build completed successfully, 11375 total actions

But when I start the service and load a TF-TRT optimized model, I get this error:

2018-06-07 17:41:40.910874: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:242] Loading SavedModel with tags: { serve }; from: /media/disk1/fordata/web_server/project/LdaBasedClassification_623_1.7/data/cate155_tftrt_frozen/1
2018-06-07 17:41:41.030117: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-07 17:41:41.283451: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:84:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2018-06-07 17:41:41.283514: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-06-07 17:41:41.601178: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-07 17:41:41.601253: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-06-07 17:41:41.601273: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-06-07 17:41:41.601561: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10970 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:84:00.0, compute capability: 6.1)
2018-06-07 17:41:41.878689: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: fail. Took 967809 microseconds.
2018-06-07 17:41:41.878771: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: inception_v3 version: 1} failed: Not found: Op type not registered 'TRTEngineOp' in binary running on bjpg-g180.yz02. Make sure the Op and Kernel are registered in the binary running in this process.

It looks like TRTEngineOp is still not supported by this executable. I'm not 100% sure my way of compiling TensorFlow Serving 1.7 with TRT is correct, but the compilation did search for and find libnvinfer.so, etc., and it also verified that the TensorRT version is correct. So I don't know why the resulting binary still can't support TRTEngineOp.
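For reference, a TF-TRT optimized graph like the one being loaded here is typically produced roughly like the sketch below (TF 1.7 contrib API; the paths and the output node name are placeholders, not my actual values), with the resulting graph def then exported as a SavedModel:

# Rough sketch of producing a TF-TRT optimized graph with the TF 1.7
# contrib API; paths and the output node name below are placeholders.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # registers TRTEngineOp in this Python process

# Load the original frozen graph.
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Rewrite supported subgraphs into TRTEngineOp nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=["logits"],                 # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP32")

with tf.gfile.GFile("frozen_graph_trt.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())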

Here are my environment variables:

export TENSORRT_INSTALL_PATH="/home/karafuto/TensorRT-3.0.4/lib"
export TENSORRT_LIB_PATH="/home/karafuto/TensorRT-3.0.4/lib"
export TF_TENSORRT_VERSION=4.0.4

This is my compilation command:

sed -i.bak 's/@org_tensorflow\/\/third_party\/gpus\/crosstool/@local_config_cuda\/\/crosstool:toolchain/g' tools/bazel.rc      
bazel build  --config=cuda --action_env PYTHON_BIN_PATH="/home/karafuto/dlpy72/dlpy/bin/python2.7" TENSORRT_BIN_PATH="/home/karafuto/TensorRT-3.0.4"  -c opt tensorflow_serving/...

I'm not sure whether my procedure is correct. Very few docs can be found that talk about how to build tensorflow-serving 1.7 with TensorRT. Can anyone offer a clue to help me?

I think I'm almost there!

Thanks,

PS: The TensorRT package was downloaded from the official NVIDIA website; the tar file is named "TensorRT-3.0.4.Ubuntu-14.04.5.x86_64.cuda-9.0.cudnn7.0.tar.gz". The weird thing is that after unpacking the tar file I found the actual version is 4.0.4, not 3.0.4. So in tensorflow-serving r1.7 I need to set the variable TF_TENSORRT_VERSION=4.0.4 to avoid a version check failure.
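A quick way to confirm which version the headers actually declare (the same version that the Bazel configure step reads from NvInfer.h, as the second error below shows) is a sketch like this, assuming the usual NV_TENSORRT_MAJOR/MINOR/PATCH macros and with the header path adjusted to your unpacked tarball:

# Print the TensorRT version declared in NvInfer.h (the header that the
# Bazel configure step parses); adjust the path to your install.
import re

header = "/home/karafuto/TensorRT-3.0.4/include/NvInfer.h"
version = {}
with open(header) as f:
    for line in f:
        m = re.match(r"#define NV_TENSORRT_(MAJOR|MINOR|PATCH)\s+(\d+)", line)
        if m:
            version[m.group(1)] = m.group(2)

print("%s.%s.%s" % (version["MAJOR"], version["MINOR"], version["PATCH"]))
# -> 4.0.4 here, despite the 3.0.4 tarball name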

I encountered the two errors below and solved them, so I think the bazel build did indeed pick up TensorRT. Posting them here as evidence.


This is the error when I set a wrong TENSORRT_LIB_PATH (it can't find libnvinfer.so):

ERROR: error loading package 'tensorflow_serving/apis': Encountered error while reading extension file 'build_defs.bzl': no such package '@local_config_tensorrt//': Traceback (most recent call last):
    File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 160
        auto_configure_fail("TensorRT library (libnvinfer) v...")
    File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 210, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: TensorRT library (libnvinfer) version is not set.

This is the error when TF_TENSORRT_VERSION does not match the version of the libnvinfer that was found:

ERROR: error loading package 'tensorflow_serving/apis': Encountered error while reading extension file 'build_defs.bzl': no such package '@local_config_tensorrt//': Traceback (most recent call last):
    File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 167
        _trt_lib_version(repository_ctx, trt_install_path)
    File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 87, in _trt_lib_version
        auto_configure_fail(("TensorRT library version detec...)))
    File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 210, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: TensorRT library version detected from /media/disk1/fordata/web_server/project/xiaolun/TensorRT-3.0.4/include/NvInfer.h (4.0.4) does not match TF_TENSORRT_VERSION (3.0.4). To fix this rerun configure again.
R-Miner commented 6 years ago

@oscarriddle I am looking for a solution to combine the best features of TensorFlow Serving and TensorRT. Could you please explain how you compiled TensorFlow Serving with TensorRT?

oscarriddle commented 6 years ago

@R-Miner I did nothing special about compiling TensorRT; my modifications were mainly focused on other dependencies like Python, CUDA, etc. As I mentioned above, the build automatically checks the TensorRT version, but in the end it seems TensorRT is not actually compiled in. It is a little weird.

qiaohaijun commented 6 years ago

https://github.com/tensorflow/serving/blob/master/tensorflow_serving/model_servers/BUILD#L283

add this line:

"@org_tensorflow//tensorflow/contrib/tensorrt:trt_engine_op_kernel",

to SUPPORTED_TENSORFLOW_OPS, so that it becomes:

SUPPORTED_TENSORFLOW_OPS = [
    "@org_tensorflow//tensorflow/contrib:contrib_kernels",
    "@org_tensorflow//tensorflow/contrib:contrib_ops_op_lib",
    "@org_tensorflow//tensorflow/contrib/tensorrt:trt_engine_op_kernel",
]

Then I find the TensorRT symbols in the dynamically linked .so file:

0000000006ec8f60 V _ZTSN10tensorflow8tensorrt10TRTCalibOpE
0000000006ec9160 V _ZTSN10tensorflow8tensorrt11TRTEngineOpE
0000000006ec9320 V _ZTSN10tensorflow8tensorrt17TRTInt8CalibratorE
0000000006ec8fa0 V _ZTSN10tensorflow8tensorrt22TRTCalibrationResourceE

Hope this is useful.

This is a hacky way; I think it is better to change the BUILD file to use if_tensorrt:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/tensorrt/BUILD

ydp commented 6 years ago

@qiaohaijun Sorry to bother you. I have successfully built serving with TensorRT, but when I start serving there is one log line:

2018-09-13 15:43:38.647435: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1199] OpKernel ('op: "TRTEngineOp" device_type: "GPU"') for unknown op: TRTEngineOp

How do I solve this? And how do I know whether TensorRT has taken effect?

Thanks in advance.
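One way to at least confirm on the model side that TF-TRT conversion took effect (independent of the serving binary) is to count TRTEngineOp nodes in the exported SavedModel, roughly like this sketch; the export path is a placeholder:

# Count TRTEngineOp nodes in an exported SavedModel to confirm that the
# TF-TRT conversion actually rewrote the graph (path below is a placeholder).
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # noqa: F401 - registers the TRT ops in this process

export_dir = "/path/to/saved_model/1"
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    nodes = meta_graph.graph_def.node
    trt_nodes = [n.name for n in nodes if n.op == "TRTEngineOp"]
    print("TRTEngineOp nodes: %d of %d" % (len(trt_nodes), len(nodes)))

If that count is zero, the conversion never rewrote the graph and the missing op registration in the serving binary is not the only problem.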

elvys-zhang commented 6 years ago

@qiaohaijun I have met the same problem as @ydp. I compiled serving 1.9 with TensorRT 4.0.1.6, CUDA 9, cuDNN 7 successfully, and found the .so linked as below.

root@A02-R12-I160-19:/serving# ldd serving-trt | grep nv
        libnvinfer.so.4 => /usr/lib/TensorRT-4.0.1.6/lib/libnvinfer.so.4 (0x00007f06e4bb7000)
        libnvidia-fatbinaryloader.so.384.81 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.384.81 (0x00007f06c1a01000)

And when I run serving, an error occurred:

2018-09-18 03:40:31.354473: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1242] OpKernel ('op: "TRTEngineOp" device_type: "GPU"') for unknown op: TRTEngineOp
2018-09-18 03:40:31.354519: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1242] OpKernel ('op: "TRTCalibOp" device_type: "GPU"') for unknown op: TRTCalibOp

Any ideas to solve this? Thx.

elvys-zhang commented 6 years ago

I have fixed my problem by adding the lines marked with comments below in external/org_tensorflow/tensorflow/contrib/tensorrt/BUILD. But I still have no idea whether TensorRT actually works.


cc_library(
    name = "trt_engine_op_kernel",
    srcs = [
        "kernels/trt_calib_op.cc",
        "kernels/trt_engine_op.cc",
        "ops/trt_calib_op.cc",
        "ops/trt_engine_op.cc",
        "shape_fn/trt_shfn.cc",
        # the three lines above were added
    ],
    hdrs = [
        "kernels/trt_calib_op.h",
        "kernels/trt_engine_op.h",
        "shape_fn/trt_shfn.h",
        # the one line above was added
    ],
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        ":trt_logging",
        ":trt_plugins",
        ":trt_resources",
        "//tensorflow/core:gpu_headers_lib",
        "//tensorflow/core:lib_proto_parsing",
        "//tensorflow/core:stream_executor_headers_lib",
    ] + if_tensorrt([
        "@local_config_tensorrt//:nv_infer",
    ]) + tf_custom_op_library_additional_deps(),
    # TODO(laigd)
    alwayslink = 1,  # buildozer: disable=alwayslink-with-hdrs
)
qiaohaijun commented 6 years ago

So sorry everyone, I found that my solution has failed.

export TF_CPP_MIN_VLOG_LEVEL=3

then I find:

2018-09-30 17:23:37.479683: I external/org_tensorflow/tensorflow/core/framework/op.cc:103] Not found: Op type not registered 'my_trt_op_0_native_segment' in binary running on xxx. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
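For anyone hitting that lazy-registration note from Python (rather than from model_server), it just means the contrib module has to be imported before the graph, roughly like this sketch; the frozen-graph path is a placeholder, and this does not help the serving binary itself:

# The lazy-registration hint from the log, applied in Python: import the
# contrib module (which registers TRTEngineOp) before importing the graph.
# The path below is a placeholder; this does not fix the serving binary.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # noqa: F401 - must run before the graph is imported

with tf.gfile.GFile("frozen_graph_trt.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

tf.import_graph_def(graph_def, name="")  # would raise "Op type not registered" without the import above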
Harshini-Gadige commented 6 years ago

Is this still an issue ?

ydp commented 6 years ago

@harshini-gadige It compiles with no errors, but there is no way to know whether TensorRT works as expected. According to @qiaohaijun's comments, it compiles without errors but is not really a success, since it fails at runtime.

Chris19920210 commented 5 years ago

Hi guys, Any progress for it? thanks

Harshini-Gadige commented 5 years ago

@lilao Any inputs please ?

witchoco commented 5 years ago

@elvys-zhang @qiaohaijun @Chris19920210 @ydp @oscarriddle @harshini-gadige @lilao Hi all, I've also hit this exact same issue. Has anyone found a solution for this? Thank you so much for any suggestion. It's driving me crazy~TOT

witchoco commented 5 years ago

BTW, I'm using TF 1.12 + TRT 4.0.x; compiling works fine, but TRTEngineOp is still not registered at runtime.

rankeey commented 5 years ago

@qiaohaijun @Chris19920210 @ydp @oscarriddle @lilao
Dear all, could anyone tell me how to export an int8 SavedModel for tf-serving?

aaroey commented 5 years ago

@rankeey Exporting an int8 saved model is not supported yet, but it will be once TF 2.0 is out. Please let me know if there are any questions.

netfs commented 5 years ago

Also note, the upcoming TensorFlow 1.13 release will have official support for TensorRT, and I'd strongly recommend using that release (or the nightly) for any testing rather than older versions of TF Serving.

aaroey commented 5 years ago

Update here: support for exporting an int8 saved model was added by https://github.com/tensorflow/tensorflow/commit/fd481c1af898fa5a587d09e9505fcd273bcf18da; see here for how to run the export.

Thanks.
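A rough sketch of what the int8 export looks like with the newer converter API is below (the module path, constructor arguments, and the calibrate() signature can differ between 1.13 and 1.14 builds; the directories, tensor names, and get_calibration_batch() helper are placeholders):

# Sketch of exporting an INT8 TF-TRT SavedModel with the 1.13+/1.14
# converter API; argument names may differ slightly between versions, and
# the paths, tensor names, and get_calibration_batch() are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="/models/original/1",
    precision_mode="INT8",
    use_calibration=True)
converter.convert()

def feed_dict_fn():
    # Feed one representative batch per run so TensorRT can collect
    # INT8 calibration ranges.
    return {"input:0": get_calibration_batch()}

converter.calibrate(
    fetch_names=["logits:0"],
    num_runs=10,
    feed_dict_fn=feed_dict_fn)

converter.save("/models/trt_int8/1")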

Harshini-Gadige commented 5 years ago

Closing this issue as it has been in "awaiting response" status for more than 7 days. If you are facing a new issue, please create a new GitHub request, which helps us address it correctly. If you still want to update here, please post your comments so that we can review and reopen (if required).