tensorflow / tensorrt

TensorFlow/TensorRT integration

How to serialize your tf2 model to .plan #166

Open rvorias opened 4 years ago

rvorias commented 4 years ago

Currently, the examples show how to go from a SavedModel to a graph function (get_func_from_saved_model) and do basic inference. However, for production environments, I could imagine that you'd want to host your TF-TRT model in the TensorRT C++ environment. For this, the TensorRT documentation gives the following advice:

Which function can we use in TF2 to get the correct GraphDef to generate a TensorRT plan?

I also tried the UFF route, but convert-to-uff in the NVIDIA TensorRT 19.11 container is not able to convert models to UFF because it cannot import the required TensorFlow files.
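For context, the TF2 path from the examples boils down to loading the SavedModel and grabbing its serving signature, roughly like this (the path, signature key, and input shape are placeholders):

    import tensorflow as tf

    # Load the SavedModel and pull out the inference function,
    # which is essentially what get_func_from_saved_model in the examples does.
    saved_model = tf.saved_model.load("saved_models/model_trt", tags=["serve"])
    graph_func = saved_model.signatures["serving_default"]

    # Basic inference on dummy data (shape/dtype are placeholders).
    dummy = tf.zeros([1, 224, 224, 3], dtype=tf.float32)
    output = graph_func(dummy)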

rvorias commented 4 years ago

The trick was to copy an updated convert_to_constants.py into the Docker container and use convert_variables_to_constants_v2_as_graph. I've written down some instructions on the NVIDIA Dev Forums.

Update: the created plan files are empty.
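For reference, the freezing step I mean looks roughly like this (the model path and input spec are placeholders, not my actual setup):

    import tensorflow as tf
    from tensorflow.python.framework.convert_to_constants import (
        convert_variables_to_constants_v2_as_graph,
    )

    model = tf.keras.models.load_model("saved_models/model")  # placeholder path
    run = tf.function(lambda x: model(x))
    concrete = run.get_concrete_function(
        tf.TensorSpec([1, 224, 224, 3], tf.float32)  # placeholder input spec
    )

    # Inline the variables as constants and get the corresponding GraphDef.
    frozen_func, graph_def = convert_variables_to_constants_v2_as_graph(concrete)
    tf.io.write_graph(graph_def, "saved_models", "frozen.pb", as_text=False)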

heizie commented 4 years ago


Hi, I have the same problem. I've seen a lot of tutorials; mostly they require installing TensorRT separately and then converting to the plan file WITH UFF... which is already deprecated! I don't know why this whole thing is such a mess.

rvorias commented 4 years ago

Hi @heizie,

I just replied to a fellow developer on the NVIDIA forums.

Forget the .plan serialization. Right now, our (only successful) workflow consists of converting a TF2 model to ONNX and then parsing it with TensorRT in C++. It more or less goes like this:

  1. Make your model in TF2.
  2. Create a concrete function (typically you call your model with some spec input), a la
    concrete_run = model.call.get_concrete_function(
        inputs=tf.TensorSpec(
            CONCRETE_INPUT_SHAPE,
            dtype=DTYPE
        )
    )
  3. Convert the concrete function to ONNX using tf2onnx, typically with:
    !python -m tf2onnx.convert \
        --opset 11 \
        --fold_const \
        --saved-model …/saved_models/model/ \
        --output …/saved_models/model-FP32/model.onnx
  4. Parse the ONNX model with TensorRT in C++ (see the sketch after this list).
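The build-and-serialize step in 4 is what finally produces a .plan file. The C++ and Python builder APIs mirror each other; a rough sketch with the TensorRT 7-era Python API (paths are placeholders carried over from step 3, workspace size is just an example) looks like this:

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX file produced in step 3 (placeholder path).
    with open("saved_models/model-FP32/model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB, adjust as needed

    # Build the engine and dump the serialized plan to disk.
    engine = builder.build_engine(network, config)
    with open("model.plan", "wb") as f:
        f.write(engine.serialize())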
heizie commented 4 years ago


Thanks @rvorias, I'm also doing something like this now. But my model is trained in NHWC format, so I've tried using --inputs-as-nchw. The TRT engine runs, and I've organized the input image as a std::vector in planar order: BBBB...GGGG...RRRR (the code is at the bottom). But the output is not understandable, and the pixel values are unbelievably low (around 0.0004, where the target should be something like 0.9).
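For reference, the planar layout I'm trying to reproduce is what a plain numpy transpose gives on the Python side (just a sketch, assuming the engine binding is 1x3xHxW after --inputs-as-nchw and OpenCV's BGR channel order):

    import cv2
    import numpy as np

    img = cv2.imread("000001.png", cv2.IMREAD_UNCHANGED)  # H x W x 3, BGR, uint8
    chw = img.astype(np.float32).transpose(2, 0, 1)        # 3 x H x W, planar BBB...GGG...RRR
    flat = np.ascontiguousarray(chw).ravel()               # contiguous buffer to copy to the device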

I've also tried the NHWC format, in both static and dynamic mode. For the dynamic-shape setting I used `const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH); INetworkDefinition* network = builder->createNetworkV2(explicitBatch);` but then the engine won't run; it shows an error like `ERROR: StatefulPartitionedCall/model/conv2d/BiasAdd: kernel weights has count 1728 but 184320 was expected`.

Load to CUDA:

    string ImgPath = "../000001.png";
    Mat image = imread(ImgPath, -1);
    vector<float> input(IN);

    for (int i = 0; i < INPUT_H; i++)
    {
        for (int j = 0; j < INPUT_W; j++)
        {
            Vec3b intensity = image.at<Vec3b>(i, j);
            cout << "i: " << i << " j: " << j << endl;
            input[j +  i      * INPUT_W] = float(intensity.val[0]); // BBB
            input[j + (i + 1) * INPUT_W] = float(intensity.val[1]); // GGG
            input[j + (i + 2) * INPUT_W] = float(intensity.val[2]); // RRR
        }
    }

Unload from CUDA (the output is H x W x 9, each channel is a grayscale image; in the following I've tested with just 1 channel):

    Mat img_bin = Mat::zeros(INPUT_H, INPUT_W, CV_8UC1); // init empty img

    for (int i = 0; i < INPUT_H; ++i)
    {
        for (int j = 0; j < INPUT_W; ++j)
        {
            int v = out_bin[j + i] * 255; // for opencv, scale the probability from 0-1 to 0-255
            img_bin.at<uchar>(i, j) = v;
        }
    }
    imshow("Display Window", img_bin);
    waitKey(0);