Retinanet - how to perform inference

rbgreenway commented 6 years ago

I've been using TensorflowSharp with Faster RCNN successfully for a while now; however, I recently trained a Retinanet model (using Keras/Python3.5), verified that it works in Python, and created a frozen pb file for use with Tensorflow. For FRCNN, there is an example in the TensorflowSharp GitHub repo that shows how to run/fetch this model. For Retinanet, I tried modifying that code, but nothing seems to work. I have a model summary for Retinanet that I've tried to work from, but it's not obvious to me which node names should be used. The problem appears to be the parameters for the "Fetch" portion of the Runner.

For FRCNN, the graph is run in this way:

    var runner = m_session.GetRunner();

    // Feed the image tensor and request the four detection outputs by node name.
    runner
        .AddInput(m_graph["image_tensor"][0], tensor)
        .Fetch(
            m_graph["detection_boxes"][0],
            m_graph["detection_scores"][0],
            m_graph["detection_classes"][0],
            m_graph["num_detections"][0]);

    var output = runner.Run();

    // Results come back in the same order as the Fetch arguments.
    var boxes = (float[,,])output[0].GetValue(jagged: false);
    var scores = (float[,])output[1].GetValue(jagged: false);
    var classes = (float[,])output[2].GetValue(jagged: false);
    var num = (float[])output[3].GetValue(jagged: false);

From the model summary for FRCNN, it is obvious what the input ("image_tensor") and outputs ("detection_boxes", "detection_scores", "detection_classes", and "num_detections") are. They are not the same for Retinanet (I've tried), and I can't figure out what they should be. The "Fetch" part of the code above is causing a crash, and I'm guessing it's because I'm not getting the node names right.

I won't paste the entire Retinanet summary here, but here are the first few nodes:

    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_1 (InputLayer)            (None, None, None, 3 0                                            
    __________________________________________________________________________________________________
    padding_conv1 (ZeroPadding2D)   (None, None, None, 3 0           input_1[0][0]                    
    __________________________________________________________________________________________________
    conv1 (Conv2D)                  (None, None, None, 6 9408        padding_conv1[0][0]              
    __________________________________________________________________________________________________
    bn_conv1 (BatchNormalization)   (None, None, None, 6 256         conv1[0][0]                      
    __________________________________________________________________________________________________
    conv1_relu (Activation)         (None, None, None, 6 0           bn_conv1[0][0]                   
    __________________________________________________________________________________________________

And here are the last several nodes:

    __________________________________________________________________________________________________
    anchors_0 (Anchors)             (None, None, 4)      0           P3[0][0]                         
    __________________________________________________________________________________________________
    anchors_1 (Anchors)             (None, None, 4)      0           P4[0][0]                         
    __________________________________________________________________________________________________
    anchors_2 (Anchors)             (None, None, 4)      0           P5[0][0]                         
    __________________________________________________________________________________________________
    anchors_3 (Anchors)             (None, None, 4)      0           P6[0][0]                         
    __________________________________________________________________________________________________
    anchors_4 (Anchors)             (None, None, 4)      0           P7[0][0]                         
    __________________________________________________________________________________________________
    regression_submodel (Model)     (None, None, 4)      2443300     P3[0][0]                         
                                                                     P4[0][0]                         
                                                                     P5[0][0]                         
                                                                     P6[0][0]                         
                                                                     P7[0][0]                         
    __________________________________________________________________________________________________
    anchors (Concatenate)           (None, None, 4)      0           anchors_0[0][0]                  
                                                                     anchors_1[0][0]                  
                                                                     anchors_2[0][0]                  
                                                                     anchors_3[0][0]                  
                                                                     anchors_4[0][0]                  
    __________________________________________________________________________________________________
    regression (Concatenate)        (None, None, 4)      0           regression_submodel[1][0]        
                                                                     regression_submodel[2][0]        
                                                                     regression_submodel[3][0]        
                                                                     regression_submodel[4][0]        
                                                                     regression_submodel[5][0]        
    __________________________________________________________________________________________________
    boxes (RegressBoxes)            (None, None, 4)      0           anchors[0][0]                    
                                                                     regression[0][0]                 
    __________________________________________________________________________________________________
    classification_submodel (Model) (None, None, 1)      2381065     P3[0][0]                         
                                                                     P4[0][0]                         
                                                                     P5[0][0]                         
                                                                     P6[0][0]                         
                                                                     P7[0][0]                         
    __________________________________________________________________________________________________
    clipped_boxes (ClipBoxes)       (None, None, 4)      0           input_1[0][0]                    
                                                                     boxes[0][0]                      
    __________________________________________________________________________________________________
    classification (Concatenate)    (None, None, 1)      0           classification_submodel[1][0]    
                                                                     classification_submodel[2][0]    
                                                                     classification_submodel[3][0]    
                                                                     classification_submodel[4][0]    
                                                                     classification_submodel[5][0]    
    __________________________________________________________________________________________________
    filtered_detections (FilterDete [(None, 300, 4), (No 0           clipped_boxes[0][0]              
                                                                     classification[0][0]             
    ==================================================================================================
    Total params: 36,382,957
    Trainable params: 36,276,717
    Non-trainable params: 106,240

Any help figuring out how to fix the "Fetch" part of this would be greatly appreciated.

EDIT:

To dig a little further into this, I found a Python function that prints the operation names from a .pb file. For the FRCNN .pb file, it clearly shows the output node names, as can be seen below (only the last several lines of the function's output are posted).

    import/SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3
    import/SecondStagePostprocessor/ToFloat_1
    import/add/y
    import/add
    import/detection_boxes
    import/detection_scores
    import/detection_classes
    import/num_detections

If I do the same thing for the Retinanet .pb file, it's not obvious what the outputs are. Here are the last several lines of the function's output.

    import/filtered_detections/map/while/NextIteration_4
    import/filtered_detections/map/while/Exit_2
    import/filtered_detections/map/while/Exit_3
    import/filtered_detections/map/while/Exit_4
    import/filtered_detections/map/TensorArrayStack/TensorArraySizeV3
    import/filtered_detections/map/TensorArrayStack/range/start
    import/filtered_detections/map/TensorArrayStack/range/delta
    import/filtered_detections/map/TensorArrayStack/range
    import/filtered_detections/map/TensorArrayStack/TensorArrayGatherV3
    import/filtered_detections/map/TensorArrayStack_1/TensorArraySizeV3
    import/filtered_detections/map/TensorArrayStack_1/range/start
    import/filtered_detections/map/TensorArrayStack_1/range/delta
    import/filtered_detections/map/TensorArrayStack_1/range
    import/filtered_detections/map/TensorArrayStack_1/TensorArrayGatherV3
    import/filtered_detections/map/TensorArrayStack_2/TensorArraySizeV3
    import/filtered_detections/map/TensorArrayStack_2/range/start
    import/filtered_detections/map/TensorArrayStack_2/range/delta
    import/filtered_detections/map/TensorArrayStack_2/range
    import/filtered_detections/map/TensorArrayStack_2/TensorArrayGatherV3

For reference, here's the Python function that I used:

    import tensorflow as tf  # TensorFlow 1.x API (tf.gfile, tf.GraphDef)

    def printTensors(pb_file):

        # read pb into graph_def
        with tf.gfile.GFile(pb_file, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())

        # import graph_def; with no name argument, every op name
        # gets the default "import/" prefix
        with tf.Graph().as_default() as graph:
            tf.import_graph_def(graph_def)

        # print operations
        for op in graph.get_operations():
            print(op.name)
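
The function above prints every operation, which makes the outputs hard to spot in a large graph. A minimal sketch of one way to narrow the list, assuming that a likely output is a node that no other node consumes: collect every name that appears as an input somewhere, then report the nodes that never do. The `Node` tuple and the toy graph wiring are hypothetical stand-ins; a real `graph_def.node` list exposes the same `name` and `input` fields, and its names carry no "import/" prefix because that prefix is only added by `tf.import_graph_def`. This is a heuristic, not a description of how any TensorFlow tool works internally.

```python
from collections import namedtuple

# Stand-in for a GraphDef node; a real tf.GraphDef node exposes the same
# `name` and `input` fields, so graph_def.node could be passed in directly.
Node = namedtuple("Node", ["name", "input"])


def find_unconsumed_nodes(nodes):
    """Return names of nodes that no other node lists as an input."""
    consumed = set()
    for node in nodes:
        for inp in node.input:
            # Inputs look like "name", "name:1" (output index) or
            # "^name" (control dependency); reduce each to the bare name.
            consumed.add(inp.lstrip("^").split(":")[0])
    return [node.name for node in nodes if node.name not in consumed]


# Toy graph (hypothetical wiring, not the real Retinanet graph).
toy_graph = [
    Node("input_1", []),
    Node("clipped_boxes", ["input_1"]),
    Node("stack/TensorArrayGatherV3", ["clipped_boxes", "^input_1"]),
    Node("stack_1/TensorArrayGatherV3", ["clipped_boxes:1"]),
]

for name in find_unconsumed_nodes(toy_graph):
    print(name)
```

On a real graph the result may also include nodes that are unconsumed for other reasons, so it is a list of candidates rather than a definitive answer.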

Hope this helps.

If I can get this working, I'll gladly share my code for training Retinanet in Keras (which is actually transfer learning on my custom objects) and for running the inference of that model in TensorflowSharp. In my python testing, Retinanet clearly outperforms FRCNN.

oferbentovim commented 5 years ago

I have the same problem. Does anyone know which operations return the masks and classification results for mrcnn?

rbgreenway commented 5 years ago

I was able to figure out the in/out layer names for the Keras Retinanet implementation using a tool that comes with the Tensorflow source called summarize_graph. This may be too much information, but this is basically the process:

1) Get the Tensorflow source from https://github.com/tensorflow/tensorflow

2) Install bazel for your system (this is the build tool needed to build summarize_graph). You may need to find instructions for installing bazel for your distro.

3) Navigate to the root of the tensorflow source directory, and then run in a terminal:

    ./configure
    bazel build tensorflow/tools/graph_transforms:summarize_graph

4) summarize_graph is located at:

    <path/to/TensorflowSource>/tensorflow/bazel-bin/tensorflow/tools/graph_transforms

Example run in a terminal:

    /home/bryan/TFSource/tensorflow/bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph="/home/bryan/retinanet/keras-retinanet/snapshots/test_bryan.pb"

If your .pb file is frozen and ready for inference, then the output will tell you the names of the input and output layers. Here is the relevant part of the output from the command above for a Keras-Retinanet trained network:

    Found 1 possible inputs: (name=input_1, type=float(1), shape=[?,?,?,3]) 
    No variables spotted.
    Found 3 possible outputs: (name=filtered_detections/map/TensorArrayStack/TensorArrayGatherV3, op=TensorArrayGatherV3) (name=filtered_detections/map/TensorArrayStack_1/TensorArrayGatherV3, op=TensorArrayGatherV3) (name=filtered_detections/map/TensorArrayStack_2/TensorArrayGatherV3, op=TensorArrayGatherV3)

From this, you can see that the input layer name is "input_1".

The output layer names are:

    "filtered_detections/map/TensorArrayStack/TensorArrayGatherV3"    <-- this is the boxes
    "filtered_detections/map/TensorArrayStack_1/TensorArrayGatherV3"  <-- this is the scores
    "filtered_detections/map/TensorArrayStack_2/TensorArrayGatherV3"  <-- this is the classes
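
Once the three outputs are fetched in that order (boxes, scores, classes), they still need to be paired up and filtered. A minimal sketch of that step for a single image, under two assumptions that are not confirmed in this thread: that unused detection slots are padded with -1, and that a score threshold of 0.5 is a sensible cut-off. The function name and the sample values are hypothetical.

```python
def keep_detections(boxes, scores, labels, score_threshold=0.5):
    """Pair up the three outputs for one image and drop weak or padded rows."""
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        # Assumption: unused slots are padded with -1, so a negative label
        # marks padding rather than a real detection.
        if label < 0 or score < score_threshold:
            continue
        kept.append({"box": list(box), "score": score, "label": int(label)})
    return kept


# Hypothetical values for one image: two detections followed by one padded row.
boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0], [-1.0, -1.0, -1.0, -1.0]]
scores = [0.92, 0.30, -1.0]
labels = [0, 0, -1]

for det in keep_detections(boxes, scores, labels):
    print(det)
```

The same loop translates directly to C# over the arrays returned by the runner; the coordinate order inside each box is not specified here and should be checked against the Python inference results.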

You can see the output layer names are quite complicated (I could never have guessed them).
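
Because names this long are easy to mistype, they can be pulled out of the summarize_graph text instead of being copied by hand. A minimal sketch, assuming the output keeps the "Found N possible inputs/outputs: (name=..., ...)" layout quoted above; `parse_summary` is a hypothetical helper, not part of TensorFlow.

```python
import re

# Excerpt of the summarize_graph output quoted in this thread.
SUMMARY = """\
Found 1 possible inputs: (name=input_1, type=float(1), shape=[?,?,?,3])
No variables spotted.
Found 3 possible outputs: (name=filtered_detections/map/TensorArrayStack/TensorArrayGatherV3, op=TensorArrayGatherV3) (name=filtered_detections/map/TensorArrayStack_1/TensorArrayGatherV3, op=TensorArrayGatherV3) (name=filtered_detections/map/TensorArrayStack_2/TensorArrayGatherV3, op=TensorArrayGatherV3)
"""


def parse_summary(text):
    """Extract input and output node names from summarize_graph output."""
    names = {"inputs": [], "outputs": []}
    for line in text.splitlines():
        match = re.match(r"Found \d+ possible (inputs|outputs):", line)
        if match:
            # Every "(name=..., ...)" group on the line contributes one name.
            names[match.group(1)] = re.findall(r"\(name=([^,)]+)", line)
    return names


names = parse_summary(SUMMARY)
print(names["inputs"])
for name in names["outputs"]:
    print(name)
```

The resulting strings can then be pasted into the AddInput and Fetch calls, or written to a small config file that the C# side reads.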

I know you're working on MRCNN and not RetinaNet, but I hope this helps.

For those interested (judging by the lack of response to my original post, this may be no one), I'll try to put together a complete post of the process for taking the trained .h5 Keras file, converting it to a .pb file, and then using this .pb file with TensorflowSharp. There are lots of little nuances that I had to figure out in order to get it to work properly, but it was worth the effort for me.

chrigui94 commented 4 years ago

Hello @rbgreenway, I have the same problem implementing a MaskRcnn frozen graph using TensorFlowSharp. Can you please post your approach to implementing this?

migueldeicaza / TensorFlowSharp

Retinanet - how to perform inference #351