tensorflow / models

Models and examples built with TensorFlow

issue with running forward passes using saved model format converted from a checkpoint #10204

Open atahmasb opened 3 years ago

atahmasb commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/...

2. Describe the bug

I fine-tuned the faster_rcnn_resnet50_keras model from the TF Object Detection API on my own dataset, then used the exporter_main_v2 script under models/research/object_detection/ to convert one of my checkpoints to the SavedModel format so that I could serve it from a Golang application. I can load the saved model files in Golang using the TF Golang client, but a forward pass fails with the following error:

2021-07-30 15:45:06.876593: I tensorflow/cc/saved_model/loader.cc:303] SavedModel load for tags { serve }; Status: success: OK. Took 1198184 microseconds.
{"severity":"info","timestamp":"2021-07-30T15:45:07.743500777Z","caller":"/go/src/bitbucket.org/ehsai/doc-structure/internal/classifier/object_detection_init.go:30","message":"Loading Tensorflow Model labels: /go/src/bitbucket.org/ehsai/doc-structure/models/classifiers/document_structure/object_detection/v14"}
2021-07-30 15:45:21.226358: E tensorflow/core/framework/tensor.cc:555] Could not decode variant with type_name: "tensorflow::TensorList".  Perhaps you forgot to register a decoder via REGISTER_UNARY_VARIANT_DECODE_FUNCTION?
2021-07-30 15:45:21.226403: W tensorflow/core/framework/op_kernel.cc:1744] OP_REQUIRES failed at constant_op.cc:82 : Invalid argument: Cannot parse tensor from tensor_proto.
2021-07-30 15:45:21.255391: E tensorflow/core/framework/tensor.cc:555] Could not decode variant with type_name: "tensorflow::TensorList".  Perhaps you forgot to register a decoder via REGISTER_UNARY_VARIANT_DECODE_FUNCTION?
2021-07-30 15:45:21.255454: W tensorflow/core/framework/op_kernel.cc:1744] OP_REQUIRES failed at constant_op.cc:82 : Invalid argument: Cannot parse tensor from proto: dtype: DT_VARIANT
tensor_shape {
}
variant_val {
  type_name: "tensorflow::TensorList"
  metadata: "\001\000\001\377\377\377\377\377\377\377\377\377\001\030\001"
}

{"severity":"error","timestamp":"2021-07-30T15:45:21.265370446Z","caller":"/go/src/bitbucket.org/ehsai/doc-structure/internal/classifier/object_detection_run.go:135","message":"An error occurred during forwad pass, err=Cannot parse tensor from proto: dtype: DT_VARIANT\ntensor_shape {\n}\nvariant_val {\n  type_name: \"tensorflow::TensorList\"\n  metadata: \"\\001\\000\\001\\377\\377\\377\\377\\377\\377\\377\\377\\377\\001\\030\\001\"\n}\n\n\t [[{{node StatefulPartitionedCall/StatefulPartitionedCall/map/TensorArrayV2_1/_0__cf__4}}]]"}
    suite.go:63: test panicked: runtime error: index out of range [2] with length 0
        goroutine 158 [running]:
        runtime/debug.Stack(0xc001f95710, 0x9f4bc0, 0xc0000fa000)
                /usr/local/go/src/runtime/debug/stack.go:24 +0x9f
        github.com/stretchr/testify/suite.failOnPanic(0xc000d03e00)
                /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:63 +0x57
        panic(0x9f4bc0, 0xc0000fa000)
                /usr/local/go/src/runtime/panic.go:969 +0x175
        bitbucket.org/ehsai/doc-structure/internal/classifier.(*objectDetectionModelSuite).Test_modelOutputsShape(0xc0043c60a0)
                /go/src/bitbucket.org/ehsai/doc-structure/internal/classifier/object_detection_test.go:146 +0xb45
        reflect.Value.call(0xc00440b620, 0xc004408550, 0x13, 0xa376d8, 0x4, 0xc000325e30, 0x1, 0x1, 0xc000325cf8, 0x41142a, ...)
                /usr/local/go/src/reflect/value.go:475 +0x8c7
        reflect.Value.Call(0xc00440b620, 0xc004408550, 0x13, 0xc000325e30, 0x1, 0x1, 0x24, 0xcf345, 0x519dc4)
                /usr/local/go/src/reflect/value.go:336 +0xb9
        github.com/stretchr/testify/suite.Run.func1(0xc000d03e00)
                /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:158 +0x379
        testing.tRunner(0xc000d03e00, 0xc004155cb0)
                /usr/local/go/src/testing/testing.go:1127 +0xef
        created by testing.(*T).Run

When I load the same saved model files in Python using tf.saved_model.load(), there is no problem: the model loads, a forward pass runs, and the predictions exactly match those from the checkpoint. The error occurs only when I load the model in Golang.

What's weird is that the saved model files provided in the object detection model zoo (pre-trained models exported by the TF team, I guess) load and run forward passes in Golang without any error, so I think there must be something wrong with the conversion script.

3. Steps to reproduce

step 1: Download a pre-trained model (any model) from the object detection model zoo, https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md, then convert its checkpoint to the SavedModel format using the following code:

import tensorflow as tf
from PIL import Image, ImageDraw, ImageFont
from six import BytesIO
import numpy as np
from object_detection.utils import label_map_util
from object_detection.utils import config_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder
from object_detection.exporter_lib_v2 import export_inference_graph
import os
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

# Placeholder: replace with the real path
pipeline_config_path = "Path to the pipeline config file in the downloaded model folder"
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(pipeline_config_path, 'r') as f:
    text_format.Merge(f.read(), pipeline_config)
# Placeholders: replace with the real paths
model_dir = "Path to the checkpoint in the downloaded model"
saved_model_path = "Path where the exported saved model files are written"

# Arguments: input type, pipeline config, checkpoint directory, output directory
export_inference_graph("image_tensor", pipeline_config, model_dir, saved_model_path)

Running the above code generates a folder called saved_model in the saved_model_path directory. Next, load that model in Golang and run a forward pass. Here is the code to do that.

step 2:

package main

import (
    "bytes"
    "fmt"
    "image"
    "image/png"
    "os"

    tf "github.com/tensorflow/tensorflow/tensorflow/go"
    tf_op "github.com/tensorflow/tensorflow/tensorflow/go/op"
)

func main() {

    tags := []string{"serve"}

    imagePath := "Path to a png image to feed to the model"

    modelDir := "Path to a directory that has saved model files"
    tensorflowModel, err := tf.LoadSavedModel(modelDir, tags, nil)
    if err != nil {
        os.Exit(1)
    }

    imageBytes, err := generateByteArrayFromPngFile(imagePath)
    if err != nil {
        os.Exit(1)
    }

    tensor, err := tf.NewTensor(string(imageBytes))
    if err != nil {
        os.Exit(1)
    }

    // Prepare image for forward pass
    scope := tf_op.NewScope()
    input := tf_op.Placeholder(scope, tf.String)
    out := tf_op.ExpandDims(scope,
        tf_op.DecodePng(scope, input, tf_op.DecodePngChannels(3)),
        tf_op.Const(scope.SubScope("make_batch"), int32(0)))

    outs, err := runScope(scope, map[tf.Output]*tf.Tensor{input: tensor}, []tf.Output{out})
    if err != nil {
        os.Exit(1)
    }

    modelOutputs, err := tensorflowModel.Session.Run(
        map[tf.Output]*tf.Tensor{
            tensorflowModel.Graph.Operation("serving_default_input_tensor").Output(0): outs[0],
        },
        []tf.Output{
            // scores
            tensorflowModel.Graph.Operation("StatefulPartitionedCall").Output(4),
            // classes
            tensorflowModel.Graph.Operation("StatefulPartitionedCall").Output(2),
            // bounding boxes
            tensorflowModel.Graph.Operation("StatefulPartitionedCall").Output(1),
        },
        nil)

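    if err != nil {
        // Print the forward-pass error before exiting so the failure is visible
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }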

    fmt.Println(modelOutputs)

}

func runScope(s *tf_op.Scope, inputs map[tf.Output]*tf.Tensor, outputs []tf.Output) ([]*tf.Tensor, error) {
    graph, err := s.Finalize()
    if err != nil {
        return nil, err
    }

    session, err := tf.NewSession(graph, nil)
    if err != nil {
        return nil, err
    }
    defer session.Close()
    return session.Run(inputs, outputs, nil)
}

func generateByteArrayFromPngFile(filePath string) ([]byte, error) {
    existingImageFile, err := os.Open(filePath)
    if err != nil {
        return nil, err
    }
    defer existingImageFile.Close()

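    imageData, imageType, err := image.Decode(existingImageFile)
    if err != nil {
        return nil, err
    }
    if imageType != "png" {
        // err is nil at this point, so return an explicit error instead
        return nil, fmt.Errorf("expected a png image, got %q", imageType)
    }

    // Re-encode the decoded image as PNG bytes
    pngBytesBuffer := new(bytes.Buffer)
    if err := png.Encode(pngBytesBuffer, imageData); err != nil {
        return nil, err
    }

    return pngBytesBuffer.Bytes(), nil
}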
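
The helper above decodes the file and then re-encodes it as PNG. Since the DecodePng op in the preprocessing graph takes encoded bytes, reading the file as-is and checking the 8-byte PNG signature may be enough. A minimal sketch, assuming hypothetical helper names isPNG and readPNGFile (neither comes from the original code) that can be dropped into the same package:

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// pngSignature is the fixed 8-byte header that starts every PNG file.
var pngSignature = []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}

// isPNG reports whether data begins with the PNG signature.
func isPNG(data []byte) bool {
	return bytes.HasPrefix(data, pngSignature)
}

// readPNGFile (hypothetical) returns the raw file bytes without decoding
// and re-encoding, and rejects files that do not carry the PNG signature.
func readPNGFile(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	if !isPNG(data) {
		return nil, fmt.Errorf("%s is not a PNG file", path)
	}
	return data, nil
}
```

This skips the normalization that a decode and re-encode round trip performs, so it only fits when the input files are already valid PNGs.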

4. Expected behavior

I expect to load the saved model files and perform forward passes. I have done this before with TF Object Detection V1; the issue appeared when I started using TF Object Detection V2 to fine-tune a pre-trained model on my dataset.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

atahmasb commented 3 years ago

Has anyone had a chance to look at this bug report? Any solutions?

HoopsMcann commented 2 years ago

Any status update here? I am having a similar issue.