tensorflow / java

Java bindings for TensorFlow
Apache License 2.0

Unable to execute tfhub model: getting TFInvalidArgumentException #85

Closed samikrc closed 4 years ago

samikrc commented 4 years ago

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests, and build/installation issues on GitHub.

System information

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the current behavior

I am trying to invoke a TF Hub model (Universal Sentence Encoder v4) using the new Java API (from Scala). However, I am getting stuck on the error below.

An exception or error caused a run to abort: Malformed TF_STRING tensor; too short to hold number of elements 
org.tensorflow.exceptions.TFInvalidArgumentException: Malformed TF_STRING tensor; too short to hold number of elements
    at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
    at org.tensorflow.Session.run(Session.java:595)
    at org.tensorflow.Session.access$100(Session.java:70)
    at org.tensorflow.Session$Runner.runHelper(Session.java:335)
    at org.tensorflow.Session$Runner.run(Session.java:285)
    at org.samik.EmbeddingModelServer.USEEmbeddingServerTest.<init>(USEEmbeddingServerTest.scala:85)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
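For context on the error message, and as an assumption about its cause (this is an editor's sketch, not TF Java API code, and the class name below is made up): before TF 2.4, a TF_STRING tensor's buffer began with a table of 8-byte little-endian offsets, one per element, followed by each element encoded as a varint length plus its bytes. Wrapping only the raw UTF-8 bytes of a string, as the reproduction code below does, produces a buffer shorter than even the offset table, which matches "too short to hold number of elements". A minimal sketch of that layout in plain Java:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class LegacyTfStringLayout {
    // Encode a string array in the (assumed) pre-2.4 TF_STRING layout:
    // [uint64 offset per element][varint length + bytes per element].
    static byte[] encode(String[] elems) {
        byte[][] data = new byte[elems.length][];
        int payload = 0;
        for (int i = 0; i < elems.length; i++) {
            data[i] = elems[i].getBytes(StandardCharsets.UTF_8);
            payload += varintSize(data[i].length) + data[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(8 * elems.length + payload)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        long offset = 0;
        for (byte[] d : data) {           // offset table first
            buf.putLong(offset);
            offset += varintSize(d.length) + d.length;
        }
        for (byte[] d : data) {           // then varint-prefixed payloads
            int len = d.length;
            while ((len & ~0x7F) != 0) {
                buf.put((byte) ((len & 0x7F) | 0x80));
                len >>>= 7;
            }
            buf.put((byte) len);
            buf.put(d);
        }
        return buf.array();
    }

    // Number of bytes a varint encoding of v occupies.
    static int varintSize(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) { v >>>= 7; n++; }
        return n;
    }

    public static void main(String[] args) {
        // "Hello": 8-byte offset table + 1-byte varint length + 5 payload bytes.
        System.out.println(encode(new String[]{"Hello"}).length);
    }
}
```

The factory methods such as TString.vectorOf (used later in this thread) build this layout internally, which is why they are preferable to hand-wrapping bytes in a DataBuffer.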

Describe the expected behavior

The code should compile and execute without error.

Code to reproduce the issue

import org.tensorflow.proto.framework.{MetaGraphDef, SignatureDef, TensorInfo}
import org.tensorflow.{SavedModelBundle, Tensor}

import scala.collection.JavaConverters._
import scala.collection.mutable

import org.tensorflow.ndarray.Shape
import org.tensorflow.ndarray.buffer.DataBuffers
import org.tensorflow.types.{TString, TUint8}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object TestApp extends App  // App must be mixed into an object, not a class, to run
{
    val useModel = SavedModelBundle.load("/local/path/to/tfhub/use_4", "serve")

    val metaData = useModel.metaGraphDef()
    val signatureDef = metaData.getSignatureDefMap().get("serving_default")
    val firstInput = getInputToShape(metaData).keys.head
    val firstOutput = getOutputToShape(metaData).keys.head

    val input = "Hello"
    val dataBuffer = DataBuffers.of(ByteBuffer.wrap(input.getBytes(StandardCharsets.UTF_8)))
    val tensor = Tensor.of(TString.DTYPE, Shape.of(1L), dataBuffer)
    println(s"Tensor: $tensor")
    val sessionRunner = useModel.session().runner()
    val result = sessionRunner
            .feed(firstInput, tensor)
                        //****** The below line (fetch(..)) seems to be generating the error *********//
            .fetch(firstOutput)
            .run()
            .asScala
    println(result)

    private def getOutputToShape(metadata: MetaGraphDef): mutable.Map[String, Shape] =
        mapToShape(signatureDef.getOutputsMap.asScala)

    private def getInputToShape(metadata: MetaGraphDef): mutable.Map[String, Shape] =
        mapToShape(signatureDef.getInputsMap.asScala)

    private def mapToShape(map: mutable.Map[String, TensorInfo]): mutable.Map[String, Shape] =
    {
        map.foldLeft(mutable.HashMap[String, Shape]())
        { case(accum, (_, tensorInfo)) =>
            val dimList = tensorInfo.getTensorShape.getDimList.asScala.map(_.getSize)
            val shape = if(dimList.length == 0) Shape.unknown() else Shape.of(dimList: _*)
            accum += (tensorInfo.getName -> shape)
        }
    }
}

However, pretty much the same code, with the same helper functions, works with the published 1.15.0 jar. Here is the corresponding snippet.

    val metaData = MetaGraphDef.parseFrom(useModel.metaGraphDef())
    val firstInput = getInputToShape(metaData).keys.head
    val firstOutput = getOutputToShape(metaData).keys.head

    val input = "Hello there!"
    val inputTensor: Tensor[String] = Tensors.create(Array(input.getBytes()))

    val sessionRunner = useModel.session().runner()
    val results = sessionRunner.feed(firstInput, inputTensor).fetch(firstOutput).run().asScala
    results.foreach(tensor => {
        val array = Array.ofDim[Float](tensor.shape()(0).toInt, tensor.shape()(1).toInt)
        tensor.copyTo(array)
        println(s"[${array(0).mkString(", ")}]")
    })

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

karllessard commented 4 years ago

Thanks for reporting this @samikrc ,

At a quick glance: in the TF 1.x version you are passing your input as a single scalar string, while in the TF 2.x version you initialize your shape with one dimension of one element. Can you retry after replacing Shape.of(1L) with Shape.scalar()? Or better, you can simply initialize the whole input tensor like this:

val tensor = TString.scalarOf("Hello")
samikrc commented 4 years ago

@karllessard Thanks for the response. Not much luck still.

Here is the code:

    val metaData = useModel.metaGraphDef()
    val signatureDef = metaData.getSignatureDefMap().get("serving_default")
    val firstInput = mapToShape(signatureDef.getInputsMap.asScala).keys.head
    val firstOutput = mapToShape(signatureDef.getOutputsMap.asScala).keys.head
    val sessionRunner = useModel.session().runner()

    println(s"firstInput: $firstInput, firstOutput: $firstOutput")
    val inputs = "Hello"
    val inputTensors = TString.scalarOf(inputs)

    val results = sessionRunner
            .feed(firstInput, inputTensors)
            .fetch(firstOutput)
            .run()
            .asScala
            .head

Here is the exception from the fetch(..) call above:

firstInput: serving_default_inputs:0, firstOutput: StatefulPartitionedCall_1:0

An exception or error caused a run to abort: [_Derived_]{{function_node __inference_pruned_2009}} {{function_node __inference_pruned_2009}} input must be a vector, got shape: []
     [[{{node text_preprocessor/tokenize/StringSplit/StringSplit}}]]
     [[StatefulPartitionedCall_1/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall]] 
org.tensorflow.exceptions.TFInvalidArgumentException: [_Derived_]{{function_node __inference_pruned_2009}} {{function_node __inference_pruned_2009}} input must be a vector, got shape: []
     [[{{node text_preprocessor/tokenize/StringSplit/StringSplit}}]]
     [[StatefulPartitionedCall_1/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall]]
    at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
    at org.tensorflow.Session.run(Session.java:595)
    at org.tensorflow.Session.access$100(Session.java:70)
    at org.tensorflow.Session$Runner.runHelper(Session.java:335)
    at org.tensorflow.Session$Runner.run(Session.java:285)
    at org.samik.EmbeddingModelServer.USEEmbeddingServerTest.<init>(USEEmbeddingServerTest.scala:65)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

The code is available at https://github.com/samikrc/embedding-model-server, branch: tf2.2, if that helps (although the model itself is not uploaded in the repo).

karllessard commented 4 years ago

Oops, sorry, I misread your first example: you do want a vector, not a scalar. Replace TString.scalarOf(inputs) with TString.vectorOf(inputs) and that should get rid of this error. But the initial problem might come back as well; I'll take a look when I have some time,

Karl

samikrc commented 4 years ago

Hello, that seems to work: I am no longer getting any exception, thanks. I still need some help with the last part, copying the tensor over to a 2D float array. The old copyTo method doesn't seem to exist any more.

I tried playing with results.rawData().asFloats().read(..), but that read(..) method seems to accept only a 1D float array, whereas I would like the data in a 2D float array.

Here is what I have so far:

    val inputs = "Hello"
    val inputTensors = TString.vectorOf(Array(inputs): _*)
    val results = sessionRunner
            .feed(firstInput, inputTensors)
            .fetch(firstOutput)
            .run()
            .asScala
            .head
    // *** The below used to work in 1.15.0. What is the equivalent method? ***
    val resultShape = results.shape().asArray()
    val array = Array.ofDim[Float](resultShape(0).toInt, resultShape(1).toInt)
    //results.copyTo(array)
karllessard commented 4 years ago

Still need some help in the last part - to copy over the tensor to a 2D float array.

You have a good point here: there is currently no way to copy the content of an NdArray into a multi-dimensional Java array. The general idea is that NdArray becomes the new structure for working with multi-dimensional data, and it aims to be more efficient than these segmented Java arrays.

While it could be something lacking (note that this library is still a work in progress), you might also want to reconsider whether you need to copy the tensor data into an array at all. In version 1.x it was mandatory to copy the tensor memory onto the heap before reading it, but now you can access the data of your native tensor directly by invoking tensor.data(). If the data needs to live beyond the scope of the tensor, you can still copy it onto the heap via another NdArray instance, by doing something like:

FloatNdArray copy = NdArrays.ofFloats(tensor.shape());
tensor.data().copyTo(copy);

Still, I think the ability to copy an NdArray into a standard Java array is something we might want to look at (we currently only support the other direction, in StdArrays). Thanks for pointing that out.
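In the meantime, one dependency-free workaround (an editor's sketch, not part of the TF Java API) is to read the tensor's floats into a flat array, as read(..) already allows, and reshape it row-major by hand. In plain Java, with the flat array standing in for whatever the 1D read produces:

```java
public class RowMajorReshape {
    // Reshape a flat row-major float array into a [rows][cols] array.
    static float[][] reshape(float[] flat, int rows, int cols) {
        if (flat.length != rows * cols) {
            throw new IllegalArgumentException("size mismatch");
        }
        float[][] out = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            // Row r occupies flat[r*cols .. r*cols + cols - 1] in row-major order.
            System.arraycopy(flat, r * cols, out[r], 0, cols);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] flat = {1f, 2f, 3f, 4f, 5f, 6f};
        float[][] m = reshape(flat, 2, 3);
        System.out.println(m[1][0]); // first element of the second row
    }
}
```

This assumes the tensor's raw float data is laid out row-major, which is the usual dense-tensor convention.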

samikrc commented 4 years ago

Hello - thanks. Yes, the use case I am trying to solve involves serving embeddings as JSON arrays to multiple downstream models, some of which may not even be TF models, so I definitely need this functionality.

From your explanation, it seems like there isn't a workaround at the moment either; am I right? In that case, I will stick to the 1.15.0 API for now.
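For the JSON-serving use case described above, once the values are in a 2D float array, no JSON library is strictly needed; a minimal stdlib-only sketch (an editor's illustration, not code from this repository):

```java
public class EmbeddingJson {
    // Serialize a 2D float array as a JSON array of arrays of numbers.
    static String toJson(float[][] m) {
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < m.length; r++) {
            if (r > 0) sb.append(",");
            sb.append("[");
            for (int c = 0; c < m[r].length; c++) {
                if (c > 0) sb.append(",");
                sb.append(m[r][c]); // Float.toString gives a valid JSON number
            }
            sb.append("]");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(new float[][]{{0.1f, 0.2f}, {0.3f, 0.4f}}));
    }
}
```

For real embedding vectors a proper JSON library would be safer (NaN/Infinity are not valid JSON numbers), but this shows the shape of the payload the downstream models would receive.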

karllessard commented 4 years ago

Hi @samikrc ,

I understand your use case, and I agree this was something lacking in the current implementation of the ndarray library. I have pushed changes, ready for review, that would add this functionality. Please take a look and let me know whether it fulfills your requirements, thanks: https://github.com/tensorflow/java/pull/86

samikrc commented 4 years ago

Hi, added my comments in #86, although seems like I am late to the party :-)

karllessard commented 4 years ago

@samikrc, I'm closing this topic, please reopen if you don't think the fix is suitable for your use case, thanks!

samikrc commented 4 years ago

I think the solution works - thanks.