tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Processing locally stored image for custom object detection in tensorflow/tfjs-react-native #5876

Closed brianspiesman closed 2 years ago

brianspiesman commented 3 years ago

I am building custom object detection on locally stored images into a React Native app using Tensorflow.js and would be grateful for help processing the image to a tensor before I call model.predict(). My code (shown below) to access a local image, process it to a tensor, and perform inference has worked nicely for a custom classification model (MobileNetV2). However, I am now getting a dtype error when I apply the same code to my custom object detection model (SSD MobileNet V2 FPNLite 640x640).

Error: The dtype of dict['input_tensor'] provided in model.execute(dict) must be int32, but was float32

useEffect(() => {
    (async () => {
      await tf.ready();
      const modelJson = require("../assets/VisModels/model_m.json")
      const modelWeight = require("../assets/VisModels/model_m_weights.bin")
      const model = await tf.loadLayersModel(bundleResourceIO(modelJson,modelWeight)) //for classification
      //const model = await tf.loadGraphModel(bundleResourceIO(modelJson,modelWeight)) //for object detection
      setModel(model)
    })();
}, []);
const fileUri = **uri of locally stored jpeg image**       
const imgB64 = await FileSystem.readAsStringAsync(fileUri, { encoding: FileSystem.EncodingType.Base64, });
const imgBuffer = tf.util.encodeString(imgB64, 'base64').buffer;
const imageData = new Uint8Array(imgBuffer);
const IMGSIZE = 640;
const imageTensor = decodeJpeg(imageData).expandDims().resizeBilinear([IMGSIZE,IMGSIZE]).div(tf.scalar(255)).reshape([1,IMGSIZE,IMGSIZE,3])
const prediction = await model.predict(imageTensor).data()

Changing the tensor to int32 produces this error:

Error: This execution contains the node 'StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Where', which has the dynamic op 'Where'. Please use model.executeAsync() instead. Alternatively, to avoid the dynamic ops, specify the inputs [StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Reshape]]

Using model.executeAsync() produces this error:

TypeError: model.executeAsync(imageTensor).data is not a function.

This is a managed expo (v43.02) workflow with react-native v0.64.3, @tensorflow/tfjs v3.11, and @tensorflow/tfjs-react-native v0.8.0. Any help in properly setting up the imageTensor would be greatly appreciated!

brianspiesman commented 2 years ago

Is there a more appropriate place to ask this question? Thanks!

jinjingforever commented 2 years ago

Hi @brianspiesman, sorry for the delay; I have been busy with other projects.

executeAsync returns a Promise of a Tensor (if the input is a single tensor), so you probably need to do the following:

const result = await model.executeAsync(imageTensor);
const prediction = result.dataSync();

Please give it a try. Thanks!

brianspiesman commented 2 years ago

@jinjingforever thank you so much for your suggestion. However, I am back to the initial error:

Unhandled promise rejection: Error: The dtype of dict['input_tensor'] provided in model.execute(dict) must be int32, but was float32

I am processing the raw image for the object detection model in the same way as for my classification model. Should the images be processed differently depending on whether they will be used for object detection or classification? Thank you for any further suggestions.

jinjingforever commented 2 years ago

Good question. Is it possible to share your object detection model? I can take a closer look. Thanks!

brianspiesman commented 2 years ago

Sure, what is the best way to get it to you? The model is about 450 KB and the weights are about 12 MB.

jinjingforever commented 2 years ago

Maybe a github repo (along with your code would be great)? Thank you

brianspiesman commented 2 years ago

@jinjingforever: Ah yes, of course. Here is a link to a repository with code for a test version of the app. The model is in the assets/VisModels directory and the code for importing the OD model and running inference is in components/ImagePicker.js. Thank you for your patience as I am new to using github.

jinjingforever commented 2 years ago

Hi @brianspiesman,

I took a look at the model and it does indeed have an int input. (I am using Netron to visualize and inspect the model.) I am not super familiar with the training and model conversion process, but our own coco-ssd model also has int input.

So to make your code work, you can simply add `.toInt()` to convert the input tensor to int type:

const imageTensor = decodeJpeg(imageData).expandDims().resizeBilinear([IMGSIZE,IMGSIZE])
    .div(tf.scalar(255)).reshape([1,IMGSIZE,IMGSIZE,3])
    .toInt(); // <----

Also, the return value of model.executeAsync in this case will be an array of tf.Tensors because the model has multiple output tensors. You will need to process each one individually. Something like:

const result = await model.executeAsync(imageTensor);
console.log(result[0].dataSync(), result[1].dataSync());

You can take a look at how our model processes the result here. Our model probably has different outputs from yours, due to some extra optimization steps.

Thank you! Let me know if you have questions.

brianspiesman commented 2 years ago

@jinjingforever I think this is working. Thank you! However, as you say, I receive an output of 8 tensors and I am not sure which ones to use. I would like to extract the prediction value, the box coordinates, and the class (however, there is only 1 class in this OD model). Here is an example of the output:

Output array of 8 tensors:

- tensor0 shape: [1, 100]
- tensor1 shape: [1, 100, 4]
- tensor2 shape: [1, 100]
- tensor3 shape: [1, 100, 2]
- tensor4 shape: [1, 100]
- tensor5 shape: [1]
- tensor6 shape: [1, 51150, 4]
- tensor7 shape: [1, 51150, 2]

In your link above, I see your model processing results in 2 tensors: [0] the scores and [1] the boxes. Can you tell which of the tensors in my output correspond to scores and boxes? Thank you for any further guidance.

jinjingforever commented 2 years ago

Hi @brianspiesman:

I think one way to do it is to look at the documentation of the original model you converted this tfjs model from. For example, this TF object detection model: if you scroll to the bottom, you can see what the different output tensors mean. If you don't know the original model, I think the model I linked might give you some ideas. Here is a full list of object detection models I found on tfhub.

In this case, I think tensor1 is probably the bounding boxes. One of tensor0, tensor2, or tensor4 is probably the class index. Please give them a try.

Thanks!

brianspiesman commented 2 years ago

@jinjingforever this is perfect! Thank you for all your help on this. It is very much appreciated.

rthadur commented 2 years ago

Closing this. Please @mention to reopen if the issue still persists. Thank you