Closed msektrier closed 5 years ago
Do you need further input from my side or can I support?
@dsmilkov do you mind taking a look at this error?
Thanks! Will take a look today
@msektrier Did you edit the model.json
in any way? The problem stems from the fact that the mean
parameter in ops like FeatureExtractor/MobilenetV2/Conv/BatchNorm/FusedBatchNorm
has an invalid shape [0]
. The tensor with the invalid shape is FeatureExtractor/MobilenetV2/expanded_conv_15/expand/BatchNorm/Const
.
In python, if I try to provide a mean and variance with shape [0]
in tf.nn.batch_normalization
:
tf.nn.batch_normalization(
tf.zeros([1, 150, 150, 32], 'float32'), # x
tf.constant([], 'float32', [0]), # mean
tf.constant([], 'float32', [0]), # variance
offset=None,
scale=None,
variance_epsilon=1e-7
)
I get an error Incompatible shapes: [1,150,150,32] vs. [0] [Op:Mul] name: batchnorm/mul/
.
Can you share the saved model (.pb file) that you used to convert the model?
So I found another signal that points to the cause. Looking at the FusedBatchNorm ops in model.json
{
"input": [
"FeatureExtractor/MobilenetV2/Conv/Conv2D",
"FeatureExtractor/MobilenetV2/Conv/BatchNorm/gamma",
"FeatureExtractor/MobilenetV2/Conv/BatchNorm/beta",
"FeatureExtractor/MobilenetV2/expanded_conv_15/expand/BatchNorm/Const",
"FeatureExtractor/MobilenetV2/expanded_conv_15/expand/BatchNorm/Const"
],
"attr": {
"epsilon": {
"f": 0.0010000000474974513
},
"T": {
"type": "DT_FLOAT"
},
"is_training": {
"b": true
},
"data_format": {
"s": "TkhXQw=="
}
},
"name": "FeatureExtractor/MobilenetV2/Conv/BatchNorm/FusedBatchNorm",
"op": "FusedBatchNorm"
},
is_training
is set to True
, but it should be False
since you are trying to do inference.
Make sure you provide the correct serving tags when exporting the model using our converter tool (--signature_name=serving_default --saved_model_tags=serve
).
To see which signatures and tags exist , you can use the Saved Model CLI
Techniques like batchnorm and dropout require different graphs for inference vs training. See https://github.com/tensorflow/hub/issues/147#issuecomment-419929574 for more info.
Dear dsmilkov,
I just executed the whole transformation again. I uploaded the saved model, frozen model, and the converted json model to https://share.infinkon.de/index.php/s/WYnXjG85cj4AZnp (44MBytes). The error remains.
According to our ML engineer she used the Tensorflow object detection API and she isn't aware about changes that lead to such problems.
Honestly, I cannot judge since I just want to do the inference :). I have just tried for testing purposes to execute the frozen and saved model with Tensorflow 2.0.0-alpha with python and failed. Unfortunately, the documentation of the API is not telling me how I can load the graph, transfer it to a model and do the inference.
I do really appreciate your help and your efforts a lot and it helps me to lower my disappointments about bad, missing or not up-to-date documentations.
Regarding your hint with the Saved Model CLI. I executed "show" and "scan". signatures are not showing and the tags are default I would say with "serve".
Many thanks for your support.
Hello dsmilkov,
today, my engineer gave me a new converted model and this one runs....with further problems.
ssdlite_mobilenetV2_1class.zip - model.json only.
For the inference we have assured that the input tensor loaded in HTML is exactly the same like in our python environment. We load it simply by
let tensor = tf.browser.fromPixels(img);
expanded = tensor.expandDims(0);
return expanded
(in TFJS)
tf_image_string = tf.io.read_file(PATH_TO_IMAGE)
tf_image = tf.image.decode_jpeg(tf_image_string, channels=3, dct_method='INTEGER_ACCURATE')
tf_image_expanded1 = tf.expand_dims(tf_image, axis=0)
(in python)
...they are equal in size, shape and values.
Then we output the "detection_boxes" tensor and it gives us a different result in python and TFJS. Even more strange the first values are identical and then it goes crazy.
[[[0.89474267 0.16380781 0.94830686 0.20310044] [0. 0.7445791 0.17719875 1. ] [0.64993775 0. 0.84936297 0.3427423 ] [0.40547025 0.22818083 0.895257 0.46889073] [0.6517975 0.5531899 0.8523899 1. ] [0.6154277 0.8415282 1. 1. ] [0.46409145 0. 1. 0.15208271] [0.83050364 0.6987988 1. 1. ] [0.14641336 0.02349633 0.3389939 0.6332496 ] [0.5142635 0.728901 0.9893209 0.9674072 ] [0.54869866 0.14797944 0.7521107 0.75144666] [0.4098649 0.13039494 0.8977258 0.37028956] [0. 0.357253 0.14110368 0.9779241 ] [0.65561855 0.648258 1. 0.84357405] [0.840054 0. 1. 0.31862792] [0.68168664 0.7677454 1. 1. ] [0.44844124 0.15256423 0.65252775 0.7499266 ] [0.62348706 0.50179255 0.875133 0.99561524] [0.74397695 0.6874038 0.94830465 1. ] [0.68101573 0.24813068 1. 0.4446543 ] [0.21217245 0.72950757 0.69929665 0.97...
: (4) [0.8947426080703735, 0.16380780935287476, 0.9483069181442261, 0.20310050249099731] 1: (4) [0.5229350328445435, 0.5015679001808167, 1, 1] 2: (4) [0.27248457074165344, 0.10805338621139526, 0.7271230220794678, 0.9670282006263733] 3: (4) [0.6002633571624756, 0.38063299655914307, 1, 1] 4: (4) [0, 0.31815987825393677, 1, 0.691149890422821] 5: (4) [0.3790368139743805, 0.6115804314613342, 1, 1] 6: (4) [0.12581798434257507, 0.1382976472377777, 0.8695082664489746, 0.8439497947692871] 7: (4) [0.3165704607963562, 0, 0.6834971308708191, 1] 8: (4) [0, 0.5953051447868347, 0.6202481389045715, 1] 9: (4) [0, 0.3122354745864868, 0.3565589487552643, 1] 10: (4) [0, 0.7331592440605164, 0.2514439523220062, 1] 11: (4) [0.18738055229187012, 0.5012880563735962, 0.8259190320968628, 1] 12: (4) [0, 0.845550537109375, 0.29175451397895813, 1] 13: (4) [0, 0.5361356735229492, 0.49668145179748535, 1] 14: (4) [0.05381292104721069, 0.6056756973266602, 1, 1] 15: (4) [0, 0.7074476480484009, 0.36408132314682007, 1]
As you can see the first 4 values are identical "0.89474267 0.16380781 0.94830686 0.20310044" but then they differ completely.
Do you have any indication for us with your experience?
many thanks
@msektrier , I am also facing the same issue. Have you resolved this problem? i want to know why this is happening?
@sandhyacs We have no idea and we do not investigate further. We hope some problems are fixed with the next release of TFJS. We assume these are TF/TFJS internals.
With the yesterday's (17/06/2019) version of TFJS we receive a new error in the browser for the posted network.
However, our other ssdlite mobilenet V2 network is finally running. It is the first time we could successfully transfer this network from python to javascript. Many thanks for that!
Thanks for filing a separate issue regarding the numerical precision which we will track at https://github.com/tensorflow/tfjs/issues/1652 (see a workaround in that issue).
In this current issue, I want to help figure out the runtime error you got in the latest version. Can you share with me the latest model.json, the weights, and how you call model.executeAsync()? Thanks!
Dear dsmilkov,
I just tried to reproduce it with the given model and I cannot reproduce the error "cannot read property 'op' of undefined".
Thanks, will close the issue.
@msektrier how did you get past your original problem of the js model raising the error Uncaught (in promise) Error: Operands could not be broadcast together with shapes 1,150,150,32 and 0.
?
@mattstruble Can you send us the model/steps so we can reproduce?
@dsmilkov thanks for the quick reply and offering to look into this!
Here is a link to the saved_model, frozen_inference_graph, and web_model.
My model was generated using the tensorflow/models/research/object_detection/model_main.py
script in TF 1.13.1 using the following command:
python model_main.py --model_dir=output --pipeline_config_path=training\ssd_mobilenet_v2_coco.config -num_train_steps=200000
I then exported the inference graph.
python export_inference_graph.py --input_type=image_tensor --output_directory=output_inf --pipeline_config_path=training\ssd_mobilenet_v2_coco.config --trained_checkpoint_prefix=neg_32\model.ckpt-XXXX
Then converted it to tensorflowjs with tfjs version 1.2.2.1 (I still get the same error using 0.8.6).
tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model --saved_model_tags=serve --signature_name=serving_default output_inf\saved_model output_inf\web_model
Then when I try running it in the web I get the broadcast shapes error.
import * as tf from '@tensorflow/tfjs';
class Detector {
async init() {
try {
this.model = await tf.loadGraphModel('/web_model/model.json');
} catch (err) {
console.log(err);
}
}
async detect(frame) {
const { model } = this;
const INPUT_TENSOR='image_tensor';
const OUTPUT_TENSOR='num_detections'
const zeros = tf.zeros([1, 300, 300, 3]);
console.log("executing model");
output = await model.executeAsync({[INPUT_TENSOR]: zeros}, OUTPUT_TENSOR);
console.log(output);
}
}
export default Detector;
It seems to be the same issue @msektrier was having with the BatchNorm layers is_training=True
. Which is why I originally asked him how he got around it, since I can't seem to find documentation on how to save the model with is_training=False
.
Thanks for your support!
@dsmilkov not to be a bother, but I was wondering if you had any status on this issue?
Many thanks.
@msektrier @dsmilkov, What was the solution to this problem or is there any known workaround? I'm having the same problem.
To get help from the community, we encourage using Stack Overflow and the
tensorflow.js
tag.TensorFlow.js version
tjfs version : 1.0.1 but happens also with 0.8.5
Browser version
Chrome Version 74.0.3729.131 (Offizieller Build) (64-Bit)
Describe the problem or feature request
We simplified the standard ssdlite_mobilenet_v2 model from tensorflow zoo repository. Tfjs converter transformation is successful using Tfjs converter 1.0.1 and 0.8.5. Moreover, in Python we can do inferences with this model.
ssdlite_v2_1class (tfjs converter 085 not working).zip
custom_ssdlite_mobilenetv2_1class.zip
When we try to apply it in JavaScript we an error as stated below.
Code to reproduce the bug / link to feature request
Loading it in Javascript with for the model from converter version 1.0.1:
will result in chrome and firefox in