unable to load resnet50 bodypix model

PushyamiKaveti commented 4 years ago

I followed your README and was able to call load_graph_model() for the BodyPix CNN. However, when I try to run the session on a sample image, I get the following error: Current implementation does not yet support dilations in the batch and depth dimensions. [[node resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Relu (defined at /home/auv/anaconda3/envs/bodypix/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Is it because of compatibility issues between TensorFlow 1.x and TensorFlow 2.x? Please help.

patlevin commented 4 years ago

The error you encountered will occur with TensorFlow 2.0, too. It occurs if a convolution layer contains dilation values other than [1, x, y, 1], that is, anything but 1 in the batch or depth dimension. Now resnet50 doesn't contain any layers with dilation values other than [1, 1, 1, 1] (the default), which makes me curious which model you used.

I tried to find the model in question and stumbled upon this: Body-Pix

The resnet50 model they use is located at resnet50

Can you confirm that this is the model you used? I downloaded the model and ran it through a small test script and it worked without any issues:

import os
# make tensorflow stop spamming messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "3"

import numpy as np
import tensorflow as tf
import tfjs_graph_converter as tfjs

print("Loading resnet50...", end="")
# expand "~" explicitly; Python does not resolve it in plain path strings
graph = tfjs.api.load_graph_model(os.path.expanduser('~/tfjs/models/resnet50/')) # downloaded from the link above
print("done.\nLoading sample image...", end="")
# load sample image into numpy array
img = tf.keras.preprocessing.image.load_img(os.path.expanduser('~/tfjs/assets/sample.jpg'))
x = tf.keras.preprocessing.image.img_to_array(img, dtype=np.float32)
# add imagenet mean - extracted from body-pix source
m = np.array([-123.15, -115.90, -103.06])
x = np.add(x, m)
sample_image = x[tf.newaxis, ...]
print("done.\nRunning inference...", end="")

# evaluate the loaded model directly
with tf.compat.v1.Session(graph=graph) as sess:
    input_tensor_names = tfjs.util.get_input_tensors(graph)
    output_tensor_names = tfjs.util.get_output_tensors(graph)
    input_tensor = graph.get_tensor_by_name(input_tensor_names[0])
    results = sess.run(output_tensor_names, feed_dict={input_tensor: sample_image})
print("done. {} outputs received".format(len(results))) # should be 8 outputs

ajaichemmanam commented 4 years ago

Same issue for me. I was trying to use the ResNet50 stride 16 PoseNet model.

Tried running your model evaluation code: tensorflow.python.framework.errors_impl.InvalidArgumentError: Current implementation does not yet support dilations in the batch and depth dimensions.

I am using TF 1.15.0 and TFJS 1.4.0 with Python 3.7 on macOS.

patlevin commented 4 years ago

@ajaichemmanam Thanks for providing a link to the model. Without even testing the model, I can tell just from looking at the json that the issue is with these 3 layers:

resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/batchnorm_1/add_1/conv
resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/batchnorm_1/add_1/conv
resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/batchnorm_1/add_1/conv

It seems to me that this particular ResNet50 implementation simply isn't compatible with TF. The TF documentation confirms this: "Dilations in the batch and depth dimensions if a 4-d tensor must be 1." This applies to TF 2.0 as well. The aforementioned Conv2D layers have their dilation values set to [2, 2, 1, 1], which isn't compatible with TF 1.15 or TF 2.0, and there's nothing the converter can do about that.
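The check described above can be sketched in a few lines of plain Python. This is only an illustration, not part of the converter: the function name `find_unsupported_dilations` is made up, and the JSON layout (`modelTopology` → `node` → `attr` → `dilations` → `list` → `i`, with integers stored as strings) is an assumption about how the TFJS model.json is structured. The sample dictionary is a hand-written stand-in, not a real model.

```python
def find_unsupported_dilations(model):
    """Return (node name, dilations) pairs that TF would reject.

    Assumes NHWC layout, where the batch (index 0) and depth (index 3)
    dilation values must both be 1.
    """
    offenders = []
    for node in model.get("modelTopology", {}).get("node", []):
        attr = node.get("attr", {}).get("dilations")
        if attr is None:
            continue
        # assumed encoding: integer lists stored as strings, e.g. ["2", "2", "1", "1"]
        dilations = [int(v) for v in attr.get("list", {}).get("i", [])]
        if len(dilations) == 4 and (dilations[0] != 1 or dilations[3] != 1):
            offenders.append((node["name"], dilations))
    return offenders


# hand-written stand-in for a parsed model.json (hypothetical node names)
sample = {
    "modelTopology": {
        "node": [
            {"name": "block3/conv2", "op": "Conv2D",
             "attr": {"dilations": {"list": {"i": ["1", "1", "1", "1"]}}}},
            {"name": "block4/conv2", "op": "Conv2D",
             "attr": {"dilations": {"list": {"i": ["2", "2", "1", "1"]}}}},
            {"name": "block4/relu", "op": "Relu", "attr": {}},
        ]
    }
}

for name, dilations in find_unsupported_dilations(sample):
    print(name, dilations)
```

For a real model, the dictionary would come from `json.load()` on the model.json file instead of the inline sample.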

Now here comes the interesting part. It seems as if this isn't even compatible with TFJS: "dilationRate: Should be an integer or array of two integers."

@PushyamiKaveti @ajaichemmanam It seems to me as if the Google team messed up the format conversion while converting the model to TFJS. I will add a workaround later today. Expect an updated version in a couple of hours 👍
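One conceivable shape for such a workaround, which is not necessarily what the converter ended up doing, is to treat a list like [2, 2, 1, 1] as spatial rates written into the wrong slots and move them into NHWC order. The helper name `realign_dilations` and the remapping rule are assumptions made for illustration only.

```python
def realign_dilations(dilations):
    """Remap [h, w, 1, 1] to NHWC order [1, h, w, 1].

    Only lists with an invalid batch dilation (index 0 != 1) and two
    trailing ones are remapped; everything else is returned unchanged.
    Values may be ints or strings such as "2".
    """
    d = [int(v) for v in dilations]
    if len(d) == 4 and d[0] != 1 and d[2:] == [1, 1]:
        return [1, d[0], d[1], 1]
    return d


print(realign_dilations(["2", "2", "1", "1"]))  # misaligned list from the model
print(realign_dilations([1, 1, 1, 1]))          # default, left untouched
```

The guard on index 0 is deliberate: a list such as [1, 2, 1, 1] is already valid for TF, so it cannot be told apart from a misaligned one and is left alone.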

ajaichemmanam commented 4 years ago

@patlevin Thanks for the reply and the awesome work. I was arriving at the same conclusion. The MobileNet version of PoseNet works well after conversion, so the problem is indeed with the ResNet model, due to TensorFlow changes.

patlevin commented 4 years ago

Fixed in the current release

PushyamiKaveti commented 4 years ago

@patlevin Thank you so much for fixing it. Like @ajaichemmanam mentioned, I was using the stride 16 model, which has dilation values of ["2", "2", "1", "1"] that don't appear in the stride 32 model.

emredog commented 4 years ago

The fix also worked for me. Thank you @patlevin for this great project!

DCC-lzhy commented 4 years ago

Where in the source code did you find the "add imagenet mean - extracted from body-pix source" step? Also, the part_heatmaps output has stride 16; how can I map it back to the original size? Is it bilinear interpolation (x16), then softmax and argmax, to get the partId result at the end?

patlevin commented 4 years ago

@DCC-lzhy

Where could you find add imagenet mean - extracted from body-pix source from source code?

It's the preprocessing step of the resnet50-model used by BodyPix. You can find it right here: resnet (line 22).

And the output part_heatmaps is 16-stride, how can I map to the original size? Is it bilinear interpolation(x16), then softmax and argmax, get the partId result at last?

The decoding process is described in the BodyPix source code, for example here: decode_single_pose.ts. Please understand that I cannot answer questions about particular models and how to use them. The authors have released the source code and detailed descriptions in a blog-post.

Good luck!

DCC-lzhy commented 4 years ago

Thank you very much.

patlevin / tfjs-to-tf

unable to load resnet50 bodypix model #1