Prediction for different image size

vinodrajendran001 commented 6 years ago

The image height and width I gave is (416,416).

My original image is varying size.

Thereby during prediction for a single image, I resized the image as follows

import numpy as np
from numpy import newaxis
import cv2

print ("X shape" ,X.shape)

img = cv2.resize(X[i], dsize = (416,416)) 

img = np.expand_dims(img, axis=-1)

im = np.rollaxis(img,3)

print ("im resized shape", im.shape)

X shape (1, 1131, 2488, 3)
im resized shape (1, 416, 416, 3)

Then I did the predict and decode_y2 for which the empty list is returned.

Decoded predictions (output format is [class_id, confidence, xmin, ymin, xmax, ymax]):

[]

Am I missing something for doing prediction?

pierluigiferrari commented 6 years ago

Once again, this isn't enough information to find the cause of the problem. For instance, a critical piece of information in this case would be: What model are you even trying to make predictions with? Is it one of the trained models? Is it a model you trained from scratch? Is it one of the trained models that you then fine-tuned on some dataset?

There are tutorial notebooks for inference for both SSD300 and SSD512 in which you need to change pretty much nothing but the image size and the paths to your images, why don't you just use one of those?

For example, in your code above, one thing that's possibly wrong is that you're swapping the color channel axes of your input image, but SSD300 and SSD512 swap the image color channels internally by default, so unless you turned that off, your input will end up being changed back from BGR to RGB, which is exactly not what you want. This can definitely not be the whole reason why you're not getting any predictions, but it's one of potentially many things that could be wrong if you write your own code instead of using the already working code that's there.

Take one of the inference tutorial notebooks. Set your desired image size. Make a prediction. If it works, good. If you get nothing, you know that the problem is that the model doesn't make any confident predictions. Then, step by step, make the changes you need to the code. After every single change, make a prediction to see if it's still working. Once it breaks, you will know exactly where it breaks and why.

vinodrajendran001 commented 6 years ago

I am using ss7 training.ipynb as a reference and in that I am building the model from scratch. If you notice that, during prediction phase i.e

y_pred = model.predict(X)

for varying size images this will throw error because the model expects a 4d input but the X from predict generatorreturns 4d input of original image size.

Since my input is varying size, for each of next(predict generator) the X[i] will be different. I guess in your example you resized all your images to same size before training. So for varying size, I believe we need to resize the image before giving it for predict().

pierluigiferrari commented 6 years ago

You can feed model.predict() images of different sizes, but all images within one batch must have the same size because tensors must have homogenous dimensions. The same goes for BatchGenerator.generate(): It can serve images of different sizes, but all images within one batch must have the same size because Numpy arrays must have homogenous dimensions. If you generate a batch where the images have different sizes and you don't use any of BatchGenerator.generate()'s options to make the size homogenous, then the generated batch will be empty.

The question is why you would even want to predict on images of different sizes. Your network has been trained on images of a certain size, and the prediction quality will be highest when you make predictions on images of that size. The anchor box distribution and the receptive fields of the predictor layers all depend on the input image size. You would usually use your model to predict on images of the same size the model has been trained for only and you would use the identical process to resize the images that you used during training, i.e. if you used random padding during training, you use random padding during inference, or if you used simple resizing during training, you use simple resizing during inference. The point is to make the input for inference as similar as possible structurally to the input for training.

vinodrajendran001 commented 6 years ago

This issue was raised based on the assumption that this code works like darknet. Now I realize that its different and it works well, when we resize the image and their corresponding annotations before feeding it for training.

pierluigiferrari / ssd_keras

Prediction for different image size #57