guigzzz / Keras-Yolo-v2

Keras re-implementation of Yolo v2 Object Detection

Loss function input shape #5

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello, I was wondering why the input shape to the loss function is (13, 13, 5, 25), with the batch dimension left out. Or am I looking at it wrong? I see that processGroundTruth returns y_true of shape (13, 13, 5, 25), which is the input to the loss function. In general, y_true and y_pred in a loss function have batch_size as their first dimension, or in this case number_of_samples/steps_per_epoch.

guigzzz commented 4 years ago

So there's the following example in the README:

from yolo_v2 import YOLOV2_ANCHOR_PRIORS as priors
from yolov2_train import processGroundTruth

image = imread(image_path)
boxes, labels = fetch_bounding_boxes_and_labels()

y_true = processGroundTruth(boxes, labels, priors, (13, 13, 5, 25))
trainnet.m.fit(image[None], y_true[None], steps_per_epoch=30, epochs=10)

Your analysis of the code is completely correct: processGroundTruth returns a numpy array of shape (13, 13, 5, 25). But I then pass it to the model's fit function as y_true[None], which adds a leading batch axis and gives it shape (1, 13, 13, 5, 25).
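To make the effect of indexing with None concrete, here is a minimal numpy-only sketch. The zeros array is just a stand-in for whatever processGroundTruth actually returns; only the shapes matter here.

```python
import numpy as np

# Stand-in for the array returned by processGroundTruth (zeros, for illustration only).
y_true = np.zeros((13, 13, 5, 25), dtype=np.float32)

# Indexing with None (an alias for np.newaxis) inserts a new leading axis of size 1.
batched = y_true[None]

print(y_true.shape)   # (13, 13, 5, 25)
print(batched.shape)  # (1, 13, 13, 5, 25)

# np.expand_dims is an equivalent, more explicit spelling of the same operation.
assert np.expand_dims(y_true, axis=0).shape == batched.shape
```

So the loss function still sees a batch dimension; in the README example it just happens to be a batch of one.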

So what you should be able to do is build a list of (bounding boxes, labels) pairs, one pair per image (called y in the following code), and then do:

y_true = np.asarray([
    processGroundTruth(boxes, labels, priors, (13, 13, 5, 25))
    for boxes, labels in y
])

and then call trainnet.m.fit(images, y_true, steps_per_epoch=30, epochs=10), where images is an array of the corresponding images, in the same order as y.
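As a rough shape check for the batched version, the sketch below swaps in a hypothetical stub, fake_process_ground_truth, for the real processGroundTruth so that it runs without the repository. The box coordinates, the labels, and the 416x416 RGB image size are all assumptions chosen for illustration.

```python
import numpy as np

def fake_process_ground_truth(boxes, labels, priors, shape):
    # Hypothetical stub: the real processGroundTruth encodes boxes and labels
    # into the target tensor; here we only reproduce the output shape.
    return np.zeros(shape, dtype=np.float32)

priors = None  # placeholder; the real code passes YOLOV2_ANCHOR_PRIORS

# One (boxes, labels) pair per image; the values are made up.
y = [
    ([[10, 20, 50, 80]], [3]),
    ([[5, 5, 30, 40], [60, 60, 100, 120]], [1, 7]),
]

# Stacking the per-image targets produces the leading batch axis.
y_true = np.asarray([
    fake_process_ground_truth(boxes, labels, priors, (13, 13, 5, 25))
    for boxes, labels in y
])

# Assumed input size: 416x416 RGB, one image per entry in y.
images = np.zeros((len(y), 416, 416, 3), dtype=np.float32)

# The first dimension of inputs and targets must match.
assert images.shape[0] == y_true.shape[0]
print(y_true.shape)  # (2, 13, 13, 5, 25)
```

With the real function in place of the stub, images and y_true would then be passed to fit together.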

Does this clarify things?