FasterRCNN fails with smaller image sizes

tomrtk commented 1 year ago

Hi. Is it possible to use the FasterRCNN model with smaller image sizes?

If I change to image_size = [224, 224, 3] in examples/training/object_detection/pascal_voc/faster_rcnn.py, training fails with the traceback below. I tested several image sizes, and the smallest that trained successfully was 512.

While checking whether something was wrong in my own code, I put a breakpoint in decode_single_level. The shapes of encoded_anchor and box_delta looked correct for multiple calls, until I got the shapes below.

Any input on this is appreciated.

Traceback

Traceback (most recent call last):
  File ".../code/faster_rcnn.py", line 325, in <module>
    model.fit(train_ds, epochs=18, validation_data=eval_ds, callbacks=callbacks)
  File "..../code/venv/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/run/user/1000/__autograph_generated_filei5_e5zbp.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

    File ".../code/venv/lib/python3.9/site-packages/keras/engine/training.py", line 1249, in train_function  *
        return step_function(self, iterator)
    File ".../code/venv/lib/python3.9/site-packages/keras/engine/training.py", line 1233, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File ".../code/venv/lib/python3.9/site-packages/keras/engine/training.py", line 1222, in run_step  **
        outputs = model.train_step(data)
    File ".../code/venv/lib/python3.9/site-packages/keras_cv/models/object_detection/faster_rcnn.py", line 501, in train_step
        total_loss = self.compute_loss(images, gt_boxes, gt_classes, training=True)
    File ".../code/venv/lib/python3.9/site-packages/keras_cv/models/object_detection/faster_rcnn.py", line 456, in compute_loss
        rois, feature_map, rpn_box_pred, rpn_cls_pred = self._call_rpn(
    File ".../code/venv/lib/python3.9/site-packages/keras_cv/models/object_detection/faster_rcnn.py", line 342, in _call_rpn
        decoded_rpn_boxes = _decode_deltas_to_boxes(
    File ".../code/venv/lib/python3.9/site-packages/keras_cv/bounding_box/converters.py", line 111, in _decode_deltas_to_boxes
        boxes[lvl] = decode_single_level(anchor, boxes_delta[lvl])
    File ".../code/venv/lib/python3.9/site-packages/keras_cv/bounding_box/converters.py", line 100, in decode_single_level
        box_delta[..., :2] * encoded_anchor[..., 2:] + encoded_anchor[..., :2],

    ValueError: Dimensions must be equal, but are 27 and 48 for '{{node mul_43}} = Mul[T=DT_FLOAT](strided_slice_32, strided_slice_33)' with input shapes: [4,27,2], [48,2].
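
For reference, the error is an ordinary broadcasting failure: the two operands are aligned from the trailing dimension, and 27 cannot be matched against 48. The following is a minimal pure-Python sketch of that rule, not the TensorFlow implementation; `broadcast_shapes` is a hypothetical helper written only for this illustration.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Return the broadcast shape of `a` and `b`, or raise ValueError.

    Hypothetical helper for illustration. Shapes are aligned from the
    trailing dimension; two sizes are compatible when they are equal
    or when one of them is 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"Dimensions must be equal, but are {x} and {y}")
        result.append(max(x, y))
    return tuple(reversed(result))


# Matching box counts broadcast over the batch dimension without problems.
print(broadcast_shapes((4, 48, 2), (48, 2)))

# The shapes from the traceback: 27 box deltas per image vs. 48 anchors.
try:
    broadcast_shapes((4, 27, 2), (48, 2))
except ValueError as err:
    print(err)
```

So the multiplication itself is fine; the two inputs simply disagree on how many boxes exist at that level.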

LukeWood commented 1 year ago

Hey @tomrtk! Thanks for the bug report. It's good to gather information about where our models fail.

For the purpose of understanding, can you try 256x256?

tomrtk commented 1 year ago

Hi, with a size of 256x256 it seems to be working.

From debugging, there seems to be a difference in shape between anchors and boxes_delta at the last level here:

https://github.com/keras-team/keras-cv/blob/32701437b713050de951351529335f1fc725ceb0/keras_cv/bounding_box/converters.py#L111

If I run the FasterRCNN example with size 224x224 and a breakpoint at line 111 in the file above, I get the following shapes:

(Pdb) pp [(a.shape, b.shape) for a, b in zip(anchors.values(), boxes_delta.values())]
[(TensorShape([9408, 4]), TensorShape([4, 9408, 4])),
 (TensorShape([2352, 4]), TensorShape([4, 2352, 4])),
 (TensorShape([588, 4]), TensorShape([4, 588, 4])),
 (TensorShape([147, 4]), TensorShape([4, 147, 4])),
 (TensorShape([48, 4]), TensorShape([4, 27, 4]))]

So it looks like an issue with anchor generation or box delta calculation at the last level (c6)?

tomrtk commented 1 year ago

I did some more digging and testing, and one issue I have found with the anchor generator is this:

test code:

from keras_cv.layers.object_detection.anchor_generator import _SingleAnchorGenerator
from keras_cv.layers.object_detection.anchor_generator import AnchorGenerator

# test all default values from FasterRCNN implementation
strides = {i: 2**i for i in range(2, 7)}
sizes = {2: 32.0, 3: 64.0, 4: 128.0, 5: 256.0, 6: 512.0}

for layer_size, stride in zip(sizes.values(), strides.values()):
    anchor_generator = _SingleAnchorGenerator(
        bounding_box_format='yxyx',
        sizes=layer_size,
        scales=[1],
        aspect_ratios=[0.5, 1.0, 2.0],
        stride=stride,
        clip_boxes=True,
    )
    # from docstring expected shape: `(H/stride * W/stride * len(scales) * len(aspect_ratios), 4)`
    expected_shape = ((224/stride) * (224/stride) * 1 * 3, 4)
    res = anchor_generator(image_size=(224, 224, 3))
    print(f'Got shape: {res.shape}, expected: {expected_shape}')
    # fail on last layer

# test of AnchorGenerator

anchor_generator = AnchorGenerator(
    bounding_box_format='yxyx',
    sizes=sizes,
    scales=[1],
    aspect_ratios=[0.5, 1.0, 2.0],
    strides=strides,
    clip_boxes=True,
)

for key, value in anchor_generator(image_shape=(224, 224, 3)).items():
    print(f'level {key} got shape {value.shape}')

This gives the output:

Got shape: (9408, 4), expected: (9408.0, 4)
Got shape: (2352, 4), expected: (2352.0, 4)
Got shape: (588, 4), expected: (588.0, 4)
Got shape: (147, 4), expected: (147.0, 4)
Got shape: (48, 4), expected: (36.75, 4)
level 2 got shape (9408, 4)
level 3 got shape (2352, 4)
level 4 got shape (588, 4)
level 5 got shape (147, 4)
level 6 got shape (48, 4)

It looks like the resulting anchor shape is not as expected when the image size divided by the stride is not an integer. Here the last level has stride 64, and 224/64 = 3.5. However, the expected value from the docstring formula (36.75) still does not match the box delta shape above (27).
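
A quick way to see which levels are affected for a given image size is to check divisibility by each stride. This is a minimal sketch using the same strides as the test code above; `non_integer_levels` is a hypothetical helper name, not part of keras_cv.

```python
# Strides used in the test code above: levels 2..6 -> 4, 8, 16, 32, 64.
STRIDES = {level: 2**level for level in range(2, 7)}


def non_integer_levels(image_size, strides=STRIDES):
    """Return the levels whose stride does not evenly divide `image_size`."""
    return [level for level, stride in strides.items() if image_size % stride]


for size in (224, 256, 512):
    print(size, non_integer_levels(size))
```

Under these assumptions only level 6 is affected at 224, and no level is affected at 256 or 512, which is consistent with those two sizes training without the shape error.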

tomrtk commented 1 year ago

After playing with the expected values, it looks like the box delta shape is based on integer (floor) division, while the anchor box shape is based on round or ceil:

(Pdb) from math import ceil
(Pdb) (ceil(224/64) * ceil(224/64) * 1 * 3, 4)
(48, 4)
(Pdb) ((224//64) * (224//64) * 1 * 3, 4)
(27, 4)
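
The same comparison can be run for every level at once. The sketch below assumes a square image, one scale and three aspect ratios as in the test code above, and that the two code paths really do use floor and ceil respectively, which is an inference from the observed shapes rather than something confirmed in the library source. `counts` is a hypothetical helper.

```python
from math import ceil

STRIDES = {level: 2**level for level in range(2, 7)}
ANCHORS_PER_CELL = 1 * 3  # len(scales) * len(aspect_ratios)


def counts(image_size, stride):
    """Return (floor-based, ceil-based) box counts for a square image."""
    floor_side = image_size // stride
    ceil_side = ceil(image_size / stride)
    return floor_side**2 * ANCHORS_PER_CELL, ceil_side**2 * ANCHORS_PER_CELL


for level, stride in STRIDES.items():
    floor_count, ceil_count = counts(224, stride)
    status = "ok" if floor_count == ceil_count else "MISMATCH"
    print(f"level {level}: floor={floor_count} ceil={ceil_count} {status}")
```

For 224x224 the two counts agree on levels 2 to 5 and differ only on level 6 (27 vs. 48), matching the shapes seen in the debugger.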

tomrtk commented 1 year ago

I am unsure whether the anchor generator or the box delta shape is the correct one. If the anchor generator is wrong, I found a possible fix that makes the shapes match: removing +_1 in cx and cy here:

https://github.com/keras-team/keras-cv/blob/32701437b713050de951351529335f1fc725ceb0/keras_cv/layers/object_detection/anchor_generator.py#L252

bhack commented 1 year ago

Yes, generally I think that it is a bit risky to test only a single fixed input shape in the "integration test": https://github.com/keras-team/keras-cv/blob/32701437b713050de951351529335f1fc725ceb0/keras_cv/models/object_detection/faster_rcnn_test.py#L26

But you can also try to extend the test to cover your case in the Anchor generator test: https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/object_detection/anchor_generator_test.py

Can you try to propose a PR?

tomrtk commented 1 year ago

Can you try to propose a PR?

Sure, I will make a PR.

keras-team / keras-cv

FasterRCNN fails with smaller image sizes #1285
