keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

YOLOV8 does not work with ragged tensors #1832

Closed · eawer closed 1 year ago

eawer commented 1 year ago

TF version: 2.12 (nvcr.io/nvidia/tensorflow:23.04-tf2-py3 container)
keras_cv version: 0.5.0
host: Ubuntu 22.04.2 LTS

YOLOV8 works as expected with plain dense tensors, but it fails with ragged tensors, which are common in object detection tasks since the number of boxes varies per image.

Code:

import tensorflow as tf
import keras_cv

images = tf.ones(shape=(1, 512, 512, 3))
boxes = tf.RaggedTensor.from_tensor(
    tf.convert_to_tensor(
        [
            [
                [0, 0, 100, 100],
                [100, 100, 200, 200],
                [300, 300, 100, 100],
            ]
        ],
        dtype=tf.float32,
    )
)
classes = tf.RaggedTensor.from_tensor(
    tf.convert_to_tensor([[1, 1, 1]], dtype=tf.float32)
)

labels = {
    "boxes": boxes,
    "classes": classes,
}

model = keras_cv.models.YOLOV8Detector(
    num_classes=1,
    backbone=keras_cv.models.YOLOV8Backbone.from_preset("yolo_v8_m_backbone"),
    fpn_depth=2,
    bounding_box_format="xywh",
)
model.compile(
    optimizer='adam', 
    box_loss="iou", 
    classification_loss="binary_crossentropy"
)
model.fit({
    "images": images,
    "bounding_boxes": labels,
})

Error:

TypeError                                 Traceback (most recent call last)
Cell In[47], line 31
     20 model = keras_cv.models.YOLOV8Detector(
     21     num_classes=1,
     22     backbone=keras_cv.models.YOLOV8Backbone.from_preset("yolo_v8_m_backbone"),
     23     fpn_depth=2,
     24     bounding_box_format="xywh",
     25 )
     26 model.compile(
     27     optimizer='adam', 
     28     box_loss="iou", 
     29     classification_loss="binary_crossentropy"
     30 )
---> 31 model.fit({
     32     "images": images,
     33     "bounding_boxes": labels,
     34 })

File /usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /tmp/__autograph_generated_fileepyb8h9n.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

File /tmp/__autograph_generated_file40_5wv7z.py:12, in outer_factory.<locals>.inner_factory.<locals>.tf__call(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes, mask_gt)
     10 retval_ = ag__.UndefinedReturnValue()
     11 max_num_boxes = ag__.ld(gt_bboxes).shape[1]
---> 12 (mask_pos, align_metric, overlaps) = ag__.converted_call(ag__.ld(self).get_pos_mask, (ag__.ld(pd_scores), ag__.ld(pd_bboxes), ag__.ld(gt_labels), ag__.ld(gt_bboxes), ag__.ld(anc_points), ag__.ld(mask_gt), ag__.ld(max_num_boxes)), None, fscope)
     13 (target_gt_idx, fg_mask, mask_pos) = ag__.converted_call(ag__.ld(select_highest_overlaps), (ag__.ld(mask_pos), ag__.ld(overlaps), ag__.ld(max_num_boxes)), None, fscope)
     14 (target_bboxes, target_scores) = ag__.converted_call(ag__.ld(self).get_targets, (ag__.ld(gt_labels), ag__.ld(gt_bboxes), ag__.ld(target_gt_idx), ag__.ld(fg_mask), ag__.ld(max_num_boxes)), None, fscope)

File /tmp/__autograph_generated_filenlsdsmed.py:11, in outer_factory.<locals>.inner_factory.<locals>.tf__get_pos_mask(self, pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points, mask_gt, max_num_boxes)
      9 do_return = False
     10 retval_ = ag__.UndefinedReturnValue()
---> 11 mask_in_gts = ag__.converted_call(ag__.ld(select_candidates_in_gts), (ag__.ld(anc_points), ag__.ld(gt_bboxes)), None, fscope)
     12 (align_metric, overlaps) = ag__.converted_call(ag__.ld(self).get_box_metrics, (ag__.ld(pd_scores), ag__.ld(pd_bboxes), ag__.ld(gt_labels), ag__.ld(gt_bboxes), (ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(mask_in_gts), ag__.ld(tf).int32), None, fscope) * ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(mask_gt), ag__.ld(tf).int32), None, fscope)), ag__.ld(max_num_boxes)), None, fscope)
     13 mask_topk = ag__.converted_call(ag__.ld(self).select_topk_candidates, (ag__.ld(align_metric),), dict(topk_mask=ag__.converted_call(ag__.ld(tf).cast, (ag__.converted_call(ag__.ld(tf).repeat, (ag__.ld(mask_gt), ag__.ld(self).max_anchor_matches), dict(axis=2), fscope), ag__.ld(tf).bool), None, fscope)), fscope)

File /tmp/__autograph_generated_fileu050c945.py:14, in outer_factory.<locals>.inner_factory.<locals>.tf__select_candidates_in_gts(xy_centers, gt_bboxes, epsilon)
     12 (bs, n_boxes, _) = ag__.ld(gt_bboxes).shape
     13 (left_top, right_bottom) = ag__.converted_call(ag__.ld(tf).split, (ag__.converted_call(ag__.ld(tf).reshape, (ag__.ld(gt_bboxes), ((- 1), 1, 4)), None, fscope), 2), dict(axis=(- 1)), fscope)
---> 14 bbox_deltas = ag__.converted_call(ag__.ld(tf).reshape, (ag__.converted_call(ag__.ld(tf).concat, ([(ag__.ld(xy_centers)[ag__.ld(tf).newaxis] - ag__.ld(left_top)), (ag__.ld(right_bottom) - ag__.ld(xy_centers)[ag__.ld(tf).newaxis])],), dict(axis=2), fscope), ((- 1), ag__.ld(n_boxes), ag__.ld(n_anchors), 4)), None, fscope)
     15 try:
     16     do_return = True

TypeError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1249, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras_cv/models/object_detection/yolo_v8/yolo_v8_detector.py", line 513, in train_step
        total_loss = self.compute_loss(x, y, box_pred, cls_pred)
    File "/usr/local/lib/python3.8/dist-packages/keras_cv/models/object_detection/yolo_v8/yolo_v8_detector.py", line 550, in compute_loss
        target_bboxes, target_scores, fg_mask = self.label_encoder(
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_file40_5wv7z.py", line 12, in tf__call
        (mask_pos, align_metric, overlaps) = ag__.converted_call(ag__.ld(self).get_pos_mask, (ag__.ld(pd_scores), ag__.ld(pd_bboxes), ag__.ld(gt_labels), ag__.ld(gt_bboxes), ag__.ld(anc_points), ag__.ld(mask_gt), ag__.ld(max_num_boxes)), None, fscope)
    File "/tmp/__autograph_generated_filenlsdsmed.py", line 11, in tf__get_pos_mask
        mask_in_gts = ag__.converted_call(ag__.ld(select_candidates_in_gts), (ag__.ld(anc_points), ag__.ld(gt_bboxes)), None, fscope)
    File "/tmp/__autograph_generated_fileu050c945.py", line 14, in tf__select_candidates_in_gts
        bbox_deltas = ag__.converted_call(ag__.ld(tf).reshape, (ag__.converted_call(ag__.ld(tf).concat, ([(ag__.ld(xy_centers)[ag__.ld(tf).newaxis] - ag__.ld(left_top)), (ag__.ld(right_bottom) - ag__.ld(xy_centers)[ag__.ld(tf).newaxis])],), dict(axis=2), fscope), ((- 1), ag__.ld(n_boxes), ag__.ld(n_anchors), 4)), None, fscope)

    TypeError: Exception encountered when calling layer 'yolov8_label_encoder_14' (type YOLOV8LabelEncoder).

    in user code:

        File "/usr/local/lib/python3.8/dist-packages/keras_cv/models/object_detection/yolo_v8/yolo_v8_label_encoder.py", line 167, in call  *
            mask_pos, align_metric, overlaps = self.get_pos_mask(
        File "/usr/local/lib/python3.8/dist-packages/keras_cv/models/object_detection/yolo_v8/yolo_v8_label_encoder.py", line 235, in get_pos_mask  *
            mask_in_gts = select_candidates_in_gts(anc_points, gt_bboxes)
        File "/usr/local/lib/python3.8/dist-packages/keras_cv/models/object_detection/yolo_v8/yolo_v8_label_encoder.py", line 78, in select_candidates_in_gts  *
            bbox_deltas = tf.reshape(

        TypeError: Failed to convert elements of (-1, None, 5376, 4) to Tensor. Consider casting elements to a supported type. See https://www.tensorflow.org/api_docs/python/tf/dtypes for supported TF dtypes.

    Call arguments received by layer 'yolov8_label_encoder_14' (type YOLOV8LabelEncoder):
      • pd_scores=tf.Tensor(shape=(None, 5376, 1), dtype=float32)
      • pd_bboxes=tf.Tensor(shape=(None, 5376, 4), dtype=float32)
      • anc_points=tf.Tensor(shape=(5376, 2), dtype=float32)
      • gt_labels=tf.RaggedTensor(values=Tensor("RaggedFromVariant_1/RaggedTensorFromVariant:1", shape=(None,), dtype=float32), row_splits=Tensor("RaggedFromVariant_1/RaggedTensorFromVariant:0", shape=(None,), dtype=int64))
      • gt_bboxes=tf.RaggedTensor(values=Tensor("RaggedConcat/concat:0", shape=(None, 4), dtype=float32), row_splits=Tensor("RaggedSplit/RaggedGetItem/RaggedFromUniformRowLength/control_dependency:0", shape=(None,), dtype=int64))
      • mask_gt=tf.RaggedTensor(values=Tensor("RaggedReduceAll/Cast_1:0", shape=(None, 1), dtype=bool), row_splits=Tensor("RaggedFromVariant/RaggedTensorFromVariant:0", shape=(None,), dtype=int64))
giuliano-97 commented 1 year ago

I faced the same issue today and tracked it down to the select_candidates_in_gts function in yolo_v8_label_encoder.py. The problem is that for ragged tensors this line:

bs, n_boxes, _ = gt_bboxes.shape

returns n_boxes = None, because the ragged (per-image box count) dimension has no static size, so the reshape in the following lines fails:

    n_anchors = xy_centers.shape[0]
    bs, n_boxes, _ = gt_bboxes.shape

    left_top, right_bottom = tf.split(
        tf.reshape(gt_bboxes, (-1, 1, 4)), 2, axis=-1
    )
    bbox_deltas = tf.reshape(
        tf.concat(
            [
                xy_centers[tf.newaxis] - left_top,
                right_bottom - xy_centers[tf.newaxis],
            ],
            axis=2,
        ),
        (-1, n_boxes, n_anchors, 4),  # fails: n_boxes is None for ragged inputs
    )
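
A minimal standalone sketch of the failure mode (assuming TF 2.x): the static shape of a ragged dimension is None, and tf.reshape does not accept None in its target shape:

    import tensorflow as tf

    # The box dimension of a ragged batch has no static size.
    boxes = tf.RaggedTensor.from_tensor(tf.ones((1, 3, 4)))
    print(boxes.shape)  # (1, None, 4)

    bs, n_boxes, _ = boxes.shape  # n_boxes is None
    # Passing None into a reshape target reproduces the error above:
    tf.reshape(tf.ones((3, 4)), (-1, n_boxes, 4))
    # TypeError: Failed to convert elements of (-1, None, 4) to Tensor.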

I'm not sure what the right way to fix this would be.

giuliano-97 commented 1 year ago

Converting the bounding box labels to dense in the compute_loss method of YOLOV8Detector fixes the issue for me:

    def compute_loss(self, x, y, box_pred, cls_pred):
        pred_boxes = decode_regression_to_boxes(box_pred)
        pred_scores = cls_pred

        # Convert ragged labels to a dense, -1-padded batch before label encoding.
        y_for_encoder = bounding_box.to_dense(y)

        anchor_points, stride_tensor = get_anchors(image_shape=x.shape[1:])
        stride_tensor = tf.expand_dims(stride_tensor, axis=-1)

        gt_labels = y_for_encoder["classes"]

        mask_gt = tf.reduce_all(y_for_encoder["boxes"] > -1.0, axis=-1, keepdims=True)
        gt_bboxes = bounding_box.convert_format(
            y_for_encoder["boxes"],
            source=self.bounding_box_format,
            target="xyxy",
            images=x,
        )
       ...
       # and so on

but for some reason this only works in eager mode, i.e. when I set:

tf.config.experimental_run_functions_eagerly(True)

When graph mode is on, i.e. when I call model.fit, I still get the same error (presumably because the padded box dimension is still statically unknown while tracing, so n_boxes is still None).
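
For reference, this is what bounding_box.to_dense does to a ragged label dict when run eagerly (a minimal sketch; the two-image batch is made up for illustration):

    import tensorflow as tf
    from keras_cv import bounding_box

    ragged_labels = {
        "boxes": tf.ragged.constant(
            [[[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0]], []],
            ragged_rank=1,
        ),
        "classes": tf.ragged.constant([[0.0, 1.0], []]),
    }

    dense_labels = bounding_box.to_dense(ragged_labels)
    print(dense_labels["boxes"].shape)  # (2, 2, 4): padded to the batch max
    print(dense_labels["classes"])      # [[0., 1.], [-1., -1.]]: -1 marks padding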

ianstenbit commented 1 year ago

> Converting the bounding box labels to dense in the compute_loss method of YOLOV8Detector fixes the issue for me

This is probably the right fix. I will take a look at this today and see how we sorted this out for RetinaNet. Thanks for the issue report!

ianstenbit commented 1 year ago

After a little digging, here's my take:

- YOLOV8Detector could convert Ragged inputs to Dense, but to do so it has to know what the maximum number of boxes is for a batch of images. This gets a little hairy because if we were to assume that e.g. 64 might be a reasonable number, we're adding a ton of extra padding for some datasets and probably missing a bunch of boxes in other datasets.
- The label encoder would need some meaningful work to support Ragged tensors, and I don't have the bandwidth to do this right now.

So what I'm going to do is the following:

james77777778 commented 1 year ago

> YOLOV8Detector could convert Ragged inputs to Dense, but to do so it has to know what the maximum number of boxes is for a batch of images. This gets a little hairy because if we were to assume that e.g. 64 might be a reasonable number, we're adding a ton of extra padding for some datasets and probably missing a bunch of boxes in other datasets.

> The label encoder would need some meaningful work to support Ragged tensors, and I don't have the bandwidth to do this right now.

Hi @ianstenbit

I have investigated this issue and would like to share my thoughts on it:

Instead of converting a ragged tensor into a dense tensor in YOLOV8Detector, we can do it in YOLOV8LabelEncoder. This approach would match the official Ultralytics implementation:

https://github.com/ultralytics/ultralytics/blob/29c954a1385815e522f0d94d7144814cc4e6da42/ultralytics/yolo/utils/loss.py#L105-L120

            i = targets[:, 0]  # image index
            _, counts = i.unique(return_counts=True)
            counts = counts.to(dtype=torch.int32)
            out = torch.zeros(batch_size, counts.max(), 5, device=self.device)  # <- this line
            for j in range(batch_size):
                matches = i == j
                n = matches.sum()
                if n:
                    out[j, :n] = targets[matches, 1:]
            out[..., 1:5] = xywh2xyxy(out[..., 1:5].mul_(scale_tensor))

and the current implementation of RetinaNetLabelEncoder:

https://github.com/keras-team/keras-cv/blob/66fa74b6a2a0bb1e563ae8bce66496b118b95200/keras_cv/models/object_detection/retinanet/retinanet_label_encoder.py#L196
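
The key point in the Ultralytics snippet is that the padded size is the batch's own maximum box count (counts.max()) rather than a fixed global constant. In TensorFlow terms, here is a minimal sketch of that conversion (a hypothetical helper, not the exact code from my change):

    import tensorflow as tf

    def densify_ground_truth(gt_classes, gt_boxes):
        # Pad each ragged input to the longest box list in this batch.
        # The batch-max sizing mirrors torch.zeros(batch_size, counts.max(), 5)
        # above; -1 is the keras_cv padding convention, so padded rows can be
        # masked out downstream.
        if isinstance(gt_boxes, tf.RaggedTensor):
            gt_boxes = gt_boxes.to_tensor(default_value=-1)
        if isinstance(gt_classes, tf.RaggedTensor):
            gt_classes = gt_classes.to_tensor(default_value=-1)
        return gt_classes, gt_boxes

With a conversion like this at the top of the label encoder's call, the rest of the encoder only ever sees dense tensors.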

I have successfully run the script provided by @eawer after making modifications to YOLOV8LabelEncoder. Colab link: https://colab.research.google.com/drive/11NIMR1HJhbl-_d60ij12uRsqLYieqbvG?usp=sharing

I can submit the PR if it would be helpful.

ianstenbit commented 1 year ago

Fantastic -- thanks for digging into this and thanks for the PR! I'll take a look now.