I faced the same issue today and managed to track it down to this function in yolo_v8_label_encoder.py. The issue is that for ragged tensors this line:

`bs, n_boxes, _ = gt_bboxes.shape`

returns n_boxes = None, so the reshape in the following lines fails:
```python
n_anchors = xy_centers.shape[0]
bs, n_boxes, _ = gt_bboxes.shape

left_top, right_bottom = tf.split(
    tf.reshape(gt_bboxes, (-1, 1, 4)), 2, axis=-1
)
bbox_deltas = tf.reshape(
    tf.concat(
        [
            xy_centers[tf.newaxis] - left_top,
            right_bottom - xy_centers[tf.newaxis],
        ],
        axis=2,
    ),
    (-1, n_boxes, n_anchors, 4),
)
```
I'm not sure what the right way to fix this would be, but converting the bounding box labels to dense in the compute_loss method of YOLOV8Detector fixes the issue for me:
```python
def compute_loss(self, x, y, box_pred, cls_pred):
    pred_boxes = decode_regression_to_boxes(box_pred)
    pred_scores = cls_pred

    y_for_encoder = bounding_box.to_dense(y)

    anchor_points, stride_tensor = get_anchors(image_shape=x.shape[1:])
    stride_tensor = tf.expand_dims(stride_tensor, axis=-1)

    gt_labels = y_for_encoder["classes"]
    mask_gt = tf.reduce_all(y_for_encoder["boxes"] > -1.0, axis=-1, keepdims=True)
    gt_bboxes = bounding_box.convert_format(
        y_for_encoder["boxes"],
        source=self.bounding_box_format,
        target="xyxy",
        images=x,
    )
    ...
    # and so on
```
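For reference, `keras_cv.bounding_box.to_dense` pads the ragged `boxes`/`classes` entries so every image in the batch has the same number of rows; as far as I can tell it pads with -1, which is exactly what the `> -1.0` mask above relies on. A minimal example (dict format and padding value assumed, not taken from the thread):

```python
import tensorflow as tf
from keras_cv import bounding_box

# Ragged labels: 2 boxes for the first image, 1 for the second.
y_ragged = {
    "boxes": tf.ragged.constant(
        [
            [[10.0, 10.0, 20.0, 20.0], [30.0, 30.0, 40.0, 40.0]],
            [[5.0, 5.0, 15.0, 15.0]],
        ],
        ragged_rank=1,
    ),
    "classes": tf.ragged.constant([[0.0, 1.0], [2.0]]),
}

y_dense = bounding_box.to_dense(y_ragged)

# Shorter rows are padded (with -1 sentinels, per the assumption above), so
# the second image gets one real box and one padding box.
print(y_dense["boxes"].shape)    # (2, 2, 4)
print(y_dense["classes"].shape)  # (2, 2)
```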
But for some reason this only works in eager mode, i.e. when I set:

`tf.config.experimental_run_functions_eagerly(True)`

When graph mode is on, i.e. when I call model.fit, I still get the same error.
> converting the bounding box labels to dense in the compute_loss method of YOLOV8Detector fixes the issue for me:
This is probably the right fix. I will take a look at this today and see how we sorted this out for RetinaNet. Thanks for the issue report!
After a little digging, here's my take and what I'm going to do:
- YOLOV8Detector could convert Ragged inputs to Dense, but to do so it has to know the maximum number of boxes for a batch of images. This gets a little hairy: if we were to assume that e.g. 64 is a reasonable number, we would be adding a ton of extra padding for some datasets and probably missing a bunch of boxes in other datasets.
- The label encoder would need some meaningful work to support Ragged tensors, and I don't have the bandwidth to do this right now.
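To make the tradeoff concrete, here is a small sketch of that padding (illustration only, not detector code; `max_boxes = 64` is just the arbitrary number from the example above):

```python
import tensorflow as tf

max_boxes = 64  # assumed cap; too small drops boxes, too large wastes memory

ragged_boxes = tf.ragged.constant(
    [
        [[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0]],
        [[1.0, 1.0, 2.0, 2.0]],
    ],
    ragged_rank=1,
)

# Pad every image to exactly max_boxes rows, using -1 as the sentinel.
# Images with more than max_boxes boxes would be silently truncated.
dense_boxes = ragged_boxes.to_tensor(
    default_value=-1.0, shape=[None, max_boxes, 4]
)
print(dense_boxes.shape)  # (2, 64, 4)
```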
Hi @ianstenbit, I have investigated this issue and would like to share my thoughts on it:
Instead of converting a ragged tensor into a dense tensor in YOLOV8Detector, we can do it in YOLOV8LabelEncoder. This approach should be identical to the official implementation:
```python
i = targets[:, 0]  # image index
_, counts = i.unique(return_counts=True)
counts = counts.to(dtype=torch.int32)
out = torch.zeros(batch_size, counts.max(), 5, device=self.device)  # <- this line
for j in range(batch_size):
    matches = i == j
    n = matches.sum()
    if n:
        out[j, :n] = targets[matches, 1:]
out[..., 1:5] = xywh2xyxy(out[..., 1:5].mul_(scale_tensor))
```
and to the current implementation of RetinaNetLabelEncoder.
I have successfully run the script provided by @eawer after making modifications to YOLOV8LabelEncoder.
Colab link:
https://colab.research.google.com/drive/11NIMR1HJhbl-_d60ij12uRsqLYieqbvG?usp=sharing
I can submit the PR if it would be helpful.
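For clarity, the core of the change is just a ragged-to-dense conversion at the start of the label encoder, analogous to the torch preprocessing step quoted above. A rough sketch of the idea (not the exact code from the Colab or the PR; the helper name is illustrative):

```python
import tensorflow as tf

def to_dense_labels(gt_bboxes, gt_classes, pad_value=-1.0):
    # Sketch only: if the labels arrive as ragged tensors, pad each image's
    # boxes/classes to the longest row in the batch -- the analogue of
    # torch.zeros(batch_size, counts.max(), 5) in the official implementation.
    if isinstance(gt_bboxes, tf.RaggedTensor):
        gt_bboxes = gt_bboxes.to_tensor(default_value=pad_value)
    if isinstance(gt_classes, tf.RaggedTensor):
        gt_classes = gt_classes.to_tensor(default_value=pad_value)
    # Padded rows are all pad_value, so the existing mask_gt check
    # (boxes > -1.0) still treats them as "no ground truth".
    return gt_bboxes, gt_classes
```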
Fantastic -- thanks for digging into this and thanks for the PR! I'll take a look now
TF version: 2.12 (nvcr.io/nvidia/tensorflow:23.04-tf2-py3 container)
keras_cv version: 0.5.0
host: Ubuntu 22.04.2 LTS

YoloV8 works as expected with plain tensors, but when it comes to ragged tensors (which are pretty common in object detection tasks) it starts to fail.

code:

error: