keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Division of data into training and validation set & COCO Metric Callback not working with Keras CV implementation as expected #2137

Open Inshu32 opened 11 months ago

Inshu32 commented 11 months ago

Discussed in https://github.com/keras-team/keras-cv/discussions/2126

Originally posted by **Inshu32** November 6, 2023

I am trying to implement a Keras CV based pipeline to train a custom dataset, following the https://keras.io/examples/vision/yolov8/ example. I have an object detection problem with 6 classes. I am facing two issues:

1. While dividing the dataset using `take` and `skip`, the data is split sequentially, so the first 2 classes end up in validation and the remaining 4 in training. This is a problem because the model is trained and evaluated on different class distributions. I used `tf.data.Dataset.shuffle` to mitigate this, but the split still does not guarantee that all classes are represented in both the training and validation sets.

2. While running `fit` on the YOLO model, I expect the predictions to be evaluated by a COCO metric callback, for which I am using the following class:

```python
class EvaluateCOCOMetricsCallback(keras.callbacks.Callback):
    def __init__(self, data, save_path):
        super().__init__()
        self.data = data
        self.metrics = keras_cv.metrics.BoxCOCOMetrics(
            bounding_box_format="xyxy",
            evaluate_freq=1e9,
        )
        self.save_path = save_path
        self.best_map = -1.0

    def on_epoch_end(self, epoch, logs):
        self.metrics.reset_state()
        for batch in self.data:
            images, y_true = batch[0], batch[1]
            y_pred = self.model.predict(images, verbose=0)
            self.metrics.update_state(y_true, y_pred)

        metrics = self.metrics.result(force=True)
        logs.update(metrics)

        current_map = metrics["MaP"]
        if current_map > self.best_map:
            self.best_map = current_map
            self.model.save(self.save_path)  # Save the model when mAP improves

        return logs
```

This produces the following error:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_13_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [32,2,4] vs. shape[2] = [32,1,4] [Op:ConcatV2] name: concat
```

More detailed traceback:

```
File "/home/lib/python3.9/site-packages/keras_cv/metrics/object_detection/box_coco_metrics.py", line 262, in _compute_result
    _box_concat(self.ground_truths),
File "/home/lib/python3.9/site-packages/keras_cv/metrics/object_detection/box_coco_metrics.py", line 44, in _box_concat
    result[key] = tf.concat([b[key] for b in boxes], axis=0)
```

To my understanding this is a problem with multiple bounding boxes in one image. A ragged tensor solves the problem while training with multiple bounding boxes. In the case above, I think there is only one predicted bounding box while the ground truth has 2 bounding boxes for the same image. How do I solve this problem?
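Regarding the first issue, a common pattern is to shuffle once with a fixed seed before slicing, so the two splits are disjoint and both drawn from the whole dataset (the `tf.data` analogue would be `ds.shuffle(n, seed=..., reshuffle_each_iteration=False)` followed by `take`/`skip`). A minimal pure-Python sketch of the idea, with hypothetical toy data:

```python
import random

def split_dataset(examples, val_fraction=0.2, seed=42):
    """Shuffle once with a fixed seed, then slice into train/val.

    Because the shuffle happens exactly once, the two slices are
    guaranteed to be disjoint, and a sequential class ordering in the
    source data no longer leaks into the split.
    """
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # one deterministic shuffle
    n_val = int(len(examples) * val_fraction)
    val = [examples[i] for i in indices[:n_val]]
    train = [examples[i] for i in indices[n_val:]]
    return train, val

# Hypothetical toy dataset: 6 classes, 10 samples each, stored
# sequentially by class (the worst case for take/skip).
data = [(f"img_{c}_{i}", c) for c in range(6) for i in range(10)]
train, val = split_dataset(data)
```

Note this still does not *guarantee* every class appears in both splits; for that guarantee a stratified split (sampling per class) is needed.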
luisliborio commented 10 months ago

I am having a similar (maybe identical) problem with keras_cv.metrics.BoxCOCOMetrics().

The problem happens in the call to self.metrics.result(force=True); it runs properly with keras_cv==0.5.1.

reproducing the tutorial in keras yolov8 detection

MachKoder commented 10 months ago

+1 to this issue. @luisliborio you seem to be right. The issue happens on the call to self.metrics.result. I am currently using keras_cv==0.6.4. It seems that BoxCOCOMetrics.result requires the bounding_box tensor to be dense. I have changed the dict_to_tuple function to

def dict_to_tuple(inputs):
    # `bounding_box` here is the keras_cv.bounding_box module.
    # Pad the ragged boxes to a dense tensor holding at most 32 boxes.
    return inputs["images"], bounding_box.to_dense(
        inputs["bounding_boxes"], max_boxes=32
    )

However, it is not clear to me how changing the bounding_box tensor from ragged to dense affects training, if at all.

EDIT: I tried running my code with keras-cv==0.5.1 using both the original dict_to_tuple function and the one I posted above. With keras-cv==0.5.1 the original dict_to_tuple works; if I use the version I posted above, I get the same error.
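For intuition, the ragged-to-dense conversion can be sketched in plain NumPy. This is a hypothetical stand-in for illustration, not keras-cv's actual `bounding_box.to_dense` implementation: each image's variable-length box list is padded to `max_boxes` rows, with padding rows filled with -1.

```python
import numpy as np

def to_dense_sketch(boxes_per_image, max_boxes=32):
    """Pad a list of per-image box arrays (each [num_boxes, 4]) into one
    dense [batch, max_boxes, 4] array, filling missing rows with -1.

    Rough illustration of what a ragged-to-dense conversion does; the
    real keras_cv.bounding_box.to_dense operates on bounding-box dicts.
    """
    batch = len(boxes_per_image)
    dense = np.full((batch, max_boxes, 4), -1.0, dtype=np.float32)
    for i, boxes in enumerate(boxes_per_image):
        n = min(len(boxes), max_boxes)
        dense[i, :n] = np.asarray(boxes, dtype=np.float32)[:n]
    return dense

# The ragged case from the error: one image with 2 boxes, one with 1.
ragged = [
    [[0, 0, 10, 10], [5, 5, 20, 20]],
    [[1, 1, 8, 8]],
]
dense = to_dense_sketch(ragged, max_boxes=4)  # shape (2, 4, 4)
```

After padding, every batch has the same `max_boxes` dimension, which is why the concat inside the metric no longer fails.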

MachKoder commented 10 months ago

@LukeWood's tutorial on Object Detection with KerasCV highlights the use of dense tensors, but notes they are only necessary when using TPUs. In fact, he passes BoxCOCOMetrics as an argument during model compilation when on GPU. However, the YOLOV8Detector does not support adding a metric as an argument.

@LukeWood does converting a ragged tensor to a dense one affect training and/or inference? Also, can you take a look at the issue described above? Thank you!

Inshu32 commented 10 months ago

I still do not understand what I should do to resolve the callback metric issue. Surprisingly, it only arises when there are multiple bounding boxes in the labelled dataset. When I changed my data to use only a single bounding box per image, it does not give the concat error.

MachKoder commented 10 months ago

> I still do not understand what I should do to resolve the callback metric issue. Surprisingly, it only arises when there are multiple bounding boxes in the labelled dataset. When I changed my data to use only a single bounding box per image, it does not give the concat error.

So you have two options:

  1. Downgrade your keras-cv version to release 0.5.1; the original dict_to_tuple function will then work.
  2. Keep your current version of keras-cv (assuming it is > 0.5.1) and use the modified dict_to_tuple function that I posted above, i.e., convert the ragged tensor to dense.

AlexanderCoudijzer commented 7 months ago

I was having a similar error with keras-cv 0.8.2.
My understanding is that the issue is not multiple boxes in an image, but the concatenation of boxes across multiple batches. In the code for box_coco_metrics.py, the _box_concat() function has the following line:

    for key in ["boxes", "classes"]:
        result[key] = tf.concat([b[key] for b in boxes], axis=0)

I think that should be axis=1, because axis 1 is the dimension that counts the boxes, while the other dimensions (batch size and the 4 box coordinates) are the same across batches.
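The shapes in the original traceback are consistent with this reading: arrays of shape [batch, num_boxes, 4] with different num_boxes cannot be concatenated along axis 0, but can along axis 1. A small NumPy illustration, using the shapes from the error message:

```python
import numpy as np

# Two accumulated ground-truth batches, [batch_size, num_boxes, 4],
# with different numbers of boxes (2 vs. 1), as in the reported error.
a = np.zeros((32, 2, 4))
b = np.zeros((32, 1, 4))

# Concatenating along axis 0 requires all other dimensions to match,
# so it fails here, just like the tf.concat call in _box_concat.
try:
    np.concatenate([a, b], axis=0)
    axis0_failed = False
except ValueError:
    axis0_failed = True

# Concatenating along axis 1 (the boxes axis) works, producing one
# array with all boxes per image: shape (32, 3, 4).
merged = np.concatenate([a, b], axis=1)
```

Note this only demonstrates the shape arithmetic; whether axis=1 is semantically the right fix inside keras-cv (where each element of the list is a separate batch of images) is for the maintainers to confirm.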