tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

`tfio.experimental.image.draw_bounding_boxes` has inconsistent shape constraints #2046

Open AndreiMoraru123 opened 3 months ago


DrawBoundingBoxesV3Op can essentially only draw one text output per image

I think this issue was first raised in https://github.com/tensorflow/io/issues/1088.

It was labelled as an enhancement there, but it looks more like a limitation imposed by the current implementation.

A simple example that works is having a single box inside a single image, with a single color code and a single text output:

import tensorflow as tf
import tensorflow_io as tfio

width = 560
height = 320
channels = 3

images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)

boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9]]], dtype=tf.float32)
texts = tf.constant(["hello_world!"], dtype=tf.string)
colors = tf.constant([[255, 0, 0]], dtype=tf.float32)

print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Texts:", texts.shape)
print("Colors:", colors.shape)

output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
print("Output:", output.shape)
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 1, 4)
Texts: (1,)
Colors: (1, 3)
Output: (1, 320, 560, 3)

We already know from tensorflow/io/tensorflow_io/core/kernels/image_font_kernels.cc that there is no point in trying without a batch dimension, as there is a check for the image rank to be 4:

OP_REQUIRES(context, images.dims() == 4, 
            errors::InvalidArgument("The rank of the images should be 4"));

This is also what https://github.com/tensorflow/io/pull/254, the PR by @yongtang that added this feature, demonstrates.

It is essentially the same as the one test available in the code at tensorflow/io/tests/test_image.py

Now, still within a batch size of 1 (one image), we could have more boxes, each with their own text labels and colors, but this does not work:

import tensorflow as tf
import tensorflow_io as tfio

width = 560
height = 320
channels = 3

images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)

boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]]], dtype=tf.float32)
texts = tf.constant(["hello_world!", "hello_world_part_2"], dtype=tf.string)
colors = tf.constant([[255, 0, 0], [0, 255, 0]], dtype=tf.float32)

print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Texts:", texts.shape)
print("Colors:", colors.shape)

output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
print("Output:", output.shape)
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 2, 4)
Texts: (2,)
Colors: (2, 3)
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-1-3eebb25ba193> in <cell line: 22>()
     20 print("Colors:", colors.shape)
---> 22 output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
     23 print("Output:", output.shape)

1 frames
<string> in io_draw_bounding_boxes_v3(images, boxes, colors, texts, font_size, name)

/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   5881 def raise_from_not_ok_status(e, name) -> NoReturn:
   5882   e.message += (" name: " + str(name if name is not None else ""))
-> 5883   raise core._status_to_exception(e) from None  # pylint: disable=protected-access

InvalidArgumentError: {{function_node __wrapped__IO>DrawBoundingBoxesV3_device_/job:localhost/replica:0/task:0/device:CPU:0}} The batch sizes should be the same [Op:IO>DrawBoundingBoxesV3] name:

"The batch sizes should be the same" refers to the batch sizes of images and texts, which are required to match in tensorflow/io/tensorflow_io/core/kernels/image_font_kernels.cc:

        OP_REQUIRES(
            context, images.dim_size(0) == texts_tensor.dim_size(0),
            errors::InvalidArgument("The batch sizes should be the same"));

Yet, interestingly, this is not required for colors.

Okay, then let's try to make the texts batch size match the image batch size, as the op requires:

import tensorflow as tf
import tensorflow_io as tfio

width = 560
height = 320
channels = 3

images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)

boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]]], dtype=tf.float32)
texts = tf.constant(["hello_world!", "hello_world_part_2"], dtype=tf.string)
colors = tf.constant([[255, 0, 0], [0, 255, 0]], dtype=tf.float32)

# let's also expand for text
texts = tf.expand_dims(texts, axis=0)

print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Texts:", texts.shape)
print("Colors:", colors.shape)

output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
print("Output:", output.shape)

But then we hit this error:

Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 2, 4)
Texts: (1, 2)
Colors: (2, 3)
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-2-f83aad0cb29d> in <cell line: 25>()
     23 print("Colors:", colors.shape)
---> 25 output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
     26 print("Output:", output.shape)

1 frames
<string> in io_draw_bounding_boxes_v3(images, boxes, colors, texts, font_size, name)

/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   5881 def raise_from_not_ok_status(e, name) -> NoReturn:
   5882   e.message += (" name: " + str(name if name is not None else ""))
-> 5883   raise core._status_to_exception(e) from None  # pylint: disable=protected-access

InvalidArgumentError: {{function_node __wrapped__IO>DrawBoundingBoxesV3_device_/job:localhost/replica:0/task:0/device:CPU:0}} The rank of the texts tensor should be 1 [Op:IO>DrawBoundingBoxesV3] name:

"The rank of the texts tensor should be 1" is required by another check in tensorflow/io/tensorflow_io/core/kernels/image_font_kernels.cc:

        OP_REQUIRES(context, texts_tensor.dims() == 1,
                    errors::InvalidArgument(
                        "The rank of the texts tensor should be 1"));

But does it work with colors only and no text? Yes:

import tensorflow as tf
import tensorflow_io as tfio

width = 560
height = 320
channels = 3

images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)

boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]]], dtype=tf.float32)
colors = tf.constant([[255, 0, 0], [0, 255, 0]], dtype=tf.float32)

print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Colors:", colors.shape)

output = tfio.experimental.image.draw_bounding_boxes(images, boxes, None, colors)
print("Output:", output.shape)
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 2, 4)
Colors: (2, 3)
Output: (1, 320, 560, 3)
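
To make the interaction between these checks easier to see, here is a minimal pure-Python sketch that models only the three `OP_REQUIRES` conditions quoted above. The function name `first_shape_error` is hypothetical, and treating `texts_shape=None` as "text checks skipped" is an assumption based on the last cell succeeding. It is not the kernel's actual code.

```python
def first_shape_error(images_shape, texts_shape=None):
    """Return the message of the first quoted check that fails, or None.

    Models only the three conditions cited in this issue; the real kernel
    may perform additional validation.
    """
    if len(images_shape) != 4:
        return "The rank of the images should be 4"
    if texts_shape is None:
        # Assumption: when no texts are passed, the text checks do not apply.
        return None
    if len(texts_shape) != 1:
        return "The rank of the texts tensor should be 1"
    if images_shape[0] != texts_shape[0]:
        return "The batch sizes should be the same"
    return None


# The four cells from this issue: one image, varying texts shapes.
images_shape = (1, 320, 560, 3)
for texts_shape in [(1,), (2,), (1, 2), None]:
    print(texts_shape, "->", first_shape_error(images_shape, texts_shape))
```

Under this model, the only texts shape accepted for a batch of B images is `(B,)`, regardless of how many boxes each image has. That is why two labels for two boxes in a single image cannot pass both checks.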

To sum up, I think this is a limitation of the current implementation: since multiple colors work across bounding boxes, multiple texts should too. I could not spot anything that would force only one text per image.
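
Until that is addressed, one possible stopgap for a single image is to draw each labelled box with its own call, since the one-box, one-text, one-color case is the one shown to work above. The sketch below is an untested assumption, not a recommended API. The helper name `draw_boxes_one_at_a_time` is hypothetical, and the draw function is passed in so the example runs without TensorFlow; in real use `draw_fn` would be `tfio.experimental.image.draw_bounding_boxes` and `images` a `(1, height, width, 3)` tensor.

```python
def draw_boxes_one_at_a_time(draw_fn, images, boxes, texts, colors):
    """Call draw_fn once per (box, text, color) triple, chaining the output.

    Each call receives boxes shaped (1, 1, 4), texts shaped (1,) and colors
    shaped (1, 3), matching the single-box cell that works. Assumes a batch
    of exactly one image.
    """
    if not (len(boxes) == len(texts) == len(colors)):
        raise ValueError("boxes, texts and colors must have the same length")
    output = images
    for box, text, color in zip(boxes, texts, colors):
        output = draw_fn(output, [[box]], [text], [color])
    return output


# Stand-in for the real draw function so the sketch runs on its own:
# it records each call and returns the images unchanged.
calls = []

def fake_draw(images, boxes, texts, colors):
    calls.append((boxes, texts, colors))
    return images

result = draw_boxes_one_at_a_time(
    fake_draw,
    images="images",
    boxes=[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]],
    texts=["hello_world!", "hello_world_part_2"],
    colors=[[255, 0, 0], [0, 255, 0]],
)
print(len(calls), "draw calls issued")
```

This trades one op call for one call per box, so it is only a workaround; it does not change the shape constraints described in this issue.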

If you agree, I would volunteer to attempt a fix @yongtang @terrytangyuan

Here is a link to the demo notebook with the above cells: https://colab.research.google.com/drive/1rSder84urmOGF21rtWGb7TDEu-7zq1MP?usp=sharing