keras-team / keras-io

Keras documentation, hosted live at keras.io
Apache License 2.0

RetinaNet tutorial - cannot run on custom data #407

Open rola93 opened 3 years ago

rola93 commented 3 years ago

I've been reading the RetinaNet tutorial and running it, and it works like a charm. It was introduced in #109 by @srihari-humbarwadi; awesome work!

However, I don't understand, and couldn't find any reference on, how to prepare a custom dataset for this purpose.

I want to train the model on my own custom dataset, clean-dirty-containers-in-montevideo, but I can't find how to convert it from images with XML annotations to the format this tutorial expects.

I found this repo with some scripts to convert XML annotations to TFRecords, but I'm not sure whether the output is in the required format. I tried it, but it didn't work.

I think the docs should at least link to an article or another tutorial explaining how to do this.

Any help (sample code, or a tutorial on this step) is appreciated.

barbaragabriella commented 3 years ago

Can we get a follow-up here? I'm also interested in knowing how to load our own dataset in this example. Thanks

srihari-humbarwadi commented 3 years ago

To keep the example free of a lot of code around processing and preparing the data, we use tensorflow-datasets to load the MS COCO dataset. If you wish to train on your own dataset, you need to convert your data into the format that this function expects: https://github.com/keras-team/keras-io/blob/9dbe6aef1082e6ef320021ef710c016b712a1379/examples/vision/retinanet.py#L374. Along with this, you may also need to make the necessary changes to the tf.data pipeline.

rola93 commented 3 years ago

> you need to convert your data into a format that this function expects

I think this is exactly what needs to be covered. Maybe this tutorial is not the best place, to avoid a lot of code as you say, but a complementary article could be written and a pointer to it included here.

In my case, I couldn't preprocess my data :/

barbaragabriella commented 3 years ago

Exactly! It would help to have something that processes the data into a form close to what the function expects. I've given it some tries but no success :(

srihari-humbarwadi commented 3 years ago

It follows the format returned by the tensorflow_datasets builders. You can try matching it by doing something like this:

{
    "image": <image tensor>,
    "objects": {
        "bbox": [
                [y1, x1, y2, x2],
                [y1, x1, y2, x2],
                [y1, x1, y2, x2],
                [y1, x1, y2, x2],
                [y1, x1, y2, x2],
        ],
        "label": [
                class_id_for_object_1,
                class_id_for_object_2,
                class_id_for_object_3,
                class_id_for_object_4,
                class_id_for_object_5
        ]
    }
} 
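
For concreteness, here is a minimal sketch (not from the tutorial) of one way to produce that structure from in-memory annotations with tf.data.Dataset.from_generator. All file names and box values below are hypothetical placeholders:

import tensorflow as tf

# Hypothetical annotations: one entry per image.
filenames = ["images/img_0.jpg", "images/img_1.jpg"]
boxes = [
    # [y1, x1, y2, x2]; the tfds COCO builder stores these normalized to [0, 1]
    [[0.1, 0.2, 0.5, 0.6], [0.3, 0.1, 0.9, 0.4]],  # two objects in image 0
    [[0.0, 0.0, 1.0, 1.0]],                        # one object in image 1
]
labels = [[0, 1], [1]]  # per-image class ids

def annotation_generator():
    for fname, bbs, lbls in zip(filenames, boxes, labels):
        image = tf.io.decode_jpeg(tf.io.read_file(fname), channels=3)
        yield {
            "image": image,
            "objects": {
                "bbox": tf.constant(bbs, dtype=tf.float32),
                "label": tf.constant(lbls, dtype=tf.int32),
            },
        }

train_dataset = tf.data.Dataset.from_generator(
    annotation_generator,
    output_signature={
        "image": tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
        "objects": {
            "bbox": tf.TensorSpec(shape=(None, 4), dtype=tf.float32),
            "label": tf.TensorSpec(shape=(None,), dtype=tf.int32),
        },
    },
)

Note that the tfds builders store "bbox" normalized to [0, 1] in [ymin, xmin, ymax, xmax] order, so the tutorial's preprocessing assumes normalized coordinates.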
lolikgiovi commented 3 years ago

I tried to make a dict like this:

import pandas as pd

file_location = '/content/drive/MyDrive/Skripsi/Labels/label_train.csv'
column_names = ["filename", "width", "height", "class_label", "xmin", "ymin", "xmax", "ymax"]
df = pd.read_csv(file_location, names=column_names)
df = df.drop([0])  # drop the first row (the original CSV header, read in as data)

filenames = df.filename.to_list()
classes = df.class_label.to_list()
xmin = df.xmin.to_list()
ymin = df.ymin.to_list()
xmax = df.xmax.to_list()
ymax = df.ymax.to_list()

image = []
bbox = []
id = []

# for i in range(len(filenames)):
for i in range(5):
  ## Encode image
  # image.append(open(os.path.join(data, filenames[i]), 'rb').read())
  image.append(filenames[i])

  ## Append the box for this row as [xmin, ymin, xmax, ymax]
  bbox.append([float(xmin[i]), float(ymin[i]), float(xmax[i]), float(ymax[i])])
  print(bbox[i])

  ## Class prep: map class names to integer ids
  if classes[i] == "Debris":
    id.append(2)
  else:
    id.append(1)

objects = {"bbox": bbox, "id": id}
train_dataset = {"image": image, "objects": objects}

so the returned dict is:

{
    'image': ['00708.jpg', '01289.jpg', '01441.jpg', '00327.jpg', '01460.jpg'],
    'objects': 
          {
              'bbox': [[277.0, 427.0, 416.0, 480.0],
                      [266.0, 1.0, 347.0, 61.0],
                      [385.0, 249.0, 451.0, 320.0],
                      [89.0, 431.0, 144.0, 462.0],
                      [433.0, 274.0, 457.0, 341.0]],
              'id': [2, 2, 1, 1, 1]
           }
 }

Is this the right approach?

rroosshhaann commented 3 years ago

I have an approach that works. It's a bit of a hack, but it works just fine.

  1. Create three basic lists: one for the filename, one for the bounding boxes, and one for the encoded label.
  2. Cast each of these as a tf.constant.
  3. Create your dataset like so: train_dataset = tf.data.Dataset.from_tensor_slices({"filename": filename, "objects": {"label": label, "bbox": bbox}})
  4. Read the image from disk in the preprocess_data function, where it is processed as part of the pipeline, something like so:

    def preprocess_data(sample):
        image_string = tf.io.read_file(sample["filename"])
        image = tf.image.decode_jpeg(image_string)

        bbox = swap_xy(sample["objects"]["bbox"])
        class_id = tf.cast(sample["objects"]["label"], dtype=tf.int32)

        # I don't want to flip horizontally, so this tutorial step is skipped:
        # image, bbox = random_flip_horizontal(image, bbox)
        image, image_shape, _ = resize_and_pad_image(image)

        bbox = tf.stack(
            [
                bbox[:, 0] * image_shape[1],
                bbox[:, 1] * image_shape[0],
                bbox[:, 2] * image_shape[1],
                bbox[:, 3] * image_shape[0],
            ],
            axis=-1,
        )
        bbox = convert_to_xywh(bbox)
        return image, bbox, class_id
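
For reference, a sketch of what steps 1-3 might look like (names and values are hypothetical; tf.ragged.constant handles a variable number of boxes per image, which plain nested lists cannot):

import tensorflow as tf

# Step 1: three lists, one entry per image (hypothetical values).
filename = ["images/a.jpg", "images/b.jpg"]
bbox_list = [
    [[0.1, 0.2, 0.5, 0.6], [0.3, 0.1, 0.9, 0.4]],  # two boxes
    [[0.0, 0.0, 1.0, 1.0]],                        # one box
]
label_list = [[0, 1], [1]]

# Step 2: cast to tensors. Ragged tensors allow a different number of
# boxes per image; from_tensor_slices then yields dense per-image tensors.
filename = tf.constant(filename)
bbox = tf.ragged.constant(bbox_list, ragged_rank=1, inner_shape=(4,), dtype=tf.float32)
label = tf.ragged.constant(label_list, dtype=tf.int32)

# Step 3: build the dataset.
train_dataset = tf.data.Dataset.from_tensor_slices(
    {"filename": filename, "objects": {"label": label, "bbox": bbox}}
)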
lolikgiovi commented 3 years ago

> I have an approach that works. It's a bit of a hack, but it works just fine. [...]

I tried to apply the changes here but then got this error:

ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node while/strided_slice_13}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=3, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=0](while/concat_7, while/strided_slice_13/stack, while/strided_slice_13/stack_1, while/strided_slice_13/stack_2)' with input shapes: [4], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.

It seems my list of bounding boxes is not suitable, since it is a list of 1-D vectors and can't be indexed with matrix syntax, e.g. [:, 0], as some functions here require.

@rroosshhaann Do you mind sharing your code for steps 1-3? Thank you

AndreAbade commented 3 years ago

@lolikgiovi

Did you manage to solve the problem? Do you already have a version of this code for custom datasets?

X-F-Lpro commented 3 years ago

I solved the custom data input problem!

If you follow this link, you can find how to create a TensorFlow dataset: https://www.tensorflow.org/datasets/add_dataset. Create this dataset first with your data. You need a CSV file with all annotations for this, which you can simply map from your XML files.
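
For the XML-to-CSV step, a rough sketch assuming Pascal VOC-style annotation files (element names, file paths, and output columns are assumptions to adapt to your data):

import csv
import glob
import xml.etree.ElementTree as ET

with open("annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "width", "height", "label", "xmin", "ymin", "xmax", "ymax"])
    for path in glob.glob("annotations/*.xml"):
        root = ET.parse(path).getroot()
        filename = root.findtext("filename")
        width = root.findtext("size/width")
        height = root.findtext("size/height")
        # One CSV row per annotated object in the image
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            writer.writerow([
                filename, width, height, obj.findtext("name"),
                box.findtext("xmin"), box.findtext("ymin"),
                box.findtext("xmax"), box.findtext("ymax"),
            ])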

If you created this dataset, you should now have a mydataset.py file. There you need to adapt your FeaturesDict as follows:

{
    "image": tfds.features.Image(shape=(None, None, 3)),
    "objects": tfds.features.Sequence({
        "bbox": tfds.features.BBoxFeature(),
        "label": tfds.features.ClassLabel(num_classes=1),
    }),
}

Since I didn't need all features, I only included the basic ones needed for successful training. Next, the yield part needs to be adapted. I did it like this:

{
    "image": images_path / f"{image_id}.jpg",
    "objects": [
        {
            "bbox": tfds.features.BBox(
                int(row["ymin"]) / int(row["height"]),
                int(row["xmin"]) / int(row["width"]),
                int(row["ymax"]) / int(row["height"]),
                int(row["xmax"]) / int(row["width"]),
            ),
            "label": row["label"],
        },
    ],
}

If you then build that from the command line using $ tfds build, you should get the correctly built dataset as a folder in your tensorflow_datasets directory. Now you can simply change the dataset name in the load function and everything should work just fine.
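
Swapping the dataset name into the tutorial's load call would then look roughly like this ("mydataset" is a placeholder, and the split names depend on how you defined them when building):

import tensorflow_datasets as tfds

# "mydataset" stands in for the name of the dataset built above
(train_dataset, val_dataset), dataset_info = tfds.load(
    "mydataset", split=["train", "validation"], with_info=True, data_dir="data"
)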

I hope this helps. It was quite difficult to achieve...

05vald0 commented 3 years ago

> I have an approach that works. It's a bit of a hack, but it works just fine. [...]

@rroosshhaann This works when there's only one bbox per image, right? I get the error "Can't convert non-rectangular Python sequence to Tensor." when using this approach with multiple bboxes per image.

rpsantosa commented 3 years ago

> It follows the format returned by the tensorflow_datasets builders. You can try matching it by doing something like this: [...]

I did. After the preprocess_data(), batch(2) and padded_batch functions, my data looks like:

(<tf.Tensor: shape=(2, 896, 896, 3), dtype=float32, numpy=
array([[[[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], ..., [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]]], dtype=float32)>,

<tf.Tensor: shape=(2, 4, 4), dtype=float32, numpy=
array([[[216.70493 , 518.21747 ,  91.102036, 139.47879 ],
        [450.6239  , 193.12451 , 128.74966 , 128.74968 ],
        [450.6239  , 193.12451 , 128.74966 , 128.74968 ],
        [450.6239  , 193.12451 , 128.74966 , 128.74968 ]],

       [[576.2461  , 246.96262 , 164.64172 , 164.64174 ],
        [576.2461  , 246.96262 , 164.64172 , 164.64174 ],
        [620.71826 , 486.96158 , 224.25342 , 194.15302 ],
        [399.30353 , 495.5043  , 116.384705, 102.51276 ]]], dtype=float32)>,

<tf.Tensor: shape=(2, 4), dtype=int32, numpy=array([[2, 2, 2, 2], [1, 1, 1, 1]])>)

But it won't work at this step:

train_dataset = train_dataset.map(
    label_encoder.encode_batch, num_parallel_calls=autotune
)

from

autotune = tf.data.experimental.AUTOTUNE
train_dataset = train_dataset.map(preprocess_data, num_parallel_calls=autotune)
train_dataset = train_dataset.shuffle(batch_size)
train_dataset = train_dataset.padded_batch(
    batch_size=batch_size, padding_values=(0.0, 1e-8, -1), drop_remainder=True
)
train_dataset = train_dataset.map(
    label_encoder.encode_batch, num_parallel_calls=autotune
)

even after changing num_classes to something like 10 or fewer.

The error:

<ipython-input-26-92d21a062ba6>:250 encode_batch  *
        label = self._encode_sample(images_shape, gt_boxes[i], cls_ids[i])
    <ipython-input-26-92d21a062ba6>:233 _encode_sample  *
        box_target = self._compute_box_target(anchor_boxes, matched_gt_boxes)
    <ipython-input-26-92d21a062ba6>:215 _compute_box_target  *
        box_target = tf.concat(
    D:\PYTHON\SII\env\lib\site-packages\tensorflow\python\framework\ops.py:870 __array__  **
        " a NumPy call, which is not supported".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (while/truediv_16:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

The interesting part is that label_encoder.encode_batch works just as expected when called directly on the data, but when you put it in train_dataset.map(label_encoder.encode_batch) it won't run.

Thank you

rpsantosa commented 3 years ago

It just worked with TF 2.5.0.

nikeshdevkota commented 2 years ago

@rpsantosa @X-F-Lpro Could you provide a bit more detail about implementing RetinaNet on a custom dataset?

X-F-Lpro commented 2 years ago

I can try to give you some hints on how to proceed if you tell me what you are trying to accomplish. As I worked on software I may not publish, I can only give code examples. Is there a certain aspect that is not working for you, or that you do not understand?

nikeshdevkota commented 2 years ago

I have a CSV file with annotations in the following format: "path to image","xmin","ymin","xmax","ymax","class ID","class name", and I am trying to load the data in the same format as the reference COCO dataset. I tried to do the same as 05vald0, but I got the same error as @lolikgiovi. I saw your solution in the comment above, but I couldn't understand the process of creating the dataset.

nikeshdevkota commented 2 years ago

@X-F-Lpro I am detecting small targets, so I will change the anchor box sizes accordingly as well.
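
For anyone else tuning anchors for small objects: the anchor geometry lives in the tutorial's AnchorBox constructor. A hedged sketch, assuming the tutorial's AnchorBox class with its _areas list and _compute_dims() helper; the halved base sizes are purely illustrative, not a recommendation:

# Assumes AnchorBox as defined in the retinanet tutorial is in scope
class SmallObjectAnchorBox(AnchorBox):
    def __init__(self):
        super().__init__()
        # Halve the default base sizes [32, 64, 128, 256, 512] to bias
        # anchors toward smaller objects, then recompute the cached dims,
        # since the parent constructor built them from the defaults.
        self._areas = [x ** 2 for x in [16.0, 32.0, 64.0, 128.0, 256.0]]
        self._anchor_dims = self._compute_dims()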

yassiney commented 1 year ago

> It follows the format returned by the tensorflow_datasets builders. You can try matching it by doing something like this: { "image": <image tensor>, "objects": { "bbox": [[y1, x1, y2, x2], ...], "label": [...] } } [...]

I'm confused. Should the box annotation [y1, x1, y2, x2] be in the default COCO format or normalized? I see that the annotations used in this tutorial, obtained via tfds.load(), are normalized.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.