tensorflow / models

Models and examples built with TensorFlow

Mask rcnn training - whole bounding box is masked #6135

Open satrya-sabeni opened 5 years ago

satrya-sabeni commented 5 years ago

System information

Problem description

I have successfully created a custom object detection model with this API before. Now I'm trying to train a Mask R-CNN model, but when I look in TensorBoard the mask just covers the whole bounding box, with no shape whatsoever, like this:

Image problem

After about 1500 steps there is still no shape at all. I am using 'mask_rcnn_resnet101_atrous_coco.config' and downloaded the corresponding model to train from. I labeled the images using RectLabel, which generates an XML file with the bounding box and a mask PNG. Maybe it doesn't fully accept the PNG format?

Things I've already checked for:

Possible causes:

At this point I really don't know what to do anymore, so any input or suggestion is appreciated.

This is the tutorial I used: https://towardsdatascience.com/building-a-custom-mask-rcnn-model-with-tensorflow-object-detection-952f5b0c7ab4

TheFlashover commented 5 years ago

Facing the same issue. The tutorial's author doesn't provide proper materials, and the tutorial contains several mistakes. In particular, you can make only one mask per image: create_tf_pet_record.py from the dataset_tools folder takes only one mask per image with faces_only == True.

For example, suppose you have a file "image_1.jpeg" with a corresponding "image_1.png" containing 2 trimap areas that describe 2 object instance masks in the image. create_tf_pet_record.py will then compute the coordinates (xmins = [], ymins = [], xmaxs = [], ymaxs = []) not from each mask separately but from all masks together.

The problem is in create_tf_pet_record.py. IMO, once the code is changed to convert multiple PNG files, each containing one mask for a single JPEG image, it will run properly.
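To illustrate the difference between one box over all masks and one box per mask, here is a minimal sketch. It assumes (hypothetically, this is not how create_tf_pet_record.py stores trimaps) that each instance is painted with its own nonzero integer value in a single-channel mask and that 0 is background. The function names are made up for this example.

```python
import numpy as np


def box_of_union(instance_mask):
    """One box around every non-background pixel (all masks together)."""
    ys, xs = np.nonzero(instance_mask != 0)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def boxes_per_instance(instance_mask):
    """One (xmin, ymin, xmax, ymax) box per instance id.

    Assumption: 0 is background and each instance has a distinct value.
    """
    boxes = {}
    for instance_id in np.unique(instance_mask):
        if instance_id == 0:
            continue
        ys, xs = np.nonzero(instance_mask == instance_id)
        boxes[int(instance_id)] = (
            int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes


# Synthetic 8x10 mask with two separate objects.
mask = np.zeros((8, 10), dtype=np.uint8)
mask[1:3, 1:4] = 1  # instance 1: rows 1-2, columns 1-3
mask[5:7, 6:9] = 2  # instance 2: rows 5-6, columns 6-8

union_box = box_of_union(mask)
instance_boxes = boxes_per_instance(mask)
```

With this toy mask, the union box spans both objects, while the per-instance boxes stay tight around each one, which is what a per-mask record would need.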

satrya-sabeni commented 5 years ago

@TheFlashover Hi, I was able to fix my own problem, but every one of my pictures has just one mask and one object. I don't know yet how generating multiple masks and classes works. I think you can have just one PNG containing multiple masks, with different classes distinguished by colour.

Anyway, you were right: the author doesn't provide a proper mask TFRecord generator and misses some important things. I recoded the whole thing. The problem I had was that my mask wasn't detected because the TFRecord generator was looking for a different color value for the mask, so of course it wouldn't detect the mask I drew. At least it works properly for my use case, which for now is just 1 object and 1 class.
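A quick way to spot this kind of mismatch is to list the pixel values that actually occur in a mask and compare them with the value the converter tests for. The sketch below uses a synthetic array instead of a real PNG, and the values 255 and 2 are only illustrative assumptions, not values taken from the tutorial or from RectLabel.

```python
import numpy as np


def mask_value_counts(mask_np):
    """Map each distinct pixel value in the mask to its pixel count."""
    values, counts = np.unique(mask_np, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


def foreground_fraction(mask_np, foreground_value):
    """Fraction of pixels equal to the value the converter looks for."""
    return float(np.mean(mask_np == foreground_value))


# Synthetic single-channel mask: object painted with 255 on a 0 background.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255

counts = mask_value_counts(mask)
# A converter that (hypothetically) tests for value 2 finds no foreground.
fraction_expected_2 = foreground_fraction(mask, 2)
fraction_expected_255 = foreground_fraction(mask, 255)
```

If the fraction for the value the script expects is 0.0, the generated records contain no usable mask pixels, whatever the PNG looks like in an image viewer.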


I might have to work with multiple classes and objects in the future, hopefully I can help then.

pkulzc commented 5 years ago

If you have fixes to create_tf_pet_record.py, please feel free to send a PR and I'm glad to review it.

TheFlashover commented 5 years ago


> If you have fixes to create_tf_pet_record.py, please feel free to send a PR and I'm glad to review it.

I've found something wrong (maybe a bug) with the bounding box values. Basically, each of these calls

nonzero_x_indices = np.where(nonbackground_indices_x)
nonzero_y_indices = np.where(nonbackground_indices_y)

returns a tuple of 2 ndarrays when the mask has color channels, like

(array([*x_position1*, *x_position2*, ..., *x_position_max*]), 
array([*Red_channel_position*, *Green_channel_position*, *Blue_channel_position*]))


For example, if our mask starts at x pixel value == 10:

(array([10, 10, 10, 11, 11, 11, ..., 720, 720, 720]), array([0, 1, 2, 0, 1, 2, ..., 0, 1, 2]))

So the following code

else:
    xmin = float(np.min(nonzero_x_indices))
    xmax = float(np.max(nonzero_x_indices))
    ymin = float(np.min(nonzero_y_indices))
    ymax = float(np.max(nonzero_y_indices))

sometimes returns wrong values, because np.min and np.max are taken over both arrays of the tuple, so the channel indices (0, 1, 2) in the second array leak into the result.

I believe the original code works fine if the mask is grayscale.

So the code has to be changed to

else:
    xmin = float(np.min(nonzero_x_indices[0]))
    xmax = float(np.max(nonzero_x_indices[0]))
    ymin = float(np.min(nonzero_y_indices[0]))
    ymax = float(np.max(nonzero_y_indices[0]))

in these lines of create_tf_pet_record.py.
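The behaviour described above can be reproduced on a small synthetic RGB mask. The sketch below is only an illustration: `bbox_from_mask` is a hypothetical helper, not a function from create_tf_pet_record.py, and collapsing the channel axis first is an alternative to indexing with `[0]`, not the change made in the PR.

```python
import numpy as np

# Synthetic RGB mask (height 6, width 8): object in rows 2-4, columns 3-6.
mask_np = np.zeros((6, 8, 3), dtype=np.uint8)
mask_np[2:5, 3:7, :] = 255

# Same expressions as in the script; on an RGB mask the result is 2-D
# (width x channels), so np.where returns a tuple of two index arrays.
nonbackground_indices_x = np.any(mask_np != 0, axis=0)
nonzero_x_indices = np.where(nonbackground_indices_x)

# Original code: the minimum is taken over the x indices AND channel indices.
buggy_xmin = float(np.min(nonzero_x_indices))
# Indexing with [0] keeps only the x positions.
fixed_xmin = float(np.min(nonzero_x_indices[0]))


def bbox_from_mask(mask):
    """Return (xmin, ymin, xmax, ymax) for a grayscale or RGB mask.

    Assumption: background pixels are 0 in every channel and the mask
    contains at least one foreground pixel.
    """
    foreground = mask != 0
    if foreground.ndim == 3:
        # Collapse the channel axis so the remaining logic is 2-D only.
        foreground = foreground.any(axis=-1)
    cols = np.where(foreground.any(axis=0))[0]
    rows = np.where(foreground.any(axis=1))[0]
    return (float(cols.min()), float(rows.min()),
            float(cols.max()), float(rows.max()))


rgb_box = bbox_from_mask(mask_np)
gray_box = bbox_from_mask(mask_np[:, :, 0])
```

Here the original expression yields an xmin of 0.0 even though the object starts at column 3, while both the `[0]` variant and the helper report 3.0.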

I made a PR

TheFlashover commented 5 years ago

@satrya-sabeni Can you please provide the pycocotools, slim and object detection tar.gz files you packed for training in Google Cloud. And if possible a sample of your mask and tfrecords file. Thanks in advance.

satrya-sabeni commented 5 years ago

> @satrya-sabeni Can you please provide the pycocotools, slim and object detection tar.gz files you packed for training in Google Cloud. And if possible a sample of your mask and tfrecords file. Thanks in advance.

Hi, I have provided the files you asked for in this link: https://we.tl/t-f9rAA5sypQ

Are you still running into problems or just wanted to compare?

GhislainAdon commented 5 years ago

Please can you send me your working create_tf_record.py script for Mask R-CNN training at esaticsrit1b@gmail.com?

TheFlashover commented 5 years ago

>> @satrya-sabeni Can you please provide the pycocotools, slim and object detection tar.gz files you packed for training in Google Cloud. And if possible a sample of your mask and tfrecords file. Thanks in advance.

> Hi, I have provided the files you asked for in this link: https://we.tl/t-f9rAA5sypQ

> Are you still running into problems or just wanted to compare?

Thanks. I wanted to compare the files. I got it running recently; the problem was that I had trained for too few steps (fewer than 1000). Anyway, thank you for the files.

> please can you send me your working script create_tf_record.py for mask rcnn training at esaticsrit1b@gmail.com

here it is


I haven't tested it (yet) with 2 or more labels, but for 1 label with multiple masks it works well. You only have to store your labels under your trimaps folder as follows:


For example, if you have 15 images of Tesla chargers with 1-4 masks each, your files have to be named like "charger_0_0" and "charger_14_3".
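Reading the two example names as label, image index, and mask index (an interpretation of "charger_0_0" and "charger_14_3", not something the script documents), the mask files for one image can be grouped like this. The regular expression and function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern: <label>_<image index>_<mask index>, e.g. "charger_14_3".
MASK_NAME = re.compile(r"^(?P<label>.+)_(?P<image>\d+)_(?P<mask>\d+)$")


def parse_mask_name(stem):
    """Split a mask file stem into (label, image index, mask index)."""
    match = MASK_NAME.match(stem)
    if match is None:
        raise ValueError("unexpected mask file name: %r" % stem)
    return match.group("label"), int(match.group("image")), int(match.group("mask"))


def group_masks_by_image(stems):
    """Map (label, image index) to the sorted mask indices found for it."""
    groups = defaultdict(list)
    for stem in stems:
        label, image_idx, mask_idx = parse_mask_name(stem)
        groups[(label, image_idx)].append(mask_idx)
    return {key: sorted(indices) for key, indices in groups.items()}


grouped = group_masks_by_image(
    ["charger_0_0", "charger_14_3", "charger_14_0", "charger_0_1"])
```

Grouping up front makes it easy to check that every image has at least one mask before writing any records.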

And change the mask_np values in lines 168, 169, and 183 to correspond to the mask color in your *.png files.

168:  nonbackground_indices_x = np.any(mask_np != 0, axis=0)
169:  nonbackground_indices_y = np.any(mask_np != 0, axis=1)

183:  mask_remapped = (mask_np != 0).astype(np.uint8)
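As a sketch of what adapting those comparisons to a specific mask color could look like, the hypothetical helper below builds the 0/1 mask either from "anything that is not 0" (as in the lines above) or from an exact color match. The `mask_color` parameter and the colors used are assumptions for illustration.

```python
import numpy as np


def binary_mask(mask_np, mask_color=None):
    """Return a uint8 mask with 1 for object pixels and 0 elsewhere.

    mask_color=None treats every nonzero pixel as object. Otherwise only
    pixels equal to mask_color (a scalar for grayscale masks, an RGB
    triple for color masks) count as object.
    """
    if mask_color is None:
        foreground = mask_np != 0
        if foreground.ndim == 3:
            foreground = foreground.any(axis=-1)
    else:
        foreground = mask_np == np.asarray(mask_color)
        if foreground.ndim == 3:
            # Every channel has to match the requested color.
            foreground = foreground.all(axis=-1)
    return foreground.astype(np.uint8)


# Synthetic RGB mask with a red and a green region, 4 pixels each.
mask_np = np.zeros((4, 4, 3), dtype=np.uint8)
mask_np[0:2, 0:2] = (255, 0, 0)
mask_np[2:4, 2:4] = (0, 255, 0)

any_object = binary_mask(mask_np)
red_only = binary_mask(mask_np, mask_color=(255, 0, 0))
```

Matching one color at a time would also be a starting point for masks that encode several classes by colour, which has not been tested in this thread.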