dluks opened 2 years ago
I think the model cannot learn the annotations or has a problem with saving weights. I couldn't understand the cropping process of the images. If you have cropped the images, have you configured your annotations according to the sizes of the new images? Could you explain your dataset a little more?
Sure, my dataset consists of 360 512x512x3 RGB TIFF images plus their corresponding annotations (which are also 512x512 TIFFs with integer labels). For each RGB image I have one corresponding label/mask image. On loading the masks, I isolate each individual label into its own array and then stack them all, resulting in mask arrays of shape (512, 512, # of labels), which is how it's done in the nucleus.py sample project.
def load_mask(self, image_id):
    """Generate instance masks for an image.
    Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
    """
    info = self.image_info[image_id]
    # Get mask directory from image path
    mask_dir = os.path.join(os.path.dirname(os.path.dirname(info["path"])), "mask")
    # Read mask file from .tif image and separate classes into
    # individual boolean mask layers
    mask = tiff.imread(glob.glob(f"{mask_dir}/*.tif")[0]).astype("int")
    classes = np.unique(mask)
    masks = []
    for cl in classes:
        if cl > 0:
            m = np.zeros((mask.shape[0], mask.shape[1]))
            m[np.where(mask == cl)] = 1
            masks.append(m)
    masks = np.moveaxis(np.array(masks), 0, -1)
    # Return mask, and array of class IDs of each instance. Since we have
    # one class ID, we return an array of ones
    return masks, np.ones([masks.shape[-1]], dtype=np.int32)
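A sanity check along these lines could be run on the loaded data (a rough snippet, not part of my actual class; `dataset` stands for a prepared Dataset instance):

```python
import numpy as np

# Rough sanity check: each mask stack should match its image's height/width,
# have one layer per class ID, and contain only 0/1 values.
for image_id in dataset.image_ids:
    image = dataset.load_image(image_id)            # (H, W, 3)
    masks, class_ids = dataset.load_mask(image_id)  # (H, W, N), (N,)
    assert masks.shape[:2] == image.shape[:2], f"size mismatch for image {image_id}"
    assert masks.shape[-1] == class_ids.shape[0]
    assert set(np.unique(masks)) <= {0, 1}
```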
There are, on average, 46 labeled trees per 512x512 image (with the maximum number of trees in an image being 101). In the config, I chose the "crop" resize mode to further reduce the images to random 128x128 crops (a size I found appropriate when performing my own U-Net semantic segmentation on the same dataset). But perhaps I'm misunderstanding the usage of the crop function here?
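Roughly, the crop settings I'm referring to would look something like this in the Config (an illustrative sketch using the standard config attributes, not my exact file):

```python
from mrcnn.config import Config

class TreesConfig(Config):
    # Illustrative values only
    NAME = "trees"
    NUM_CLASSES = 1 + 1          # background + tree
    IMAGE_RESIZE_MODE = "crop"   # take random crops during training
    IMAGE_MIN_DIM = 128          # crop size
    IMAGE_MAX_DIM = 128
    MAX_GT_INSTANCES = 101       # up to ~101 trees in a full image
```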
So far, since I have a small dataset, I've also tried just training the classifiers/heads, but I still can't seem to get a loss/val_loss below 2, and the output of the model predictions is still super strange-looking:
image ID: tree.393_5823_RGB_2020_04_08 (13) 393_5823_RGB_2020_04_08
Original image shape: [512 512 3]
Processing 1 images
image shape: (512, 512, 3) min: 13.00000 max: 255.00000 uint8
molded_images shape: (1, 512, 512, 3) min: 13.00000 max: 255.00000 uint8
image_metas shape: (1, 14) min: 0.00000 max: 512.00000 int64
anchors shape: (1, 65280, 4) min: -0.08856 max: 1.02594 float32
gt_class_id shape: (56,) min: 1.00000 max: 1.00000 int32
gt_bbox shape: (56, 4) min: 0.00000 max: 512.00000 int32
gt_mask shape: (512, 512, 56) min: 0.00000 max: 1.00000 float64
AP @0.50: 0.000
AP @0.55: 0.000
AP @0.60: 0.000
AP @0.65: 0.000
AP @0.70: 0.000
AP @0.75: 0.000
AP @0.80: 0.000
AP @0.85: 0.000
AP @0.90: 0.000
AP @0.95: 0.000
AP @0.50-0.95: 0.000
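(For reference, these per-threshold AP values come from looping over IoU thresholds following inspect_model.ipynb, roughly like this; `dataset_val`, `inference_config`, `model`, and `image_id` are placeholders for my own setup:)

```python
import numpy as np
import mrcnn.model as modellib
from mrcnn import utils

# Load ground truth for one validation image and run detection on it
image, image_meta, gt_class_id, gt_bbox, gt_mask = modellib.load_image_gt(
    dataset_val, inference_config, image_id, use_mini_mask=False)
r = model.detect([image], verbose=0)[0]

# Compute AP at IoU thresholds 0.50 to 0.95 in steps of 0.05
for iou in np.arange(0.5, 1.0, 0.05):
    ap, precisions, recalls, overlaps = utils.compute_ap(
        gt_bbox, gt_class_id, gt_mask,
        r["rois"], r["class_ids"], r["scores"], r["masks"],
        iou_threshold=iou)
    print(f"AP @{iou:.2f}: {ap:.3f}")
```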
Maybe I just don't have enough training data, but I can't help but feel like there's something obviously flawed about my setup that I'm missing here...
Did you resize your masks while using the crop method? There are many functions related to cropping and resizing in model.py and utils.py. In particular, I think you should focus on load_image_gt(...) in model.py and the resize functions in utils.py for cropping and resizing.
Also, I think there may be a mismatch between mask sizes and image sizes, because you use the original size of your images in the detection part. Check the issue here: #396.
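Roughly speaking, load_image_gt() resizes the image and then applies the exact same scale, padding, and crop to the masks, along these lines (a simplified sketch of what mrcnn/utils.py does, not your code):

```python
from mrcnn import utils

# The image is resized/cropped according to the config...
image, window, scale, padding, crop = utils.resize_image(
    image,
    min_dim=config.IMAGE_MIN_DIM,
    min_scale=config.IMAGE_MIN_SCALE,
    max_dim=config.IMAGE_MAX_DIM,
    mode=config.IMAGE_RESIZE_MODE)   # e.g. "crop"

# ...and the mask must be resized with the same scale, padding, and crop window
mask = utils.resize_mask(mask, scale, padding, crop)
```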
@dluks This happened to me too, regardless of the image format. It turned out that, as other people with similarly odd detection behavior have reported in the issues, anything above TensorFlow 2.5 has this problem. So I downgraded TensorFlow to 2.5, Keras to 2.4.3, cudatoolkit to 11.2, and cuDNN to 8.1, and it worked out fine.
@nyinyinyanlin Could you specify which Mask R-CNN repository you are using? With TensorFlow 2.5 and Keras 2.4.3, I have problems with the load weights function: the weights do not load well.
@MatesdeSilvia Can you please tell me what you mean by "the weights do not load well"? Can you please post error logs or a screenshot? I use Lee Kun Hee's fork, but please be aware that you will have to manually install the specific library versions instead of using pip install -r requirements, as Lee Kun Hee's fork uses lower versions of TensorFlow and Keras.
Hi all,
I've been very excited to apply this (slightly intimidating) project to some new data, but despite all of the impressive results I've seen out there, I'm really struggling to get results that are at all promising, so I suspect there's something fundamental I'm overlooking in my setup.
My dataset consists of aerial RGB shots of a city, with two classes: tree and background.
Images: aerial RGB photos, all 512x512; training: 324, validation: 36; using random 128x128 crops; ~46 trees per image on average.
Each training session ends up with something looking pretty similar to this:
With the following rough stats when testing on the validation set with no image cropping, using inspect_model.ipynb as a guide:
I keep getting the same results (seemingly high confidence with zero or very close to zero IoU, generally clustered at the tops of the images), even after implementing advice I've found elsewhere in this repo for small datasets, such as only training the heads, initializing with COCO weights but not training for too long, adjusting my anchor scales to match the general sizes and aspect ratios of the annotations, etc.
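For example, the anchor and heads-only training setup I've tried looks roughly like this (a sketch with illustrative values; `COCO_WEIGHTS_PATH`, `MODEL_DIR`, `dataset_train`, and `dataset_val` are placeholders, not my exact code):

```python
import mrcnn.model as modellib
from mrcnn.config import Config

class TreesConfig(Config):
    # Illustrative values only
    NAME = "trees"
    NUM_CLASSES = 1 + 1                        # background + tree
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)   # scaled down to match small tree crowns
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

config = TreesConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

# Initialize from COCO weights, excluding the class-dependent head layers
model.load_weights(COCO_WEIGHTS_PATH, by_name=True, exclude=[
    "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

# Train only the head layers first, for a limited number of epochs
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20, layers="heads")
```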
So far I'm questioning:
Checking the losses, what obviously stands out is the high overall loss (epoch_loss), which increases with each training stage (heads -> ResNet 4+ -> all layers):
My config:
So, any initial thoughts on where I'm going wrong?