matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Dimension 0 Shape mismatch when attempting to use gray-scale images #1178

Open SoraDevin opened 5 years ago

SoraDevin commented 5 years ago

I'm trying to change the network to accept grayscale images. I've followed the wiki steps to exclude conv1 when loading weights and to include it alongside the heads when training, but I'm running into a shape mismatch when I try to train.

Using TensorFlow backend.

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     4
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.75
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 4
IMAGE_CHANNEL_COUNT            1
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    1]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'mrcnn_mask_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [114.8]
MINI_MASK_SHAPE                (56, 56)
NAME                           mammogram
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                120
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           100
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               30
WEIGHT_DECAY                   0.0001

Loading weights  mask_rcnn_medseg.h5
Traceback (most recent call last):
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1628, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 7 and 64. Shapes are [7,7,1,64] and [64,3,7,7]. for 'Assign' (op: 'Assign') with input shapes: [7,7,1,64], [64,3,7,7].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "samples/medseg/mamo_gray.py", line 348, in <module>
    model.load_weights(weights_path, by_name=True)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/mask_rcnn-2.1-py3.4.egg/mrcnn/model.py", line 2131, in load_weights
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/keras/engine/topology.py", line 3479, in load_weights_from_hdf5_group_by_name
    K.batch_set_value(weight_value_tuples)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/keras/backend/tensorflow_backend.py", line 2372, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/ops/variables.py", line 1718, in assign
    name=name)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
    validate_shape=validate_shape)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
    use_locking=use_locking, name=name)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1792, in __init__
    control_input_ops)
  File "/home/Student/s4318522/mammogram_mrcnn_2018/mammo-mrcnn-env/lib64/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1631, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 7 and 64. Shapes are [7,7,1,64] and [64,3,7,7]. for 'Assign' (op: 'Assign') with input shapes: [7,7,1,64], [64,3,7,7].

Were there some other modifications I needed to make to fix this? I've followed all the steps listed on the wiki:

  1. In your subclass of Config, set IMAGE_CHANNEL_COUNT to N.
  2. In the same class, change MEAN_PIXEL from 3 values to N values.
  3. The load_image() method in the Dataset class is designed for RGB. It converts grayscale images to RGB and removes the 4th channel if present (because typically it's an alpha channel). You'll need to override this method to handle your images.
  4. Since you're changing the shape of the input, the shape of the first Conv layer (Conv1) will change as well. So you can't use the provided pre-trained weights. To get around that, use the exclude parameter when you load the weights to exclude the first layer. This allows you to load weights of all layers except conv1, which will be initialized to random weights.
  5. If you train a subset of layers, remember to include conv1 since it's initialized to random weights. This is relevant if you pass layers="head" or layers="4+", ...etc. when you call train().
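
For steps 1 and 2, my Config subclass looks roughly like this (a minimal sketch; the class name is made up, but the values match the config dump above):

    import numpy as np
    from mrcnn.config import Config

    class MammogramConfig(Config):
        NAME = "mammogram"
        # Single-channel (grayscale) input instead of RGB
        IMAGE_CHANNEL_COUNT = 1
        # One mean value per channel (instead of three RGB means)
        MEAN_PIXEL = np.array([114.8])
        IMAGES_PER_GPU = 4
        NUM_CLASSES = 2  # background + 1 class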

My relevant setup code snippets are as follows:

def load_image(self, image_id):
    """Load the specified image and return a [H,W,1] Numpy array.
    Taken from utils.py; any refinements we need can be done here.
    """
    # Load image
    image = skimage.io.imread(self.image_info[image_id]['path'])
    # If it has an alpha channel, remove it for consistency
    if image.ndim == 3 and image.shape[-1] == 4:
        image = image[..., :3]
    # If RGB (ndim == 3), convert to grayscale; a 2D image is already gray
    if image.ndim == 3:
        image = skimage.color.rgb2gray(image)
    # Add a channel axis so the shape is (H, W, 1), matching
    # IMAGE_CHANNEL_COUNT = 1
    image = image[..., np.newaxis]
    return image
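
One thing worth checking with the snippet above (separate from the shape error): skimage.color.rgb2gray returns float64 values in [0, 1], while MEAN_PIXEL here is 114.8 on a 0-255 scale, so the converted image probably needs rescaling before mold_image() subtracts the mean. A sketch, assuming an 8-bit range is acceptable:

    # rgb2gray yields floats in [0, 1]; bring them back to 0-255 so the
    # MEAN_PIXEL = [114.8] subtraction stays meaningful
    image = (image * 255).astype(np.uint8)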

model.load_weights(weights_path, by_name=True, exclude=[
            "conv1", "mrcnn_class_logits", "mrcnn_bbox_fc",
            "mrcnn_bbox", "mrcnn_mask"])

model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=40,
                layers='heads')

I also modified model.py to include conv1 in "heads":

        # Pre-defined layer regular expressions
        layer_regex = {
            # all layers but the backbone
            "heads": r"(conv1\_.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
            # From a specific Resnet stage and up
            "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
            "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
            "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
            # All layers
            "all": ".*",
        }

My images are 12-bit, so I will try converting them to see if that helps, but this shape mismatch with the input layer has me stumped. Did anyone else run into this issue when using grayscale images?
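
For the 12-bit images, the conversion I'll try is just a rescale (a sketch, assuming intensities in 0-4095 stored as uint16):

    # scale 12-bit intensities (0-4095) down to an 8-bit range (0-255)
    image = (image.astype(np.float32) / 4095.0 * 255.0).astype(np.uint8)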

yuannver commented 5 years ago

Have you solved this problem yet? I have the same question. @SoraDevin

SoraDevin commented 5 years ago

Unfortunately not. There are a bunch of other augmentations I can work on using 3 channels, and I'm just converting the grayscale images to 3-channel in load_image, so I haven't tried to debug this further. I still have no idea what to change. I'll update this if I find a solution, but hearing from someone who has already trained on grayscale images would be nice.

yuannver commented 5 years ago

@SoraDevin I want to ask how you converted the image to 3 channels in the load_image function. I made a mistake in my conversion.

SoraDevin commented 5 years ago

@yuannver, there are a few ways; you could just copy the single channel into two more (see the one-liner after the snippet below). I am just using skimage's gray2rgb function like so:

def load_image(self, image_id):
    """Load the specified image and return a [H,W,3] Numpy array.
    Taken from utils.py; any refinements we need can be done here.
    """
    # Load image
    image = skimage.io.imread(self.image_info[image_id]['path'])
    # If grayscale, convert to RGB for consistency.
    if image.ndim != 3:
        image = skimage.color.gray2rgb(image)
    # If it has an alpha channel, remove it for consistency
    if image.shape[-1] == 4:
        image = image[..., :3]
    return image

def image_reference(self, image_id):
    """Return the path of the image."""
    info = self.image_info[image_id]
    return info["path"]
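
The "copy" approach I mentioned is the same idea done by hand, e.g.:

    # replicate the single channel three times along a new last axis;
    # for a 2D input this produces the same result as gray2rgb
    image = np.stack([image] * 3, axis=-1)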

deepikakanade commented 5 years ago

1) Change the input dimension by setting channel count to 1 instead of 3.

self.IMAGE_SHAPE = np.array([self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM, self.IMAGE_CHANNEL_COUNT])

2) Change the mean pixel array shape to 1 as there is just one channel now.

MEAN_PIXEL = np.array([123.7]) instead of np.array([123.7, 116.8, 103.9])

3) Also, change the padding array in the utils.resize_image() function (see the sketch at the end of this comment):

padding = [(top_pad, bottom_pad)] instead of

[(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]

4) Change the load_image() function in utils.py to handle single-channel input.

    image = skimage.io.imread(self.image_info[image_id]['path'])

    # If RGB, convert to grayscale for consistency.
    if image.ndim == 3:
        image = skimage.color.rgb2gray(image)

    image = image[..., np.newaxis]   # Extend the image shape to (h, w, 1)

This worked for me when training on a grayscale dataset. However, I trained the model from scratch, so I am not sure how to ignore the first layer when training from pretrained weights.
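
For step 3, the relevant spot in utils.resize_image() looks roughly like this (a sketch of the padding logic only, not the full function; if your load_image() already returns (h, w, 1) arrays, the original three-pair padding can stay as it was):

    # "square" resize mode in utils.resize_image(): pad to a square canvas.
    # np.pad expects one (before, after) pair per axis, so the padding list
    # must match the number of image dimensions.
    if image.ndim == 2:   # (h, w) grayscale with no channel axis
        padding = [(top_pad, bottom_pad), (left_pad, right_pad)]
    else:                 # (h, w, 1) single-channel or (h, w, 3) RGB
        padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
    image = np.pad(image, padding, mode='constant', constant_values=0)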

SoraDevin commented 5 years ago

The padding change is something I didn't do, and it looks like it might help! The wiki (and my earlier post) also show how to include the first layer in training, since it's initialized to random weights. I also found that training all layers improved my performance anyway.

davidb1 commented 5 years ago

@SoraDevin @deepikakanade I'm getting this after making the changes mentioned:

     batch_images[b] = mold_image(image.astype(np.float32), config)
ValueError: could not broadcast input array from shape (1024,720,1) into shape (1024,722,1)

Any ideas how to get past this? I'm guessing it's something to do with the padding changes.

xkyi commented 5 years ago

(Quoting @deepikakanade's steps above.) To ignore the first layer when loading pretrained weights, exclude it by name:

model.load_weights(weights_path, by_name=True, exclude=["conv1"])
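
If you then train a subset of layers, conv1 has to be included since it starts from random weights (wiki step 5); a sketch using the modified "heads" regex from earlier in this thread:

    # conv1 was excluded from the pretrained weights, so make sure the
    # layer selection passed to train() matches it (the modified "heads"
    # regex above does)
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=40, layers='heads')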