matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

ValueError: Dimension 1 in both shapes must be equal, but are 12 and 324. Shapes are [1024,12] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,12], [1024,324]. #604

Closed eyildiz-ugoe closed 6 years ago

eyildiz-ugoe commented 6 years ago

It runs fine with the balloon sample, but when I try to train on my own dataset, I get this error every time.

I have 2 classes apart from the background.

What might be the problem?

Edit: Full error message.


Loading weights  ../../mask_rcnn_coco.h5
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 12 and 324. Shapes are [1024,12] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,12], [1024,324].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "component_front.py", line 327, in <module>
    model.load_weights(weights_path, by_name=True)
  File "/home/user/workspace/Mask_RCNN/mrcnn/model.py", line 2100, in load_weights
    topology.load_weights_from_hdf5_group_by_name(f, layers)
  File "/home/user/.local/lib/python3.5/site-packages/keras/engine/topology.py", line 3468, in load_weights_from_hdf5_group_by_name
    K.batch_set_value(weight_value_tuples)
  File "/home/user/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2368, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 609, in assign
    return state_ops.assign(self._variable, value, use_locking=use_locking)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 281, in assign
    validate_shape=validate_shape)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
    use_locking=use_locking, name=name)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3292, in create_op
    compute_device=compute_device)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3332, in _create_op_helper
    set_shapes_for_outputs(op)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2496, in set_shapes_for_outputs
    return _set_shapes_for_outputs(op)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2469, in _set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2399, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
    require_shape_fn)
  File "/home/user/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimension 1 in both shapes must be equal, but are 12 and 324. Shapes are [1024,12] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,12], [1024,324].
user@user:~/workspace/Mask_RCNN/samples/component_front$ 
zungam commented 6 years ago

I'm not sure where this is happening in your code; it would help to see the entire error stack. Usually, though, this kind of error means you are feeding the mask instances into dimension 1 instead of dimension 2, where they are supposed to be. Look into your load_mask function!

Correct approach: dimension 0 is supposed to be the height of your image, dimension 1 the width, and dimension 2 a stack of instances.

eyildiz-ugoe commented 6 years ago

I haven't made any changes in that function apart from renaming, it stays as:

    def load_mask(self, image_id):
        """Generate instance masks for an image.

        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a component_front dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "component":
            return super(self.__class__, self).load_mask(image_id)

        # Convert polygons to a bitmap mask of shape
        # [height, width, instance_count]
        info = self.image_info[image_id]
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        for i, p in enumerate(info["polygons"]):
            # Get indexes of pixels inside the polygon and set them to 1
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
            mask[rr, cc, i] = 1

        # Return mask, and array of class IDs of each instance. Since we have
        # one class ID only, we return an array of 1s
        # (assumed to match the return statement of the balloon sample)
        return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)
zungam commented 6 years ago

I see that this is happening in your model.load_weights(weights_path, by_name=True) call. Have you made changes to the architecture of the network (like changes in coco.py)? Can I see the parameters of your inherited config class? It could be that the COCO weights don't match the new architecture, but I'm not sure.

eyildiz-ugoe commented 6 years ago

Here is my inherited config class:

############################################################
#  Configurations
############################################################

class ComponentFrontConfig(Config):
    """Configuration for training on the toy  dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "component"

    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 2  # Background + [lid, screw]

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9
zungam commented 6 years ago

Hmm, which versions of TensorFlow, CUDA and cuDNN are you using?

eyildiz-ugoe commented 6 years ago

TF: 1.7 CUDA: 9.0 cuDNN: 7.0

patrick-llgc commented 6 years ago

@eyildiz-ugoe I presume you used the COCO weights to fine-tune on your own dataset. In that case, I suspect you changed some of the config and now some layer has dimensions incompatible with the COCO pre-trained weights. For example, did you change the number of anchors (number of scales and number of aspect ratios) in the config?

Could you try using the ImageNet pre-trained weights and see if the issue persists (by setting the weights flag to imagenet on the command line)? Or simply use randomly initialized weights by setting the weights flag to an empty string, although you may need to tweak the arg-parsing part of your script to skip the weight loading.

eyildiz-ugoe commented 6 years ago

I do not have the weights for ImageNet. Here is the command line argument I am using to run the code:

python3 component_front.py train --dataset=../../datasets/component_front/ --weights=../../mask_rcnn_coco.h5

Since I do not have the .h5 file for ImageNet, I cannot use its pretrained weights. As for the empty string, I get an error about a missing argument.

patrick-llgc commented 6 years ago

@eyildiz-ugoe Take a look at the balloon.py example under the samples/balloon folder. There you can specify --weights=imagenet on the command line and the code will download the ImageNet weights for you automatically.

Yes, if you want to use an empty string you will have to tweak the script a bit to skip the weight loading part.
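
A minimal sketch of that tweak, assuming the script parses its arguments with argparse the way the balloon sample does. The names build_parser and should_load_weights are hypothetical, and the model call is left as a comment because it needs a built model:

```python
import argparse


def build_parser():
    """Argument parser where --weights is optional (sketch, not the sample's code)."""
    parser = argparse.ArgumentParser(description="Train Mask R-CNN (sketch).")
    parser.add_argument("command", help="e.g. 'train'")
    # Assumption: the original script marks --weights as required, which is
    # why omitting it fails. Here it defaults to an empty string.
    parser.add_argument("--weights", required=False, default="",
                        help="Path to a weights .h5 file, 'coco', 'imagenet', "
                             "or empty for random initialization")
    return parser


def should_load_weights(weights_arg):
    """Return True only if a non-blank weights argument was given."""
    return bool(weights_arg and weights_arg.strip())


args = build_parser().parse_args(["train"])
if should_load_weights(args.weights):
    pass  # model.load_weights(weights_path, by_name=True) would go here
else:
    print("No weights given; training from randomly initialized weights")
```

With this guard in place, the weight-loading branch is simply skipped when the flag is omitted or blank.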

eyildiz-ugoe commented 6 years ago

@patrick-12sigma It was able to download the ImageNet weights and got past that part. It gave another error, though, which I am going to open a separate ticket for.

Suvi-dha commented 6 years ago

This is happening to me when I try to train starting from the COCO weights. However, I have trained successfully with the ImageNet weights and with one other weights file, using my custom configuration settings. What could be the reason?

patrick-llgc commented 6 years ago

@Suvi-dha The ImageNet weights only include the weights for the backbone, whereas the COCO weights also include the RPN heads and the second-stage heads. Loading them therefore requires your configuration to match the original COCO configuration the weights were trained with almost exactly. That means your anchor box numbers should be exactly the same as in the COCO config.
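
One way to read the shapes in the error messages of this thread is sketched below. It assumes the mismatched variable is the kernel of the box-regression head (mrcnn_bbox_fc), which sits on a 1024-unit layer and has 4 box outputs per class, and that the COCO checkpoint was trained with 81 classes (80 plus background). The helper name is hypothetical:

```python
def bbox_fc_kernel_shape(num_classes, fc_units=1024):
    """Kernel shape of a dense layer with 4 box deltas per class."""
    return (fc_units, num_classes * 4)


coco_shape = bbox_fc_kernel_shape(1 + 80)    # COCO: 80 classes + background
two_class_shape = bbox_fc_kernel_shape(1 + 2)    # background + 2 classes
three_class_shape = bbox_fc_kernel_shape(1 + 3)  # background + 3 classes

print(coco_shape, two_class_shape, three_class_shape)
```

Under these assumptions the second dimension depends only on NUM_CLASSES, which would explain why changing the number of classes alone is enough to trigger the error.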

Suvi-dha commented 6 years ago

I understand what you're saying, and I did not change any configuration except the number of GPUs, the number of classes, and the maximum and minimum image dimensions. Everything else I kept as it is. I read all the comments above before posting, because despite following all the constraints it's still giving me this error.

patrick-llgc commented 6 years ago

@Suvi-dha that is strange. Did you run the balloon sample and try loading coco weights?

Suvi-dha commented 6 years ago

yes, it is running successfully.

But with coco, here's my error trace


Loading weights  D:\SUVIDHA\Mask_RCNN-master\Mask_RCNN-master\mask_rcnn_coco.h5
Traceback (most recent call last):
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1567, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 16 and 324. Shapes are [1024,16] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,16], [1024,324].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "samples\bach\bach.py", line 485, in <module>
    model.load_weights(model_path, by_name=True)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py", line 2100, in load_weights
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\keras\engine\topology.py", line 3468, in load_weights_from_hdf5_group_by_name
    K.batch_set_value(weight_value_tuples)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2368, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 615, in assign
    return state_ops.assign(self._variable, value, use_locking=use_locking)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\ops\state_ops.py", line 283, in assign
    validate_shape=validate_shape)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 63, in assign
    use_locking=use_locking, name=name)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1734, in __init__
    control_input_ops)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1570, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 1 in both shapes must be equal, but are 16 and 324. Shapes are [1024,16] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,16], [1024,324].
Suvi-dha commented 6 years ago

The answer in issue #363 solved my problem. For everyone else getting this error with the COCO weights: try loading the weights by modifying the following line

model.load_weights(weights_path, by_name=True, exclude=[ "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
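
As a rough sketch of how one might check which layers need excluding, the hypothetical helper below compares two name-to-shape mappings. The dictionaries stand in for shapes that would be read from the built model and from the .h5 file; the head shapes are taken from the error messages in this thread, and "backbone_layer" is a made-up name for a layer that matches:

```python
def mismatched_layers(model_shapes, checkpoint_shapes):
    """Return names present in both mappings whose shapes differ."""
    return sorted(
        name
        for name, shape in model_shapes.items()
        if name in checkpoint_shapes
        and tuple(checkpoint_shapes[name]) != tuple(shape)
    )


# Illustrative shapes for a model configured with 3 classes in total.
model_shapes = {
    "mrcnn_bbox_fc": (1024, 12),
    "mrcnn_class_logits": (1024, 3),
    "backbone_layer": (3, 3, 256, 256),  # hypothetical layer name
}
# Illustrative shapes for a checkpoint trained with 81 classes.
checkpoint_shapes = {
    "mrcnn_bbox_fc": (1024, 324),
    "mrcnn_class_logits": (1024, 81),
    "backbone_layer": (3, 3, 256, 256),
}

exclude = mismatched_layers(model_shapes, checkpoint_shapes)
print(exclude)
```

Layers reported this way are candidates for the exclude list; layers with matching shapes can still be loaded by name.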

PratibhaT commented 6 years ago

@Suvi-dha I trained using exactly that. Still, when I try to run demo.py for visualization with my own weights, it gives me an error. I modified balloon.py to train for three classes. Any suggestions?

ValueError: Dimension 1 in both shapes must be equal, but are 16 and 12 for 'Assign_682' (op: 'Assign') with input shapes: [1024,16], [1024,12].

patrick-llgc commented 6 years ago

@PratibhaT try using the balloon config as is and see if the issue persists. I suspect you changed some config that needs you to exclude more layers in order to make it work.

Suvi-dha commented 6 years ago

@PratibhaT Yes, the issue usually happens when the number of classes changes. Run the demo with the same config your model was trained with.
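
A small sketch of that advice: derive the inference config from the training config so that NUM_CLASSES cannot drift between the two. TrainingConfig here is a plain stand-in for the mrcnn Config subclass used for training, and its values are hypothetical:

```python
class TrainingConfig:
    """Stand-in for the mrcnn Config subclass used during training."""
    NAME = "my_dataset"      # hypothetical name
    NUM_CLASSES = 1 + 3      # background + 3 classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2


class InferenceConfig(TrainingConfig):
    """Inherits the model-defining values; only the batch settings change."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1       # run detection on one image at a time


train_cfg = TrainingConfig()
infer_cfg = InferenceConfig()
print(infer_cfg.NUM_CLASSES == train_cfg.NUM_CLASSES)
```

Because the inference class only overrides the batch settings, the head layers built for the demo have the same shapes as the ones saved during training.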

PratibhaT commented 6 years ago

@Suvi-dha Thank you. It is working now, but it takes about 4 seconds to process one image on a Titan X GPU. What frame rate are you getting?

slothkong commented 6 years ago

@PratibhaT Could you share a snippet of the Config class you used, please? I'm running TF 1.10 and I get this issue even if I only change the number of classes.

EDIT: I just realized that if I specify the weights path instead of passing 'coco' as the argument, the model is loaded including the heads. The logic is hard-coded to exclude the "heads" only if we pass 'coco' as the argument, so be careful with this portion of the code:

https://github.com/matterport/Mask_RCNN/blob/41e7c596ebb83b05a4154bb0ac7a28e0b9afd017/samples/balloon/balloon.py#L348-L355
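
A hedged sketch of one way around that pitfall: decide on the exclusion from the file name as well as from the literal 'coco' argument. The function name, the file-name check, and the default of 81 COCO classes are assumptions for illustration and are not part of the repository:

```python
import os

# Head layers named in the workaround earlier in this thread.
HEAD_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]


def layers_to_exclude(weights_arg, num_classes, coco_num_classes=81):
    """Return the head layers to skip when loading COCO weights into a
    model with a different number of classes, otherwise an empty list."""
    file_name = os.path.basename(weights_arg).lower()
    is_coco = weights_arg.lower() == "coco" or file_name == "mask_rcnn_coco.h5"
    if is_coco and num_classes != coco_num_classes:
        return list(HEAD_LAYERS)
    return []


print(layers_to_exclude("../../mask_rcnn_coco.h5", num_classes=3))
print(layers_to_exclude("imagenet", num_classes=3))
```

The result could then be passed as the exclude argument of model.load_weights(weights_path, by_name=True, exclude=...), so that an explicit path to the COCO file is treated the same way as the 'coco' shortcut.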