eyildiz-ugoe closed this issue 6 years ago.
I'm not sure where this is happening in your code; it would help to see the entire error stack. Usually, though, it means you are feeding the mask instances into dimension 1 instead of dimension 2, where they are supposed to be. Look into your load_mask function!
Correct approach: dimension 0 should be the height of your image, dimension 1 the width, and dimension 2 the stack of instances.
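To make the expected layout concrete, here is a minimal sketch of a shape check that could be called at the end of load_mask. The helper name check_mask_shape is hypothetical (it is not part of Mask R-CNN); it only compares a mask's shape tuple against [height, width, instance count], the layout stated in the load_mask docstring quoted in this thread.

```python
def check_mask_shape(mask_shape, image_height, image_width, num_instances):
    """Raise ValueError unless mask_shape == (height, width, instance_count).

    Hypothetical helper: pass mask.shape from the NumPy mask array.
    """
    expected = (image_height, image_width, num_instances)
    if tuple(mask_shape) != expected:
        # Typical mistakes: instances stacked on axis 0 or 1,
        # or height and width swapped.
        raise ValueError(
            f"mask shape {tuple(mask_shape)} != expected "
            f"(height, width, instances) = {expected}"
        )
    return True


# Example: a 480x640 image containing 3 polygons.
print(check_mask_shape((480, 640, 3), 480, 640, 3))
try:
    check_mask_shape((480, 3, 640), 480, 640, 3)  # instances on axis 1
except ValueError as err:
    print(err)
```

Calling it as check_mask_shape(mask.shape, info["height"], info["width"], len(info["polygons"])) just before load_mask returns would surface a transposed or mis-stacked mask before it reaches the model.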
I haven't changed that function apart from renaming; it is still:
import numpy as np
import skimage.draw

# Method of the dataset class (derived from the balloon sample).
def load_mask(self, image_id):
    """Generate instance masks for an image.
    Returns:
    masks: A bool array of shape [height, width, instance count] with
        one mask per instance.
    class_ids: a 1D array of class IDs of the instance masks.
    """
    # If not a component_front dataset image, delegate to parent class.
    image_info = self.image_info[image_id]
    if image_info["source"] != "component":
        return super(self.__class__, self).load_mask(image_id)

    # Convert polygons to a bitmap mask of shape
    # [height, width, instance_count]
    info = self.image_info[image_id]
    mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                    dtype=np.uint8)
    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the polygon and set them to 1
        rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        mask[rr, cc, i] = 1

    # Return mask, and array of class IDs of each instance. Since we have
    # one class ID only, we return an array of 1s
    return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)
I see that this is happening in your model.load_weights(weights_path, by_name=True) call.
Have you made changes to the network architecture (for example, in coco.py)? Can I see the parameters of your inherited config class?
It could be that the COCO weights don't match the new architecture, but I'm not sure.
Here is my inherited config class:
from mrcnn.config import Config

############################################################
#  Configurations
############################################################

class ComponentFrontConfig(Config):
    """Configuration for training on the toy dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "component"

    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 2  # Background + [lid, screw]

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9
Hmm, what versions of TensorFlow, CUDA and cuDNN are you using?
TF: 1.7 CUDA: 9.0 cuDNN: 7.0
@eyildiz-ugoe I presume you used COCO weights to fine-tune on your own dataset. In that case, I suspect you changed some of the config and now some layer has dimensions that are incompatible with the COCO pre-trained weights. For example, did you change the number of anchors (number of scales and number of aspect ratios) in the config?
Could you try using the ImageNet pre-trained weights and see if the issue persists (by setting the weights flag to imagenet on your command line)? Or simply use randomly initialized weights (by setting the weights flag to an empty string; you may need to tweak the argument-parsing part of your script to skip weight loading).
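As a rough illustration of why the anchor configuration matters: in Matterport's rpn_graph, as far as I can tell, the number of anchors per feature-map location is len(RPN_ANCHOR_RATIOS) (one scale per pyramid level), and the RPN output convolutions sit on a 512-channel shared conv with 2 × anchors filters for the fg/bg logits and 4 × anchors for the box deltas. The sketch below just computes those kernel shapes; the function name and the 512 default are assumptions to check against your copy of mrcnn/model.py.

```python
def rpn_kernel_shapes(anchor_ratios, shared_channels=512):
    """Kernel shapes of the RPN output convs for a given anchor-ratio list.

    Assumption: mirrors Matterport's rpn_graph, where anchors per location
    equals len(RPN_ANCHOR_RATIOS) and the shared conv has 512 filters.
    """
    anchors_per_location = len(anchor_ratios)
    return {
        # Foreground/background logits: 2 values per anchor.
        "rpn_class_raw": (1, 1, shared_channels, 2 * anchors_per_location),
        # Box refinements (dy, dx, log(dh), log(dw)): 4 values per anchor.
        "rpn_bbox_pred": (1, 1, shared_channels, 4 * anchors_per_location),
    }


coco_like = rpn_kernel_shapes([0.5, 1, 2])         # default ratios
custom = rpn_kernel_shapes([0.25, 0.5, 1, 2, 4])   # hypothetical change
for name in coco_like:
    print(name, coco_like[name], "vs", custom[name])
```

Any change in the ratio count changes the last dimension of these kernels, so COCO's RPN weights can no longer be assigned by name.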
I do not have the weights for ImageNet. Here is the command line argument I am using to run the code:
python3 component_front.py train --dataset=../../datasets/component_front/ --weights=../../mask_rcnn_coco.h5
Since I do not have the ImageNet .h5 file, I cannot use its pre-trained weights. As for the empty string, I get an error about a missing argument.
@eyildiz-ugoe Take a look at the balloon.py example under the samples/balloon folder. There you can specify --weights=imagenet on the command line and the code will download the ImageNet weights for you automatically.
Yes, if you want to use an empty string, you will have to tweak the script a bit to skip the weight-loading part.
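One way to do that tweak is sketched below with the standard argparse module. It is not balloon.py's actual parsing code: it makes --weights optional, treats a missing or empty value as "start from random initialization", and only loads weights when a value was given. The function names parse_args and should_load_weights are hypothetical.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train Mask R-CNN (sketch)")
    parser.add_argument("command", help="'train' or 'splash'")
    parser.add_argument("--dataset", required=False)
    # Optional: omitted or empty means "no pre-trained weights".
    parser.add_argument("--weights", default="",
                        help="'coco', 'imagenet', 'last', an .h5 path, or empty")
    return parser.parse_args(argv)


def should_load_weights(weights_arg):
    """Return False when training should start from random weights."""
    return bool(weights_arg and weights_arg.strip())


args = parse_args(["train", "--dataset=../../datasets/component_front/",
                   "--weights="])
if should_load_weights(args.weights):
    print("would call model.load_weights(...)")
else:
    print("skipping weight loading; training from scratch")
```

In the real script, the load_weights call would simply move inside the if branch.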
@patrick-12sigma It downloaded the ImageNet weights and got past that part. However, it then gave another error, which I am going to open a ticket for.
This happens to me when I try to train from COCO weights. However, I have successfully trained from ImageNet weights and from one other set of weights using my custom configuration. What could be the reason?
@Suvi-dha ImageNet weights only include the backbone, but COCO weights include not only the backbone but also the RPN head and the second-stage heads, which requires your configuration to match the original COCO configuration the weights were trained on almost exactly. That means your number of anchor boxes must be exactly the same as in the COCO config.
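To see which layers that distinction covers, the sketch below sorts layer names with the "heads" regular expression that Matterport's MaskRCNN.train() uses for layers='heads' (copied here as an assumption; check it against your mrcnn/model.py). Names that match are, as far as I can tell, absent from the ResNet-only ImageNet file but present in the COCO file; only the class- and anchor-dependent ones among them can actually cause a shape mismatch. The layer names in the example are illustrative.

```python
import re

# Assumed copy of the "heads" pattern from Matterport's MaskRCNN.train().
HEADS_REGEX = r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)"


def split_layers(layer_names):
    """Split layer names into (backbone, heads) using HEADS_REGEX."""
    backbone, heads = [], []
    for name in layer_names:
        # fullmatch, as in Matterport's set_trainable().
        (heads if re.fullmatch(HEADS_REGEX, name) else backbone).append(name)
    return backbone, heads


names = ["conv1", "res2a_branch2a", "bn_conv1", "fpn_c5p5",
         "rpn_conv_shared", "mrcnn_class_logits", "mrcnn_mask"]
backbone, heads = split_layers(names)
print("backbone (also in ImageNet weights):", backbone)
print("heads (COCO weights only):", heads)
```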
I understand what you're saying, and I did not change any configuration except the number of GPUs, the number of classes, and the maximum and minimum image dimensions. I kept everything else as it is. I read all the comments above before posting, because despite following all the constraints, it is still giving me this error.
@Suvi-dha that is strange. Did you run the balloon sample and try loading coco weights?
Yes, it runs successfully.
But with COCO weights in my own script, here is my error trace:
`
Loading weights D:\SUVIDHA\Mask_RCNN-master\Mask_RCNN-master\mask_rcnn_coco.h5
Traceback (most recent call last):
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1567, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 16 and 324. Shapes are [1024,16] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,16], [1024,324].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "samples\bach\bach.py", line 485, in <module>
    model.load_weights(model_path, by_name=True)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py", line 2100, in load_weights
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\keras\engine\topology.py", line 3468, in load_weights_from_hdf5_group_by_name
    K.batch_set_value(weight_value_tuples)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2368, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 615, in assign
    return state_ops.assign(self._variable, value, use_locking=use_locking)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\ops\state_ops.py", line 283, in assign
    validate_shape=validate_shape)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 63, in assign
    use_locking=use_locking, name=name)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3392, in create_op
    op_def=op_def)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1734, in __init__
    control_input_ops)
  File "C:\Users\IIIT\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1570, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 1 in both shapes must be equal, but are 16 and 324. Shapes are [1024,16] and [1024,324]. for 'Assign_682' (op: 'Assign') with input shapes: [1024,16], [1024,324].`
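The numbers in this error can be read off the config. Assuming Matterport's class-specific heads (a 1024-wide fully connected layer feeding mrcnn_class_logits with NUM_CLASSES outputs and mrcnn_bbox_fc with NUM_CLASSES × 4 outputs, plus a 1×1 mrcnn_mask conv over 256 channels), a [1024, 324] kernel is the COCO checkpoint's 81 classes × 4, while [1024, 16] corresponds to a config with 4 classes. The sketch below does that arithmetic; the helper names are hypothetical and the layer layout is an assumption to verify in mrcnn/model.py.

```python
def head_kernel_shapes(num_classes, fc_size=1024):
    """Expected kernel shapes of the class-specific Mask R-CNN heads.

    Assumption: mirrors Matterport's fpn_classifier_graph and
    build_fpn_mask_graph (FC width 1024, mask conv input 256 channels).
    """
    return {
        "mrcnn_class_logits": (fc_size, num_classes),
        "mrcnn_bbox_fc": (fc_size, num_classes * 4),  # 4 box deltas per class
        "mrcnn_mask": (1, 1, 256, num_classes),       # one mask channel per class
    }


def classes_from_bbox_kernel(shape):
    """Infer NUM_CLASSES from an mrcnn_bbox_fc kernel shape like (1024, 324)."""
    return shape[1] // 4


print(head_kernel_shapes(81)["mrcnn_bbox_fc"])     # COCO: 80 classes + background
print(head_kernel_shapes(1 + 3)["mrcnn_bbox_fc"])  # background + 3 classes
print(classes_from_bbox_kernel((1024, 324)), classes_from_bbox_kernel((1024, 12)))
```

Read this way, the error above means the model was built for 4 classes while the COCO file holds 81, so those head layers cannot be assigned and must be excluded (and trained from scratch) when fine-tuning. The same arithmetic applies to a [1024, 16] vs [1024, 12] mismatch: one side has 4 classes, the other 3.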
The answer in issue #363 solved my problem. For everyone else getting this error with COCO weights, try loading the weights by modifying the following line:
model.load_weights(weights_path, by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
@Suvi-dha I trained using exactly that. Still, when I try to run demo.py for visualization with my weights, it gives me an error. I modified balloon.py to train for three classes. Any suggestions?
ValueError: Dimension 1 in both shapes must be equal, but are 16 and 12 for 'Assign_682' (op: 'Assign') with input shapes: [1024,16], [1024,12].
@PratibhaT Try using the balloon config as-is and see whether the issue persists. I suspect you changed a config setting that requires you to exclude more layers to make it work.
@PratibhaT Yes, this issue usually happens when the number of classes changes. Run the demo with the same config you trained your model with.
@Suvi-dha Thank you. It is working now, but it takes about 4 s to process one image on a Titan X GPU. What frame rate are you getting?
@PratibhaT Could you please share a snippet of the Config class you used? I'm running TF 1.10 and I get this issue even if I only change the number of classes.
EDIT: I just realized that if I specify the weights path instead of passing 'coco' as the argument, the model is loaded including the heads. The logic is hard-coded to exclude the "heads" only when 'coco' is passed as the argument, so be careful with this part of the code.
It runs fine with the balloon dataset, but when I try to train on my own dataset I get this error no matter what.
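A more robust variant of that check is to decide what to exclude from the class counts rather than from the literal argument 'coco'. The sketch below assumes the class-specific heads are exactly the four layer names used in the exclude= call from issue #363 quoted in this thread, and that the released COCO weights have 81 classes (80 + background). The helper name heads_to_exclude is hypothetical.

```python
# Class-specific layers whose shapes depend on NUM_CLASSES (the names used
# in the exclude= list passed to Matterport's model.load_weights).
CLASS_SPECIFIC_HEADS = [
    "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask",
]


def heads_to_exclude(config_num_classes, checkpoint_num_classes):
    """Return the layer names to skip when loading weights by name.

    Hypothetical helper: excludes the class-specific heads whenever the
    checkpoint was trained with a different number of classes, whether the
    weights were given as 'coco' or as an explicit .h5 path.
    """
    if config_num_classes != checkpoint_num_classes:
        return list(CLASS_SPECIFIC_HEADS)
    return []


# Fine-tuning COCO weights (81 classes) on background + lid + screw:
print(heads_to_exclude(1 + 2, 81))
# Inference with your own weights and the same config: load everything.
print(heads_to_exclude(1 + 2, 1 + 2))
```

Usage would look like model.load_weights(weights_path, by_name=True, exclude=heads_to_exclude(config.NUM_CLASSES, 81)); an empty list leaves every layer in place, which is what you want when running the demo with the config you trained with.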
I have 2 classes apart from the background.
What might be the problem?
Edit: Full error message.