matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Increasing Output Mask Resolution #635

Closed patrickcgray closed 3 years ago

patrickcgray commented 6 years ago

My masks continue to come out relatively blocky and don't handle curves and points well; see the attached whale outline, for example. I've turned off mini masks, so I'm no longer downsampling the masks during training, but does anyone have advice for increasing the output resolution of the masks?

There is this line in config.py: MASK_SHAPE = [28, 28], which I assume is part of the solution, but I'm not sure what else I need to change to accommodate a larger mask size, because the comment says "To change this mask size you also need to change the neural network mask branch."

Any tips on what I need to alter to get this to output a much higher-resolution mask?

(screenshot: whale outline mask showing blocky edges)
waleedka commented 6 years ago

The mask is generated in build_fpn_mask_graph() in model.py. It's a series of conv and transposed conv layers. If you want higher resolution you'll need to modify that part of the network. For example, add an additional Conv2DTranspose layer to get 56x56, then update the setting in the config to match.
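For reference, a minimal sketch of that change, assuming the stock build_fpn_mask_graph() in mrcnn/model.py (the "mrcnn_mask_deconv2" layer name is made up for illustration; MASK_SHAPE in the config must be updated to match):

    # ...inside build_fpn_mask_graph(), replacing the tail of the mask branch:
    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)   # 14x14 -> 28x28 (existing layer)
    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv2")(x)  # 28x28 -> 56x56 (added layer)
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)          # final per-class masks

    # ...and in the config:
    MASK_SHAPE = [56, 56]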

schmidje commented 6 years ago

I wonder: isn't the size of the input training image/mask also related to the quality of the segmented mask?

What is the size of the input images in the config file?

patrickcgray commented 6 years ago

Thanks @waleedka, that is really helpful! I was successful in adding two transposed conv layers to get the mask resolution up to 112x112, but now I'm no longer able to train the whole network (I can only train the heads) or add image augmentation, because it leads to memory issues. Specifically, I'm getting the error:

2018-06-04 13:39:57.754546: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at assign_op.h:112 : Resource exhausted: OOM when allocating tensor with shape[256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Though the size of the tensor being allocated in the error changes.

Can you suggest what I should change? I'm running two 12 GB GPUs. Batch size is already a single image, so that can't be decreased. I don't mind this taking longer to train; it only takes ~3 hours now, so it could take much longer if I could just get it to fit in memory.

In order to get from 56x56 to 112x112 I had to change:

RPN_TRAIN_ANCHORS_PER_IMAGE = 256 to RPN_TRAIN_ANCHORS_PER_IMAGE = 128

and

TRAIN_ROIS_PER_IMAGE = 200 to TRAIN_ROIS_PER_IMAGE = 100

But these changes seem to have decreased bounding box accuracy. I'm curious what you'd suggest to help increase mask resolution even further; I'm hoping to get up to at least 224x224.
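For reference, a minimal config-override sketch of the memory-saving settings mentioned above (the class name is illustrative; IMAGES_PER_GPU and USE_MINI_MASK reflect the earlier comments about a batch size of one and turning off mini masks):

    from mrcnn.config import Config

    class HighResMaskConfig(Config):
        NAME = "whale"                      # illustrative
        IMAGES_PER_GPU = 1                  # batch size of one image per GPU
        MASK_SHAPE = [112, 112]             # must match the extra deconv layers in the mask branch
        RPN_TRAIN_ANCHORS_PER_IMAGE = 128   # down from 256 to save memory
        TRAIN_ROIS_PER_IMAGE = 100          # down from 200 to save memory
        USE_MINI_MASK = False               # keep full-resolution ground-truth masks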

I also was getting the issue:

F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)

when I tried to run this, but downgrading TensorFlow from 1.8.0 to 1.7.0 fixed that error.

@schmidje I'm inputting 1024x1024 images with full-resolution masks, so the training masks should be more than enough resolution.

samhodge commented 6 years ago

@patrickcgray did you get the solution you wanted? I have a similar concern.

patrickcgray commented 6 years ago

Hi @samhodge, yes I got it to work; I actually had to downgrade to TensorFlow 1.7 because of the error in my last comment.

The masks are higher resolution and seem to require more training data, but they look pretty good. I'm afraid there is some data bottleneck, but it is definitely working better. If you get it working, I'm interested in your results!

samhodge commented 6 years ago

I will experiment with a 16 GB P5000 card first and then consider running the model in the cloud on a 32+ GB card. I also downgraded to 1.7 as you said. Do you have a gist of your modifications? I am not sure if it is enough to just modify the config or if the model itself needs to be adjusted, as @waleedka mentioned. Ideally I would like a much larger mask resolution, or possibly to just pass the mask over to another network like DeepLab v3.

samhodge commented 6 years ago

@patrickcgray @waleedka Do you have a branch of your 112x112 source code?

I just want to confirm where the changes need to be made:

I had a look overnight and saw the code: https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L959

which uses MASK_POOL_SIZE from https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/config.py#L148

Which are initialised here: https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L1422

but also resized here: https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py#L1447

So what I am a little confused about is the relationship between the constants config.MASK_SHAPE, config.MASK_POOL_SIZE, and config.POOL_SIZE.

I want to make sure the masks are as accurate as possible while using the least memory possible.

I don't ask for much, do I?

I would like higher-fidelity masks; where do I start?

Also, once complete, I will need to save the model as a .pb file for inference in TensorFlow without Keras.

patrickcgray commented 6 years ago

Hey @samhodge, no I don't have a stable branch, but it was only a few lines of code that needed to be added or changed. By the way, I ended up making it 56x56 because 112x112 took up too much memory. It also seems to have increased the training requirements substantially, but that is only anecdotal. So I just added another Conv2DTranspose layer. I didn't change POOL_SIZE; I only changed MASK_SHAPE and it all worked fine. I was able to get 112x112 to work by changing RPN_TRAIN_ANCHORS_PER_IMAGE = 256 to RPN_TRAIN_ANCHORS_PER_IMAGE = 128 and TRAIN_ROIS_PER_IMAGE = 200 to TRAIN_ROIS_PER_IMAGE = 100, but I think that decreased accuracy too much, so I reverted back to 56x56. Maybe with more training data that would have been fine, but I am very limited on training samples.

Please keep me updated on how your network works; I am very interested in using this code to generate higher-resolution masks. I'm happy to help more if I can, so just ask.

samhodge commented 6 years ago

I am training away as you have suggested at 56x56.

I basically just duplicated the Conv2DTranspose, so instead of starting from MASK_POOL_SIZE = 14 and doubling once (14x2 = 28), it now doubles twice (14x2x2 = 56). I guess this can be repeated for as much memory as a card can bear.

But what I am curious about is the channel dimension: should it be the number of classes or 256?

This could save some memory, because NUM_CLASSES < 256.

Or am I misreading, and this refers to the levels in the greyscale mask?

Because the final tensor dimension is NUM_CLASSES.

This is with a mini mask of 112x112 and a max resolution of 2048; I might need to raise the min resolution too.

It fits on a 16 GB card OK. I'm still only on the first 40-epoch stage and the loss is hovering around 2.0-2.4.

I tried serialising the model to a .pb file for TensorFlow and it doesn't plug into my old TensorFlow inference setup.

So I am not sure if I made a mistake or if I should just let it complete training.

Funnily, the masks are ...x100x56x56x256, but I would have expected ...x100x56x56x81, so I think my config or network is a bit awry (this is based on COCO).
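For what it's worth, a shape sketch based on my reading of the stock build_fpn_mask_graph() (not from the original posts), which suggests the ...x256 tensor is an intermediate feature map rather than the final per-class output:

    # Per-ROI tensor shapes, assuming MASK_POOL_SIZE = 14 and one extra deconv:
    #   roi_align_mask output:          [batch, num_rois, 14, 14, 256]
    #   mrcnn_mask_deconv  (2x2, s=2):  [batch, num_rois, 28, 28, 256]
    #   extra deconv       (2x2, s=2):  [batch, num_rois, 56, 56, 256]
    #   mrcnn_mask (1x1, sigmoid):      [batch, num_rois, 56, 56, NUM_CLASSES]
    # The 256 is the feature-channel width of the conv/deconv layers; only the
    # final 1x1 conv narrows the channel dimension to NUM_CLASSES (81 for COCO).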

But you cannot expect perfection on a first attempt.

samhodge commented 6 years ago

Seems like I am out of memory with stage 3 training.

I might start again and look into the issue with the number of classes being 256 rather than 81.

samhodge commented 6 years ago

@patrickcgray @waleedka

Happy 4th of July to the Americans out there:

I ran out of resources using the coco.py sample with the upscaled 56x56 mask tensor, after the 120th epoch. Here is the error:

   data_format=data_format),
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1224, in conv2d_backprop_input
    dilations=dilations, name=name)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op 'rpn_model/rpn_class_raw/convolution', defined at:
  File "samples/coco/coco.py", line 455, in <module>
    model_dir=args.logs)
  File "/home/samh/dev/mask2/Mask_RCNN/mrcnn/model.py", line 1834, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/home/samh/dev/mask2/Mask_RCNN/mrcnn/model.py", line 1941, in build
    layer_outputs.append(rpn([p]))
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/keras/engine/topology.py", line 2058, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/keras/engine/topology.py", line 2209, in run_internal_graph
    output_tensors = _to_list(layer.call(computed_tensor, **kwargs))
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/keras/layers/convolutional.py", line 164, in call
    dilation_rate=self.dilation_rate)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 3164, in conv2d
    data_format='NHWC')
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 782, in convolution
    return op(input, filter)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 870, in __call__
    return self.conv_op(inp, filter)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 522, in __call__
    return self.call(inp, filter)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 206, in __call__
    name=self.name)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 953, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,512,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[Node: training_2/SGD/gradients/rpn_model/rpn_class_raw/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, _class=["loc:@rpn_model/rpn_class_raw/convolution"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training_2/SGD/gradients/rpn_model/rpn_class_raw/convolution_grad/ShapeN, rpn_class_raw/kernel/read, training_2/SGD/gradients/rpn_model/rpn_class_raw/convolution_grad/Conv2DBackpropInput-2-TransposeNHWCToNCHW-LayoutOptimizer)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
         [[Node: mul_529/_8677 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_32895_mul_529", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f9422edb0b8>>
Traceback (most recent call last):
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 712, in __del__
  File "/home/samh/miniconda2/envs/mask/lib/python3.5/site-packages/tensorflow/python/framework/c_api_util.py", line 31, in __init__
TypeError: 'NoneType' object is not callable

The rest scrolled out of view.

It failed again at the stage 3 training.

All I have done is take the coco.py config and train with it, apart from editing the max resolution to 2048 and adding a Conv2DTranspose to double the size of the masks from 28x28 to 56x56.

Has anyone else tried this with COCO?

Stages 1 and 2 were going so well. What size card would I need to fit stage 3 with these dimensions?

patrickcgray commented 6 years ago

@samhodge I'm not sure about the MASK_POOL_SIZE and NUM_CLASSES hyperparameters. Maybe @waleedka will weigh in on those.

Are you trying to train this on the whole COCO dataset? I only trained the heads and the top 3 layers, in two training stages, and I ran for a total of 100 epochs.
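A minimal sketch of a two-stage schedule along those lines (the epoch split and learning rates are illustrative, not from the original post; "heads" and "3+" are the layer selectors accepted by MaskRCNN.train() in mrcnn/model.py):

    # Stage 1: train only the randomly initialised head layers
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=40, layers="heads")

    # Stage 2: fine-tune ResNet stage 3 and up (epochs is the cumulative target)
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE / 10,
                epochs=100, layers="3+")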

I would suggest trying to downgrade to tensorflow 1.7 if you're not already on that.

samhodge commented 6 years ago

Yes, I am training on the full COCO dataset, and it fails on the 121st epoch when it "fine-tunes the entire network" with a lower learning rate for the final 40 epochs.

samhodge commented 6 years ago

I am on TensorFlow 1.7.

YubinXie commented 6 years ago

The mini mask is (56, 56); however, the MASK_SHAPE is (28, 28). I am confused: what is the relationship between them?

patrickcgray commented 6 years ago

@YubinXie check out this notebook for a good explanation and example on the mini masks: https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_data.ipynb

If you want higher-resolution output masks, I suggest setting USE_MINI_MASK to False.
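For context, my reading of mrcnn/config.py and mrcnn/utils.py (not from the original comment): USE_MINI_MASK and MINI_MASK_SHAPE only control the resolution at which ground-truth masks are stored during training (utils.minimize_mask shrinks them to save memory), while MASK_SHAPE is the resolution the mask head actually predicts. A minimal override inside a Config subclass might look like:

    USE_MINI_MASK = False         # keep full-resolution ground-truth masks during training
    # MINI_MASK_SHAPE = (56, 56)  # only used when USE_MINI_MASK is True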

patrickcgray commented 6 years ago

@samhodge any update? I think you're on the right track, and I'm interested in recreating your work if you have gotten it to function and think the output masks are higher resolution.

YubinXie commented 6 years ago

@patrickcgray Thank you for the link. Yeah, I understand the mini mask part. But if the output mask is set to (28, 28), which is much smaller than the mini mask, then why would changing the mini mask setting help produce a higher-resolution output?

samhodge commented 6 years ago

I needed to inspect the model that I had trained and ran into some installation issues on my development machine, which slowed me down. Once I can confirm the model can still be used for inference, I think I might try training the last epochs on TPUs in the cloud. Sorry, I have been travelling.

samhodge commented 6 years ago

Just to let you know, I have been training with the mini mask at 112x112.

samhodge commented 6 years ago

So far so good. It seems to work for me; now I need to test inference with the model. The total loss is around 2.0-2.3 and the mask loss is about 0.48(ish), though I can't recall exactly where it got up to.

But I would really like to run the low learning rate for the final stage of the model, which I can do on Google Cloud, but I want to be sure the model is heading in the right direction.

samhodge commented 6 years ago

I got it plugged into my C++ TensorFlow inference framework, but there were no detections. I think I might circle back, test with a default model, make sure that works, and then attempt again with the upscaled model.

samhodge commented 6 years ago

Just came back to this: it seems the OOM killer kicked in at 150 GB when training at stage 4. I'm looking into the causes of this; I don't think it's normal.

samhodge commented 6 years ago

This is where I got up to

Using TensorFlow backend.
Command: evaluate
Model: logs/coco20180825T1544/mask_rcnn_coco_0100.h5
Dataset: /data/coco

Configurations:
BACKBONE_SHAPES  [[512 512] [256 256] [128 128] [ 64 64] [ 32 32]]
BACKBONE_STRIDES  [4, 8, 16, 32, 64]
BATCH_SIZE  1
BBOX_STD_DEV  [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES  100
DETECTION_MIN_CONFIDENCE  0.7
DETECTION_NMS_THRESHOLD  0.3
GPU_COUNT  1
IMAGES_PER_GPU  1
IMAGE_MAX_DIM  2048
IMAGE_MIN_DIM  800
IMAGE_PADDING  True
IMAGE_SHAPE  [2048 2048 3]
LEARNING_MOMENTUM  0.9
LEARNING_RATE  0.002
MASK_POOL_SIZE  14
MASK_SHAPE  [56, 56]
MAX_GT_INSTANCES  100
MEAN_PIXEL  [123.7 116.8 103.9]
MINI_MASK_SHAPE  (224, 224)
NAME  coco
NUM_CLASSES  81
POOL_SIZE  7
POST_NMS_ROIS_INFERENCE  1000
POST_NMS_ROIS_TRAINING  2000
ROI_POSITIVE_RATIO  0.33
RPN_ANCHOR_RATIOS  [0.5, 1, 2]
RPN_ANCHOR_SCALES  (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE  2
RPN_BBOX_STD_DEV  [0.1 0.1 0.2 0.2]
RPN_TRAIN_ANCHORS_PER_IMAGE  256
STEPS_PER_EPOCH  1000
TRAIN_ROIS_PER_IMAGE  128
USE_MINI_MASK  True
USE_RPN_ROIS  True
VALIDATION_STPES  50
WEIGHT_DECAY  0.0001

Loading weights logs/coco20180825T1544/mask_rcnn_coco_0100.h5
loading annotations into memory... Done (t=0.57s)
creating index... index created!
Loading and preparing results... DONE (t=0.00s)
creating index... index created!
Running per image evaluation... Evaluate annotation type bbox DONE (t=1.14s).
Accumulating evaluation results... DONE (t=0.46s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.076
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.151
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.074
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.029
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.109
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.100
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.076
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.096
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.096
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.032
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.123
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.129
Prediction time: 814.669100522995. Average 1.62933820104599/image
Total time: 837.5126931667328

This is with v1.0 of the repository for historical reasons, so I haven't exactly set the world on fire.

But it did train; now to try again without mini masks.

samhodge commented 6 years ago

After another 100 epochs it is not much better:

Model: logs/coco20180825T1544/mask_rcnn_coco_0200.h5
Dataset: /data/coco

Configurations:
BACKBONE_SHAPES  [[512 512] [256 256] [128 128] [ 64 64] [ 32 32]]
BACKBONE_STRIDES  [4, 8, 16, 32, 64]
BATCH_SIZE  1
BBOX_STD_DEV  [0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES  100
DETECTION_MIN_CONFIDENCE  0.7
DETECTION_NMS_THRESHOLD  0.3
GPU_COUNT  1
IMAGES_PER_GPU  1
IMAGE_MAX_DIM  2048
IMAGE_MIN_DIM  800
IMAGE_PADDING  True
IMAGE_SHAPE  [2048 2048 3]
LEARNING_MOMENTUM  0.9
LEARNING_RATE  0.002
MASK_POOL_SIZE  14
MASK_SHAPE  [56, 56]
MAX_GT_INSTANCES  100
MEAN_PIXEL  [123.7 116.8 103.9]
MINI_MASK_SHAPE  (224, 224)
NAME  coco
NUM_CLASSES  81
POOL_SIZE  7
POST_NMS_ROIS_INFERENCE  1000
POST_NMS_ROIS_TRAINING  2000
ROI_POSITIVE_RATIO  0.33
RPN_ANCHOR_RATIOS  [0.5, 1, 2]
RPN_ANCHOR_SCALES  (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE  2
RPN_BBOX_STD_DEV  [0.1 0.1 0.2 0.2]
RPN_TRAIN_ANCHORS_PER_IMAGE  256
STEPS_PER_EPOCH  1000
TRAIN_ROIS_PER_IMAGE  128
USE_MINI_MASK  True
USE_RPN_ROIS  True
VALIDATION_STPES  50
WEIGHT_DECAY  0.0001

Loading weights logs/coco20180825T1544/mask_rcnn_coco_0200.h5
loading annotations into memory... Done (t=0.58s)
creating index... index created!
Loading and preparing results... DONE (t=0.00s)
creating index... index created!
Running per image evaluation... Evaluate annotation type bbox DONE (t=1.33s).
Accumulating evaluation results... DONE (t=0.50s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.086
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.170
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.077
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.036
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.118
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.118
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.084
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.107
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.107
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.136
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.146
Prediction time: 822.4635076522827. Average 1.6449270153045654/image
Total time: 842.7166872024536

samhodge commented 6 years ago

Slowly getting there: not exactly 37%, but nearly halfway there.

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.131
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.235
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.134
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.157
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.163
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.163
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.062
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.181
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.232

samhodge commented 6 years ago

Slowly but surely.

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.267
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.138
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.059
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.175
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.219
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.139
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.177
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.178
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.064
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.203
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.26

samhodge commented 6 years ago

2048x2048 input, 56x56 masks: mAP 14.3% (screenshot: brisbane_hi2 0069)

1024x1024 input, 28x28 masks: mAP ~20-35% (yet to do the eval to be sure) (screenshot: brisbane 0069)

samhodge commented 6 years ago

Currently using a mini mask resolution of 1024x1024 with a batch size of 1 on a 16 GB GPU, and it fits.

But I am worried that to get the accuracy I need, I might need multiple 32 GB GPUs with a batch size of 2. If anybody has this hardware at their disposal, I would be happy to share my code.

samhodge commented 5 years ago

@chengchu88

I have not been able to test with USE_MINI_MASK=False yet, but I might know more later in the week; I have been making use of IBM's LMS (Large Model Support).

Sam

samhodge commented 5 years ago

The mini mask is (56, 56); however, the MASK_SHAPE is (28, 28). I am confused: what is the relationship between them?

Hello, did you find out the difference? I have the same question. Thanks!

You need to introduce a new convolution to enable a larger mask size; you cannot just adjust the config alone, you need to add a new layer to the network.

sam

samhodge commented 5 years ago

For the record, on a 2048x2048 canvas with 56x56 masks, I got the mAP up to 26.1% using mini masks of dimension 1024.

patrickcgray commented 5 years ago

@samhodge why are you using mini masks at all? Why wouldn't you just set USE_MINI_MASK to False and not worry about it?

@chengchu88 Sam is correct; you need to add a transposed convolution. See this comment: https://github.com/matterport/Mask_RCNN/issues/635#issuecomment-394208794

samhodge commented 5 years ago

@patrickcgray

Without that, using the COCO dataset you get an OOM death; using IBM's contributed Large Model Support it becomes possible.

See

https://arxiv.org/abs/1807.02037

https://github.com/tungld/tensorflow/blob/lms-contrib/tensorflow/contrib/lms/README.md


patrickcgray commented 5 years ago

@chengchu88 I have had that same issue, and it is really hurting my main goal, which is to measure the final mask size. I'm very interested in any solution you find.

rzadp commented 5 years ago

@patrickcgray Do you happen to have any more insight on this issue now?

I'm following the Splash of Color article, trying to detect badminton courts. I seem to be having a very similar issue with mask inaccuracy.

(screenshot: badminton court with an inaccurate mask)

I tried with and without the mini-masks.

I'm not sure changing the mask_shape is a good way because:

Could someone point me in the right direction? Is it maybe just a matter of longer training and/or a bigger training set?

patrickcgray commented 5 years ago

Hey @rzadp, you don't need to retrain on COCO; I never did, though maybe you would get a slight increase from that.

I think you just need more training data. You'll notice that all the examples in the repo README have much better masks than what you're showing here, and most of them didn't change the mask size.

I did end up changing my mask size, and it seemed to lead to a small but meaningful increase in mAP, but you'll need sufficient training data, to the point where it may be better to just use the normal mask size.

rzadp commented 5 years ago

@patrickcgray Could you tell me the size of your training set, please? Were you able to achieve good mask results with a 56x56 mask?

The Splash of Color article has only around 60 training images.

What I found is that it deals well with small balloons (screenshot omitted), but not very well with bigger balloons (screenshot omitted); we can see the blocky mask as in the original comment.


What I tried to improve this

  1. More training epochs. This did not improve the results; the loss function stalls. (screenshot of the loss curve omitted)

  2. Increasing MASK_SHAPE to 56x56, following @patrickcgray's and @samhodge's advice. This significantly improved the results on big balloons (screenshot omitted). I couldn't increase the mask to 112x112 because I was running out of memory.


What confuses me

Maybe @waleedka would be so kind as to nudge me in the right direction?

samhodge commented 5 years ago

I have been thinking about this issue, and there is an old piece of software that I used back in the late 90s for rescaling graphics.

It was embedded in Macromedia Flash; you could turn a raster image into a vector before resizing.

The software behind it is available in C.

http://potrace.sourceforge.net/

see

http://potrace.sourceforge.net/samples.html

samhodge commented 5 years ago

Maybe this could work: https://pythonhosted.org/pypotrace/tutorial.html
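A rough sketch, adapted from the pypotrace tutorial linked above, of vectorizing a predicted binary mask so its outline comes back as smooth curves (the `mask` array here is a stand-in for a real mask from model.detect()):

    import numpy as np
    import potrace

    mask = np.zeros((56, 56), dtype=np.uint32)   # stand-in for a predicted binary mask
    mask[14:42, 14:42] = 1

    bmp = potrace.Bitmap(mask)   # build a potrace bitmap from the array
    path = bmp.trace()           # trace it into vector curves

    for curve in path:
        print("start point:", curve.start_point)
        for segment in curve:
            # corner segments have a single control point `c`;
            # smooth segments are Bezier curves with control points `c1`, `c2`
            print(segment.end_point, "corner" if segment.is_corner else "bezier")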

kongjibai commented 5 years ago

The mask is generated in build_fpn_mask_graph() in model.py. It's a series of conv and transposed conv layers. If you want higher resolution you'll need to modify that part of the network. For example, add an additional Conv2DTranspose layer to get 56x56, then update the setting in the config to match.

Hi, I want to know: after just adding a Conv2DTranspose layer, why does the training time increase so much? Almost 6 times. I use a single NVIDIA 1080 Ti (12 GiB), and it warns me: "Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.20GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available."

samhodge commented 5 years ago

Think about it: if the tensor is 100x81x28x28, compare the size of that tensor with 100x81x56x56.
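Rough arithmetic for that comparison (float32, assuming 100 ROIs and 81 classes):

    # 28x28: 100 * 81 * 28 * 28 =  6,350,400 values  ≈  25 MB
    # 56x56: 100 * 81 * 56 * 56 = 25,401,600 values  ≈ 102 MB  (4x larger)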

mihiri91 commented 5 years ago

I have added an additional Conv2DTranspose layer, as I increased MASK_SHAPE to (56, 56), with IMAGES_PER_GPU=1 and max_queue_size=50. But it gave me OSError: [Errno 12] Cannot allocate memory.

Since I am a beginner, detailed answers will be very helpful. Thank you in advance.

samhodge commented 5 years ago

I have added an additional Conv2DTranspose layer, as I increased MASK_SHAPE to (56, 56), with IMAGES_PER_GPU=1 and max_queue_size=50. But it gave me OSError: [Errno 12] Cannot allocate memory.

Since I am a beginner, detailed answers will be very helpful. Thank you in advance.

What hardware are you running on?

rzadp commented 5 years ago

@mihiri91 I have an 8 GB GPU and ran into the out-of-memory problem as well. Decreasing RPN_TRAIN_ANCHORS_PER_IMAGE and TRAIN_ROIS_PER_IMAGE reduced memory usage for me, and I was able to run this with a (56, 56) mask, even (112, 112).

mihiri91 commented 5 years ago

@samhodge I'm using Google Colab GPU.

@rzadp Thank you, I will try it and give you feedback.

Parnia commented 5 years ago

Hi, I added another transposed convolution layer in model.py and also set MASK_SHAPE to 56. However, it gets stuck at the first epoch without any memory errors. I am training on a cluster. Did you change anything else to make it work?

rzadp commented 5 years ago

@Parnia I think those 2 changes were enough. Maybe it is running on the CPU instead of the GPU by mistake, and the slow training just seems stuck?

autonomousmappinglab commented 5 years ago

Hello, I tried lowering the resolution of the images I feed to Mask R-CNN to get a more accurate result. However, I noticed no difference in accuracy between training with 256x256 images vs 1024x1024. This is my config:

IMAGES_PER_GPU = 2
NUM_CLASSES = 1 + 1
STEPS_PER_EPOCH = 100
DETECTION_MIN_CONFIDENCE = 0
USE_MINI_MASK = False
MASK_SHAPE = [56, 56]
MASK_POOL_SIZE = 28
IMAGE_MIN_DIM = 800
IMAGE_MAX_DIM = 1024

Is my expectation wrong? Or do I need to change other params?

harshgrovr commented 5 years ago

@autonomousmappinglab

Did you change the ground-truth mask as well? How did you annotate it? It had some polygon sizes; did you normalize those as well, according to your new image resolution?