matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

How to run inference on large satellite images #1959

Open lyw615 opened 4 years ago

lyw615 commented 4 years ago

Normal image sizes range from about 512 to 3600 pixels, but satellite images usually have shapes above 10000×10000. The usual approach of splitting the image into smaller blocks runs into the problem of how to merge all the results back onto the original image. Because the edge regions of each block tend to give bad results, some researchers crop the large image into small blocks with overlap. So how should the inference results in the overlapping regions be resolved? NMS does not seem suitable. Other methods are also appreciated.
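For illustration only (this is not code from the thread): one common way to handle the overlapping regions is to shift each tile's boxes by the tile's offset in the full image and then run class-wise NMS over the pooled detections. The sketch below assumes results in the dict format returned by matterport's `model.detect()` ('rois' as (y1, x1, y2, x2), 'scores', 'class_ids'); masks would need the same offset treatment, which is omitted here.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Plain NumPy NMS; returns indices of boxes to keep."""
    y1, x1, y2, x2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (y2 - y1) * (x2 - x1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        inter = np.maximum(0, yy2 - yy1) * np.maximum(0, xx2 - xx1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-8)
        order = order[1:][iou <= iou_thresh]
    return keep

def merge_tiles(tile_results, iou_thresh=0.5):
    """tile_results: list of (row_offset, col_offset, result_dict) per tile,
    where result_dict follows matterport's detect() output format."""
    all_boxes, all_scores, all_classes = [], [], []
    for oy, ox, r in tile_results:
        boxes = r['rois'].astype(np.float32)
        boxes[:, [0, 2]] += oy          # shift rows into full-image coordinates
        boxes[:, [1, 3]] += ox          # shift columns
        all_boxes.append(boxes)
        all_scores.append(r['scores'])
        all_classes.append(r['class_ids'])
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    classes = np.concatenate(all_classes)
    keep = []
    for c in np.unique(classes):        # NMS per class, so classes don't suppress each other
        idx = np.where(classes == c)[0]
        keep.extend(idx[k] for k in nms(boxes[idx], scores[idx], iou_thresh))
    return boxes[keep], scores[keep], classes[keep]
```

Per-class NMS keeps detections of different classes from suppressing each other; whether plain NMS is adequate in the overlap zones is exactly the open question in this issue.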

mohanaditya910 commented 4 years ago

@lyw615 I am also trying to solve a similar problem with satellite images. But since you are asking about inference, I suppose you have already completed training.

Can you help me out with some strategies for training on satellite images (the iSAID dataset)?

I have tried training from the COCO weights provided for matterport's Mask R-CNN. I am getting NaN values after the warm-up (when training 'all' layers).

I warmed up the network with a learning rate of 0.0002 for the 'heads' layers and then tried the learning rate specified in the iSAID paper, i.e. 0.02. As this resulted in NaN, I also tried 0.001 and 0.005; they gave NaN as well.

Any help would be a game changer for me. Thanks in advance.
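For reference, a rough sketch (with made-up names and paths) of the two-stage schedule described above, written against matterport's API: a low-learning-rate warm-up on the head layers, then an even lower rate for all layers. `ISAIDConfig` and the dataset objects are hypothetical; matterport's `Config` already applies gradient clipping (`GRADIENT_CLIP_NORM`), and keeping the 'all'-layers rate well below 0.02 is a common response to NaN losses.

```python
# Rough sketch only: ISAIDConfig and the file paths are made up, and
# dataset_train / dataset_val are assumed to be prepared
# mrcnn.utils.Dataset subclasses for iSAID (not shown here).
from mrcnn.config import Config
import mrcnn.model as modellib

class ISAIDConfig(Config):
    NAME = "isaid"
    NUM_CLASSES = 1 + 15          # background + the 15 iSAID categories
    LEARNING_RATE = 0.0002        # the warm-up LR mentioned above

config = ISAIDConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# Stage 1: warm up only the head layers at a small learning rate.
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
            epochs=5, layers='heads')

# Stage 2: fine-tune all layers at an even smaller rate rather than jumping
# to 0.02; large LR jumps right after the warm-up are a frequent NaN source.
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE / 10,
            epochs=20, layers='all')
```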

lyw615 commented 4 years ago

Does it also run into this situation without the warm-up? Maybe you can check how the loss decreases.

lyw615 commented 4 years ago

Content found on the Internet: cut the large remote sensing image into multiple small images, e.g. 1024×1024 or 512×512. Each small image overlaps its neighbouring inference tiles by a certain amount in width and height, to reduce objects being truncated at the cut edges.
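As a sketch only (not the original poster's code): the loop below crops overlapping tiles from a large image, runs matterport's `model.detect()` on each, and records every tile's offset so the boxes can later be shifted back into full-image coordinates (as in the merge sketch above). `model` is assumed to be a `MaskRCNN` already built in inference mode with weights loaded, the file name is made up, and `tile_origins` is the hypothetical helper sketched after the next comment.

```python
import skimage.io

TILE = 512
# Hypothetical input; `model` is assumed to be an inference-mode MaskRCNN,
# and tile_origins() is sketched in the next comment below.
image = skimage.io.imread("large_satellite_image.tif")

tile_results = []   # list of (row_offset, col_offset, detection dict)
for oy in tile_origins(image.shape[0], TILE):
    for ox in tile_origins(image.shape[1], TILE):
        tile = image[oy:oy + TILE, ox:ox + TILE]
        r = model.detect([tile], verbose=0)[0]   # per-image results dict
        tile_results.append((oy, ox, r))
```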

lyw615 commented 4 years ago

For example, an 1100×1100 image is cut into 512×512 tiles for inference. Assume the overlap in width and height is half of the tile's width and height. Cutting starts at (0,0). For the tiles taken from the first row of the large image, the (row, column) pixel coordinates of their upper-left corners are (0,0), (0,256), and (0,1100-512). Similarly, the upper-left corners of the tiles cut from the first column follow the same pattern down the image. All tiles are 512×512, and a total of 9 tiles can be cut out.
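A minimal sketch of one rule that reproduces the origins listed above: step by a 256-pixel stride, and as soon as a single tile placed flush with the image border can cover everything that remains, place it and stop. `tile_origins` is a made-up helper, not code from this repository.

```python
def tile_origins(image_size, tile_size=512, stride=256):
    """Upper-left tile origins along one axis of the large image."""
    origins, o = [], 0
    while True:
        origins.append(o)
        # Once one more tile ending at the image border covers the rest,
        # place it flush with the border and stop.
        if image_size <= o + 2 * tile_size:
            if o != image_size - tile_size:
                origins.append(image_size - tile_size)
            return origins
        o += stride

rows = tile_origins(1100)    # [0, 256, 588]  (588 = 1100 - 512)
cols = tile_origins(1100)    # [0, 256, 588]
print(len([(r, c) for r in rows for c in cols]))   # 9 tiles, as in the example above
```

With a strict 256-pixel stride and no early stop, the same image would need four origins per axis (0, 256, 512, 588) and 16 tiles; the rule above matches the 9-tile count in the comment by letting the final, border-flush tile absorb the remainder.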

ismerd commented 3 years ago

Can you share the code for how you merged your 9 small pictures back together?