open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

"binary marks" must have the same shape as the image #10273

Open Sneriko opened 1 year ago

Sneriko commented 1 year ago

I get this error message: '`binary_masks` must have the same shape as the image' when I run inference with my trained model. It's an RTMDet model that I trained with mmdetection 3.x. What are the possible causes for the model to predict a mask with different dimensions than the image?
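For context, a minimal sketch of the kind of inference call that triggers the assertion (paths are placeholders, not my exact command):

    from mmdet.apis import DetInferencer

    # Placeholder paths; the assertion is raised during the visualization step.
    inferencer = DetInferencer(
        model='path/to/rtmdet_config.py',
        weights='path/to/checkpoint.pth')
    inferencer('path/to/image.jpg', out_dir='outputs/')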

This is my training config:

    _base_ = '/home/erik/Riksarkivet/Projects/HTR_Pipeline/mmdetection/configs/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py'

    load_from = '/home/erik/Riksarkivet/Projects/HTR_Pipeline/models/checkpoints/rtmdet-ins_m_8xb32-300e_coco_20221123_001039-6eba602e.pth'

    model.backbone.frozen_stages = 4

    data_root = ''

    work_dir = '/home/erik/Riksarkivet/Projects/HTR_Pipeline/models/checkpoints/rtmdet_regions_6'

    base_lr = 0.004 / 16

    train_batch_size_per_gpu = 2
    val_batch_size_per_gpu = 1
    train_num_workers = 1
    num_classes = 1

    metainfo = {'classes': ('TextRegion', ), 'palette': [(220, 20, 60)]}

    model = dict(bbox_head=dict(num_classes=1))

    icdar_2019 = dict(
        type='CocoDataset',
        metainfo=metainfo,
        data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/'),
        ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/gt_files/coco_regions2.json',
        pipeline=_base_.train_pipeline)

    icdar_2019_test = dict(
        type='CocoDataset',
        metainfo=metainfo,
        data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/'),
        ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/gt_files/coco_regions2.json',
        test_mode=True,
        pipeline=_base_.test_pipeline)

    police_records = dict(
        type='CocoDataset',
        metainfo=metainfo,
        data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/'),
        ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/gt_files/coco_regions2.json',
        pipeline=_base_.train_pipeline)

    train_list = [police_records, icdar_2019]
    test_list = [icdar_2019_test]

    train_dataloader = dict(
        batch_size=train_batch_size_per_gpu,
        num_workers=train_num_workers,
        persistent_workers=True,
        sampler=dict(type='DefaultSampler', shuffle=True),
        dataset=dict(type='ConcatDataset', datasets=train_list))

    val_dataloader = dict(
        batch_size=1,
        persistent_workers=True,
        drop_last=False,
        sampler=dict(type='DefaultSampler', shuffle=False),
        dataset=dict(
            type='CocoDataset',
            metainfo=metainfo,
            data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/'),
            ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/gt_files/coco_regions2.json',
            pipeline=_base_.test_pipeline,
            test_mode=True))

    test_dataloader = val_dataloader

    val_evaluator = dict(
        type='CocoMetric',
        metric=['bbox', 'segm'],
        ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/gt_files/coco_regions2.json')
    test_evaluator = val_evaluator

    model = dict(test_cfg=dict(
        nms_pre=200,
        min_bbox_size=0,
        score_thr=0.4,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=50,
        mask_thr_binary=0.5))

    # loss_cls is dynamically adjusted based on num_classes, but when
    # num_classes = 1, loss_cls is always 0

    default_hooks = dict(
        # how often (in epochs) to save checkpoints, how many checkpoints to
        # keep, and save_best='auto' to also keep the best model (recommended)
        checkpoint=dict(
            type='CheckpointHook',
            interval=1,
            max_keep_ckpts=5,
            save_best='auto'),
        # logger output interval
        logger=dict(type='LoggerHook', interval=100))

    max_epochs = 12
    stage2_num_epochs = 2
    base_lr = 0.004 / 16
    interval = 12

    train_cfg = dict(
        max_epochs=12,
        val_interval=12,
        dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)])

    test_cfg = dict(pipeline=_base_.test_pipeline)

    pipeline = _base_.test_pipeline

    optim_wrapper = dict(
        _delete_=True,
        type='OptimWrapper',
        optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
        paramwise_cfg=dict(
            norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))

    param_scheduler = [
        dict(
            type='LinearLR',
            start_factor=1.0e-5,
            by_epoch=False,
            begin=0,
            end=1000),
        dict(
            # use cosine lr from 150 to 300 epoch
            type='CosineAnnealingLR',
            eta_min=base_lr * 0.05,
            begin=max_epochs // 2,
            end=max_epochs,
            T_max=max_epochs // 2,
            by_epoch=True,
            convert_to_iter_based=True),
    ]

    vis_backends = [dict(type='LocalVisBackend')]
    visualizer = dict(
        type='DetLocalVisualizer',
        vis_backends=vis_backends,
        name='visualizer',
        save_dir='/home/erik/Riksarkivet/Projects/HTR_Pipeline/output')

Grateful for any help!

Sneriko commented 1 year ago

I can also add that the predicted mask is 0-4 pixels too small in the y-dimension.
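A quick way to see the mismatch is to print both shapes before visualization. A minimal sketch, assuming the standard mmdet 3.x high-level APIs (paths are placeholders):

    from mmdet.apis import init_detector, inference_detector
    import mmcv

    model = init_detector('path/to/rtmdet_config.py', 'path/to/checkpoint.pth')
    img = mmcv.imread('path/to/image.jpg')
    result = inference_detector(model, img)
    print('image:', img.shape[:2])
    # pred_instances.masks is (num_instances, H, W); H comes out a few pixels short
    print('masks:', tuple(result.pred_instances.masks.shape[1:]))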

ThePassedWind commented 1 year ago

marked

W-hary commented 12 months ago

I encountered the same error. How can it be solved?

ahanjaya commented 6 months ago

Same problem here

EmmaMeeus commented 6 months ago

Same issue, please consider fixing this.

2649 commented 4 months ago

Still an issue

miquel-espinosa commented 4 months ago

Still a problem. The issue is that the binary_masks are rescaled according to the transforms defined in test_pipeline (e.g. dict(type='FixShapeResize', ...)), but the images are loaded from their paths and are not resized, so the two end up with different sizes.

The bug is at line 136 of mmdet/engine/hooks/visualization_hook.py; the images should be resized according to the test_pipeline.
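A minimal workaround sketch along these lines: resize the loaded image to the mask shape before visualization so the two agree (the mask shape below is a stand-in, not an mmdetection internal):

    import mmcv
    import numpy as np

    img = mmcv.imread('path/to/image.jpg')            # placeholder path
    binary_masks = np.zeros((3, 636, 768), np.uint8)  # stand-in for pipeline-scale masks
    mask_h, mask_w = binary_masks.shape[1:]
    if img.shape[:2] != (mask_h, mask_w):
        # mmcv.imresize takes the target size as (width, height)
        img = mmcv.imresize(img, (mask_w, mask_h))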

DAIJinJ commented 4 months ago

How can this be solved?

DAIJinJ commented 4 months ago

> Still a problem. The issue is that the binary_masks are rescaled according to the transforms defined in test_pipeline... (quoted from miquel-espinosa's comment above)

Have you solved it?

werk104 commented 3 months ago

I encountered this error as well when using the RTMDet instance-segmentation model for image inference. I noticed that the height or width of the images that triggered the error was larger than the scale parameter (640×640) of the Resize in the test_pipeline. I therefore changed the scale to be larger than the height and width of the images that needed inference (960×960). After this modification, the images that previously raised errors no longer do:

    dict(
        type='Resize',
        scale=(
            960,  # 640 ---> 960
            960,  # 640 ---> 960
        ),
        keep_ratio=True),
    dict(
        type='Pad',
        size=(
            960,  # 640 ---> 960
            960,  # 640 ---> 960
        ),

werk104 commented 3 months ago

> I encountered this error as well when using the RTMDet instance-segmentation model for image inference... (quoted from my previous comment above)

When I use demo/large_image_demo.py for large-image inference, I find that when the patch size equals the size of the large image being inferred, which also equals the scale in test_pipeline (640, 640), the results are equivalent to running on each block separately, and the quality is very poor. (Result screenshot attached in the original comment.)

When I change the parameters to --patch-size 160 --patch-overlap-ratio 0 and print the size of binary_masks, its shape is (17, 160, 160). With another (640, 640) image, binary_masks comes out as (12, 160, 160), which shows the program stacks the detected masks at the patch size, but I don't know how to assemble them into the correct full-image mask. Laying them out directly row by row gives the strange result shown in the screenshot attached to the original comment.
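To make the mismatch concrete, a tiny numpy illustration with the shapes above (the visualizer compares the image's (H, W) against each mask's (H, W)):

    import numpy as np

    binary_masks = np.zeros((17, 160, 160), np.uint8)  # 17 patch-sized instance masks
    img = np.zeros((640, 640, 3), np.uint8)            # the full image
    # This is the comparison the visualizer asserts on: (640, 640) != (160, 160)
    print(img.shape[:2] == binary_masks.shape[1:])     # False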

DAIJinJ commented 3 months ago
    assert img.shape[:2] == binary_masks.shape[
    AssertionError: `binary_masks` must have the same shape with image

I printed their sizes: binary_masks is (752, 512, 512) and img is (10678, 8278, 3). The img should be read at its size after slicing (the patch size), but it is not.

ksv87 commented 2 months ago

I fixed it in https://github.com/open-mmlab/mmengine/blob/85c83ba61689907fb1775713622b1b146d82277b/mmengine/visualization/visualizer.py#L881:

    if img.shape[:2] != binary_masks.shape[1:]:
        # Zero-pad the masks up to the image size so the assert below passes.
        shape = (binary_masks.shape[0],) + img.shape[:2]
        new_mask = np.zeros(shape, np.uint8)
        mh, mw = binary_masks.shape[1:]
        new_mask[..., :mh, :mw] = binary_masks
        binary_masks = new_mask

    assert img.shape[:2] == binary_masks.shape[
                            1:], '`binary_masks` must have ' \
                                 'the same shape with image'
werk104 commented 2 months ago

> I fixed it in https://github.com/open-mmlab/mmengine/blob/85c83ba61689907fb1775713622b1b146d82277b/mmengine/visualization/visualizer.py#L881: (code quoted above)

Thank you for the fix. However, after modifying the code according to your suggestion, the results are still problematic (it went from raising an error to running but producing incorrect results). From the results it can be inferred that each bbox corresponds to one mask, and each mask is one layer of binary_masks.

Your code does make the mask size correct, but the mask positions are wrong: they all end up in the upper-left corner of new_mask. The mask positions should instead be determined from the centre coordinates of the bboxes: divide the original image into a grid of patch-size × patch-size cells, use the row and column indices within the grid to locate each mask, and place the mask into the corresponding cell.

I made some modifications based on your code, as follows:

    if img.shape[:2] != binary_masks.shape[1:]:
        # ...
        shape = (binary_masks.shape[0],) + img.shape[:2]
        new_mask = np.zeros(shape, np.uint8)
        mh, mw = binary_masks.shape[1:]
        new_mask[..., :mh, :mw] = binary_masks

        masks_index = 0
        for bbox_index in np_bboxes:
            x1, y1, x2, y2 = bbox_index
            # The grid cell containing the bbox centre determines where the
            # mask is pasted.
            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2
            x_index = int(xc // mw)
            y_index = int(yc // mh)
            print(x_index, y_index)
            new_mask[masks_index,
                     mh * y_index:mh * (y_index + 1),
                     mw * x_index:mw * (x_index + 1)] = binary_masks[masks_index, :, :]
            masks_index += 1
        binary_masks = new_mask

    assert img.shape[:2] == binary_masks.shape[
                            1:], '`binary_masks` must have ' \
                                 'the same shape with image'
    # ...

Here, np_bboxes is a newly created global variable used to pass the bboxes variable from line 750 (https://github.com/open-mmlab/mmengine/blob/85c83ba61689907fb1775713622b1b146d82277b/mmengine/visualization/visualizer.py#L750) to this position. The centre coordinates (xc, yc) of each bbox are then computed and used to determine the mask position. The run command, the result with your code, and the result after my further modifications were shown in screenshots attached to the original comment.

However, my code only produces correct results when --patch-overlap-ratio is set to 0 and --patch-size divides the image's height and width evenly. The overlap ratio affects the mask positions: when it is non-zero, the cells of the divided grid overlap one another, so a mask's position can no longer be determined by simple integer division. Additionally, if --patch-size does not divide the image's height and width evenly, there will be non-square cells, which requires adjusting new_mask[masks_index, mh * y_index:mh * (y_index + 1), mw * x_index:mw * (x_index + 1)] = binary_masks[masks_index, :, :] (a few if branches can handle this; one possible consolidation is sketched below).
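One way those branches might be folded into a single helper. This is a heuristic sketch of my own (paste_patch_masks and its stride logic are assumptions, not mmdetection code): patch origins step by stride = patch_size * (1 - overlap_ratio), and masks are clipped at the image border, so non-divisible sizes need no special cases:

    import numpy as np

    def paste_patch_masks(binary_masks, bboxes, img_shape, patch_size,
                          overlap_ratio=0.0):
        """Paste (N, ph, pw) patch masks onto a full-image (N, H, W) canvas."""
        img_h, img_w = img_shape[:2]
        ph, pw = binary_masks.shape[1:]
        stride = int(patch_size * (1 - overlap_ratio)) or patch_size
        out = np.zeros((binary_masks.shape[0], img_h, img_w), np.uint8)
        for i, (x1, y1, x2, y2) in enumerate(bboxes):
            xc, yc = (x1 + x2) / 2, (y1 + y2) / 2
            # Origin of the patch whose window should contain the bbox centre,
            # clipped so the patch stays inside the image.
            x0 = min(int(xc // stride) * stride, max(img_w - pw, 0))
            y0 = min(int(yc // stride) * stride, max(img_h - ph, 0))
            h = min(ph, img_h - y0)
            w = min(pw, img_w - x0)
            out[i, y0:y0 + h, x0:x0 + w] = binary_masks[i, :h, :w]
        return out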

georgeblu1 commented 2 weeks ago

Is there a fix that doesn't require changing the mmengine code?