Sneriko opened this issue 1 year ago
I can also add that the predicted mask is 0-4 pixels too small in the y-dimension.
I encountered the same error. How can it be solved?
Same problem here.
Same issue, please consider fixing this.
Still an issue.
Still a problem. The issue is that the binary_masks are rescaled according to the configs defined in test_pipeline (e.g. dict(type='FixShapeResize', ...)), but the images are loaded from their paths and not resized, so the two end up with different sizes. The bug is at line 136 of mmdet/engine/hooks/visualization_hook.py: the images should be resized according to the test_pipeline.
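A minimal sketch of that idea, assuming the hook has access to the loaded image and the stacked (N, H, W) masks at the rescaled size; the helper name resize_image_to_masks is only illustrative and not part of mmdet:

import mmcv
import numpy as np

def resize_image_to_masks(img: np.ndarray, binary_masks: np.ndarray) -> np.ndarray:
    # Resize the raw image so its (H, W) matches the (N, H, W) predicted masks,
    # mirroring the rescaling the test_pipeline applied before inference.
    mask_h, mask_w = binary_masks.shape[1:]
    if img.shape[:2] != (mask_h, mask_w):
        # mmcv.imresize expects the target size as (width, height)
        img = mmcv.imresize(img, (mask_w, mask_h))
    return img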
How to solve it?
Have you solved it?
I encountered this error as well when using the RTMDet instance segmentation model for image inference. I noticed that the height or width of the image that caused the error was larger than the scale parameter (640, 640) of the Resize in the test_pipeline. I therefore changed the scale to be larger than the height and width of the images that need inference (960, 960). After this modification, the images that previously raised errors no longer did:

dict(
    type='Resize',
    scale=(
        960,  # 640 ---> 960
        960,  # 640 ---> 960
    ),
    keep_ratio=True),
dict(
    type='Pad',
    size=(
        960,  # 640 ---> 960
        960,  # 640 ---> 960
    ),
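For context, a sketch of where those two entries sit in a typical RTMDet-Ins test_pipeline; the surrounding transforms are assumptions based on the stock config, not taken from this thread:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(960, 960), keep_ratio=True),  # 640 ---> 960
    dict(type='Pad', size=(960, 960), pad_val=dict(img=(114, 114, 114))),  # 640 ---> 960
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]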
When I use demo/large_image_demo.py for large-image inference, I find that when the patch size equals the size of the large image to be inferred, which also equals the scale in test_pipeline (640, 640), I get the following results (equivalent to processing each block separately, and the results are very poor):
When I change the parameters to --patch-size 160 --patch-overlap-ratio 0 and print the size of binary_masks, I find that its shape is (17, 160, 160). On another (640, 640) image, binary_masks comes out as (12, 160, 160), which indicates that the program stacks the detected masks at the patch size, but I don't know how to parse them back into the correct full-image mask. Laying them out directly by row, I got the strange result below:
    assert img.shape[:2] == binary_masks.shape[
AssertionError: `binary_masks` must have the same shape with image
I printed their sizes: binary_masks is (752, 512, 512) and img is (10678, 8278, 3). The img here should be read at the sliced (patch) size, but it is not.
if img.shape[:2] != binary_masks.shape[1:]:
    # pad the masks with zeros up to the full image size
    # (the original mask content stays in the top-left corner)
    shape = (binary_masks.shape[0],) + img.shape[:2]
    new_mask = np.zeros(shape, np.uint8)
    mh, mw = binary_masks.shape[1:]
    new_mask[..., :mh, :mw] = binary_masks
    binary_masks = new_mask
assert img.shape[:2] == binary_masks.shape[
    1:], '`binary_masks` must have ' \
         'the same shape with image'
Thank you for fixing the error. However, after modifying the code according to your suggestion, the results are still problematic (it went from raising an error to running without producing correct results). From the results, it can be inferred that each bbox corresponds to one mask, and each mask is one layer of binary_masks.
Your code does make the mask size correct, but the mask positions are wrong: they all end up in the upper-left corner of new_mask. The mask positions should therefore be determined from the center coordinates of the bboxes (divide the original image into a grid of patch-size x patch-size cells, use the row and column indices in that grid to determine each mask's position, and finally place the mask into the corresponding cell).
I made some modifications based on your code, as follows:

if img.shape[:2] != binary_masks.shape[1:]:
    # ....
    shape = (binary_masks.shape[0],) + img.shape[:2]
    new_mask = np.zeros(shape, np.uint8)
    mh, mw = binary_masks.shape[1:]
    masks_index = 0
    for bbox_index in np_bboxes:
        x1, y1, x2, y2 = bbox_index
        # the grid cell containing the bbox centre determines the mask position
        xc = (x1 + x2) / 2
        yc = (y1 + y2) / 2
        x_index = int(xc // mw)
        y_index = int(yc // mh)
        print(x_index, y_index)
        new_mask[masks_index,
                 mh * y_index:mh * (y_index + 1),
                 mw * x_index:mw * (x_index + 1)] = binary_masks[masks_index, :, :]
        masks_index += 1
    binary_masks = new_mask
assert img.shape[:2] == binary_masks.shape[1:], \
    '`binary_masks` must have the same shape with image'
# ....
Here, np_bboxes is a newly created global variable used to pass the bboxes variable from line 750 (https://github.com/open-mmlab/mmengine/blob/85c83ba61689907fb1775713622b1b146d82277b/mmengine/visualization/visualizer.py#L750) to this location. The center coordinates (xc, yc) of each bbox are then computed, and the mask position is determined from (xc, yc).
The command to run is shown in the image below:
The result after modifying according to your code:
The result after my further modifications:
However, my code only produces correct results when --patch-overlap-ratio is 0 and the image's height and width are evenly divisible by --patch-size. The overlap ratio (--patch-overlap-ratio) affects the position of the masks: when it is non-zero, the cells of the grid overlap each other, and the mask position can no longer be determined by simple integer division. Additionally, if the image's height and width are not evenly divisible by --patch-size, there will be non-square cells, which requires adjusting new_mask[masks_index, mh * y_index:mh * (y_index + 1), mw * x_index:mw * (x_index + 1)] = binary_masks[masks_index, :, :] (a few if branches can handle this; see the sketch below).
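A minimal sketch of those if branches, still assuming --patch-overlap-ratio 0; it clips each mask at the image border instead of assuming every grid cell is a full patch (this is my own adjustment, not code from the thread):

masks_index = 0
for x1, y1, x2, y2 in np_bboxes:
    xc, yc = (x1 + x2) / 2, (y1 + y2) / 2
    # top-left corner of the grid cell that contains the bbox centre
    y0 = int(yc // mh) * mh
    x0 = int(xc // mw) * mw
    # clip the mask when the last row/column of cells is smaller than a full patch
    h = min(mh, img.shape[0] - y0)
    w = min(mw, img.shape[1] - x0)
    new_mask[masks_index, y0:y0 + h, x0:x0 + w] = binary_masks[masks_index, :h, :w]
    masks_index += 1
binary_masks = new_mask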
Is there a fix that doesn't require changes to the mmengine code?
I get the error message '`binary_masks` must have the same shape with image' when I run inference with the trained model. It's an RTMDet model that I trained with mmdetection 3.x. What could cause the model to predict a mask with a different size than the image?
This is my training config:
_base_ = '/home/erik/Riksarkivet/Projects/HTR_Pipeline/mmdetection/configs/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py'
load_from = '/home/erik/Riksarkivet/Projects/HTR_Pipeline/models/checkpoints/rtmdet-ins_m_8xb32-300e_coco_20221123_001039-6eba602e.pth'
# model.backbone.frozen_stages = 4
data_root = ''
work_dir = '/home/erik/Riksarkivet/Projects/HTR_Pipeline/models/checkpoints/rtmdet_regions_6'
base_lr = 0.004/16
train_batch_size_per_gpu = 2
val_batch_size_per_gpu = 1
train_num_workers = 1
num_classes = 1
metainfo = {
    'classes': ('TextRegion', ),  # trailing comma so this is a tuple, not a string
    'palette': [
        (220, 20, 60)
    ]
}
model = dict(bbox_head=dict(num_classes=1))
icdar_2019 = dict(
    type='CocoDataset',
    metainfo=metainfo,
    data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/'),
    ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/gt_files/coco_regions2.json',
    pipeline=_base_.train_pipeline)

icdar_2019_test = dict(
    type='CocoDataset',
    metainfo=metainfo,
    data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/'),
    ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/ICDAR-2019/clean/gt_files/coco_regions2.json',
    test_mode=True,
    pipeline=_base_.test_pipeline)

police_records = dict(
    type='CocoDataset',
    metainfo=metainfo,
    data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/'),
    ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/gt_files/coco_regions2.json',
    pipeline=_base_.train_pipeline)
train_list = [police_records, icdar_2019]
test_list = [icdar_2019_test]
train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='ConcatDataset',
        datasets=train_list))
val_dataloader = dict(
    batch_size=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='CocoDataset',
        metainfo=metainfo,
        data_prefix=dict(img='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/'),
        ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/gt_files/coco_regions2.json',
        pipeline=_base_.test_pipeline,
        test_mode=True))
test_dataloader = val_dataloader
val_evaluator = dict(
    type='CocoMetric',
    metric=['bbox', 'segm'],
    ann_file='/media/erik/Elements/Riksarkivet/data/datasets/htr/segmentation/police_records/gt_files/coco_regions2.json')
test_evaluator = val_evaluator
model = dict(test_cfg=dict(
    nms_pre=200,
    min_bbox_size=0,
    score_thr=0.4,
    nms=dict(type='nms', iou_threshold=0.6),
    max_per_img=50,
    mask_thr_binary=0.5))
default_hooks = dict(
    # set how many epochs to save the model, and the maximum number of models to save;
    # `save_best` also keeps the best model (recommended).
)

max_epochs = 12
stage2_num_epochs = 2
base_lr = 0.004/16
interval = 12

train_cfg = dict(
    max_epochs=12,
    val_interval=12,
    dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)])
test_cfg = dict(pipeline=_base_.test_pipeline)
pipeline = _base_.test_pipeline
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0,
        bias_decay_mult=0,
        bypass_duplicate=True))
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1.0e-5,
        by_epoch=False,
        begin=0,
        end=1000),
    dict(
        # use cosine lr from 150 to 300 epoch
    )
]
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='DetLocalVisualizer',
    vis_backends=vis_backends,
    name='visualizer',
    save_dir='/home/erik/Riksarkivet/Projects/HTR_Pipeline/output')
Grateful for any help!