Inference of mask2former on large images

mboaz17 commented 6 months ago

Hello,

I am trying to train and evaluate a mask2former model on large images (3648x5472 pixels). Training is not an issue in terms of memory usage, since I'm using cropped patches. The train_pipeline contains random cropping: dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75)

However, running inference on the full-size images requires a huge amount of memory. For example, the mask2former_swin-b model needs about 40GB of GPU memory.

This is why I tried changing the test_cfg of the model from whole-image inference, test_cfg=dict(mode='whole'), to sliding window inference: test_cfg=dict(mode='slide', crop_size=(1366, 2048), stride=(1141, 1712))
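For reference, here is a minimal sketch of how these settings tile the image. It assumes the usual sliding-window grid arithmetic (the last window in each row and column is shifted back so it stays inside the image); the helper name slide_windows is hypothetical and is not part of mmsegmentation.

```python
def slide_windows(img_hw, crop_hw, stride_hw):
    """Return (y1, x1, y2, x2) for every sliding window, in row-major order.

    Hypothetical helper: a sketch of the grid arithmetic, not library code.
    """
    h_img, w_img = img_hw
    h_crop, w_crop = crop_hw
    h_stride, w_stride = stride_hw
    # Number of windows needed along each axis to cover the whole image.
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    windows = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            # Clamp the far edge to the image, then step back by one crop size.
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            windows.append((y1, x1, y2, x2))
    return windows


windows = slide_windows((3648, 5472), (1366, 2048), (1141, 1712))
print(len(windows))   # 9 windows, a 3x3 grid
print(windows[0])     # (0, 0, 1366, 2048)
print(windows[-1])    # (2282, 3424, 3648, 5472)
```

With this stride the 3x3 grid covers the image exactly, and neighbouring windows overlap by 225 rows and 336 columns.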

However, during the evaluation phase of training I received the following error:

  File "/home/airsim/repos/open-mmlab/mmsegmentation/tools/train.py", line 100, in main
    runner.train()
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/mmengine/runner/loops.py", line 102, in run
    self.runner.val_loop.run()
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/mmengine/runner/loops.py", line 371, in run
    self.run_iter(idx, data_batch)
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/mmengine/runner/loops.py", line 391, in run_iter
    outputs = self.runner.model.val_step(data_batch)
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 133, in val_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(data, mode=mode)
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/airsim/anaconda3/envs/open-mmlab-newest/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/airsim/repos/open-mmlab/mmsegmentation/mmseg/models/segmentors/base.py", line 96, in forward
    return self.predict(inputs, data_samples)
  File "/home/airsim/repos/open-mmlab/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 220, in predict
    seg_logits = self.inference(inputs, batch_img_metas)
  File "/home/airsim/repos/open-mmlab/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 341, in inference
    seg_logit = self.slide_inference(inputs, batch_img_metas)
  File "/home/airsim/repos/open-mmlab/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 284, in slide_inference
    preds += F.pad(crop_seg_logit,
RuntimeError: The size of tensor a (5472) must match the size of tensor b (8896) at non-singleton dimension 3

It seems that for the mask2former, the crop_seg_logit has the shape of the original image (3648x5472 in this case), unlike other mmsegmentation models, which yield the shape of the slice (1366x2048 in this case).
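The numbers in the error message are consistent with this. Below is a small sketch of the shape arithmetic, assuming each window's logits are zero-padded back to the full image by (x1, W - x2) horizontally and (y1, H - y2) vertically before being accumulated. The helper padded_hw is hypothetical and only does the bookkeeping; it does not call into mmsegmentation or PyTorch.

```python
def padded_hw(logit_hw, window, img_hw):
    """Height and width of a window's logits after padding to image coordinates.

    Hypothetical helper: assumes padding of y1 / (H - y2) rows and
    x1 / (W - x2) columns around the window's logits.
    """
    y1, x1, y2, x2 = window
    h, w = logit_hw
    img_h, img_w = img_hw
    return (h + y1 + (img_h - y2), w + x1 + (img_w - x2))


image = (3648, 5472)
first_window = (0, 0, 1366, 2048)  # (y1, x1, y2, x2) of the top-left window

# Logits with the shape of the window pad back to exactly the image size.
print(padded_hw((1366, 2048), first_window, image))  # (3648, 5472)

# Logits that already have the full image shape end up too large.
print(padded_hw(image, first_window, image))         # (5930, 8896)
```

The width of 8896 in the second case is the "tensor b" size from the error, while 5472 is the width of the accumulation buffer.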

Can anyone think of a solution to this problem?

Thanks a lot!

Saillxl commented 5 months ago

Have you found a solution yet?

Skyninth commented 4 months ago

Same problem! I want to try BEiT in mask2former. I set the image size to (896, 896) and test_cfg=dict(mode='slide', crop_size=(896, 896), stride=(640, 640)). The train and test pipelines use dict(type='Resize', scale=(896, 3584), keep_ratio=True). I then get this error:

  File "/home/airsim/repos/open-mmlab/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 284, in slide_inference
    preds += F.pad(crop_seg_logit,
RuntimeError: The size of tensor a must match the size of tensor b at non-singleton dimension 2

open-mmlab / mmsegmentation

Inference of mask2former on large images #3666