Colab GPU RAM (16 GB) overflow while inferencing using Segformer B4 pretrained model with Cityscape dataset

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug I am trying to do inference with Segformer B4 pretrained model available in mmsegmentation using Cityscape dataset in Colab. GPU ram is getting overflown.

Reproduction

What command or script did you run?

from cityscapesscripts.download import downloader from typing import Any, Callable, Dict, List, Optional, Union, Tuple from torchvision.datasets import Cityscapes import albumentations as A from albumentations.pytorch import ToTensorV2 import numpy as np import os

COLAB_DIR = '/content/data/'

if not os.path.exists(COLAB_DIR): !mkdir /content/data

session = downloader.login() downloader.get_available_packages(session=session)

print('Downloading gtFine and leftImg8bit packages ...\n') package_list =['gtFine_trainvaltest.zip','leftImg8bit_trainvaltest.zip'] downloader.download_packages(session=session, package_names=package_list, destination_path=COLAB_DIR)

!unzip -q /content/data/leftImg8bit_trainvaltest.zip -d /content/data/ !unzip -q /content/data/gtFine_trainvaltest.zip -d /content/data/

from cityscapesscripts.helpers import labels def encode_segmap(mask):

mask_copy = mask.copy() for label in labels.labels: mask_copy[mask==label.id] = label.trainId mask_copy[mask_copy==255] = 19 return mask_copy

class MyClass(Cityscapes): def getitem(self, index: int) -> Tuple[Any, Any]: image = Image.open(self.images[index]).convert('RGB') image = np.array(image) targets: Any = [] for i, t in enumerate(self.target_type): if t == 'polygon': target = self._load_json(self.targets[index][i]) else: target = Image.open(self.targets[index][i]) target = np.array(target) target = encode_segmap(target) target = np.resize(target,(128,128)) targets.append(target) target = tuple(targets) if len(targets) > 1 else targets[0] if self.transforms is not None: pass return image, torch.from_numpy(target)

from PIL import Image data_train= MyClass('/content/data/', split='train', mode='fine', target_type='semantic')

data_val = MyClass('/content/data/', split='val', mode='fine', target_type='semantic')

!git clone https://github.com/open-mmlab/mmsegmentation.git %cd mmsegmentation !git checkout main !pip install -e .

import mmseg from mmseg.apis import inference_model, init_model, show_result_pyplot import mmcv import cv2 import glob import os import time from os.path import join, isdir from os import listdir, rmdir from shutil import move, rmtree, make_archive import pickle print(mmseg.version)

import torch import torchvision from torchvision import datasets, models, transforms from torch.utils.data import DataLoader from torchsummary import summary import torch.nn as nn import numpy as np import os from typing import Callable, Dict, List, Tuple, Union from torch import nn, optim, utils import torch.nn.functional as F from itertools import permutations

!mim download mmsegmentation --config segformer_mit-b4_8xb1-160k_cityscapes-1024x1024 --dest .

config_file_1 = '/content/mmsegmentation/segformer_mit-b4_8xb1-160k_cityscapes-1024x1024.py' checkpoint_file_1 = '/content/mmsegmentation/segformer_mit-b4_8x1_1024x1024_160k_cityscapes_20211207_080709-07f6c333.pth'

model_1 = init_model(config_file_1, checkpoint_file_1, device=device)

idx, (i,l) = next(enumerate(data_train))

res = inference_model(model_1, i)

Did you make any modifications on the code or config? Did you understand what you have modified? I didn't do any modification on the mmsegmentation code.
What dataset did you use? Cityscape

Environment Google Colab Pro.

Please run python mmseg/utils/collect_env.py to collect necessary environment information and paste it here. sys.platform: linux Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] CUDA available: True numpy_random_seed: 2147483648 GPU 0: Tesla V100-SXM2-16GB CUDA_HOME: /usr/local/cuda NVCC: Cuda compilation tools, release 12.2, V12.2.140 GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 PyTorch: 2.1.0+cu121 PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.9.2
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.16.0+cu121 OpenCV: 4.8.0 MMEngine: 0.10.2 MMSegmentation: 1.2.2+c685fe6

You may add addition that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

OutOfMemoryError Traceback (most recent call last) in <cell line: 1>() ----> 1 res = inference_model(model_1, i)

25 frames /content/mmsegmentation/mmseg/apis/inference.py in inference_model(model, img) 114 # forward the model 115 #with torch.no_grad(): --> 116 results = model.test_step(data) 117 118 return results if is_batch else results[0]

/usr/local/lib/python3.10/dist-packages/mmengine/model/base_model/base_model.py in test_step(self, data) 143 """ 144 data = self.data_preprocessor(data, False) --> 145 return self._run_forward(data, mode='predict') # type: ignore 146 147 def parse_losses(

/usr/local/lib/python3.10/dist-packages/mmengine/model/base_model/base_model.py in _run_forward(self, data, mode) 344 """ 345 if isinstance(data, dict): --> 346 results = self(*data, mode=mode) 347 elif isinstance(data, (list, tuple)): 348 results = self(data, mode=mode)

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, kwargs) 1516 return self._compiled_call_impl(*args, *kwargs) # type: ignore[misc] 1517 else: -> 1518 return self._call_impl(args, kwargs) 1519 1520 def _call_impl(self, *args, **kwargs):

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, *kwargs) 1525 or _global_backward_pre_hooks or _global_backward_hooks 1526 or _global_forward_hooks or _global_forward_pre_hooks): -> 1527 return forward_call(args, **kwargs) 1528 1529 try:

/content/mmsegmentation/mmseg/models/segmentors/base.py in forward(self, inputs, data_samples, mode) 94 return self.loss(inputs, data_samples) 95 elif mode == 'predict': ---> 96 return self.predict(inputs, data_samples) 97 elif mode == 'tensor': 98 return self._forward(inputs, data_samples)

/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in predict(self, inputs, data_samples) 218 ] * inputs.shape[0] 219 --> 220 seg_logits = self.inference(inputs, batch_img_metas) 221 222 return self.postprocess_result(seg_logits, data_samples)

/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in inference(self, inputs, batch_img_metas) 339 level=logging.WARN) 340 if self.test_cfg.mode == 'slide': --> 341 seg_logit = self.slide_inference(inputs, batch_img_metas) 342 else: 343 seg_logit = self.whole_inference(inputs, batch_img_metas)

/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in slide_inference(self, inputs, batch_img_metas) 281 # the output of encode_decode is seg logits tensor map 282 # with shape [N, C, H, W] --> 283 crop_seg_logit = self.encode_decode(crop_img, batch_img_metas) 284 preds += F.pad(crop_seg_logit, 285 (int(x1), int(preds.shape[3] - x2), int(y1),

/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in encode_decode(self, inputs, batch_img_metas) 126 """Encode images with backbone and decode into a semantic segmentation 127 map of the same size as input.""" --> 128 x = self.extract_feat(inputs) 129 seg_logits = self.decode_head.predict(x, batch_img_metas, 130 self.test_cfg)

/content/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in extract_feat(self, inputs) 117 def extract_feat(self, inputs: Tensor) -> List[Tensor]: 118 """Extract features from images.""" --> 119 x = self.backbone(inputs) 120 if self.with_neck: 121 x = self.neck(x)

/content/mmsegmentation/mmseg/models/backbones/mit.py in forward(self, x) 442 x, hw_shape = layer0 443 for block in layer[1]: --> 444 x = block(x, hw_shape) 445 x = layer2 446 x = nlc_to_nchw(x, hw_shape)

/content/mmsegmentation/mmseg/models/backbones/mit.py in forward(self, x, hw_shape) 292 x = cp.checkpoint(_inner_forward, x) 293 else: --> 294 x = _inner_forward(x) 295 return x 296

/content/mmsegmentation/mmseg/models/backbones/mit.py in _inner_forward(x) 285 286 def _inner_forward(x): --> 287 x = self.attn(self.norm1(x), hw_shape, identity=x) 288 x = self.ffn(self.norm2(x), hw_shape, identity=x) 289 return x

/content/mmsegmentation/mmseg/models/backbones/mit.py in forward(self, x, hw_shape, identity) 179 x_kv = x_kv.transpose(0, 1) 180 --> 181 out = self.attn(query=x_q, key=x_kv, value=x_kv)[0] 182 183 if self.batch_first:

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/activation.py in forward(self, query, key, value, key_padding_mask, need_weights, attn_mask, average_attn_weights, is_causal) 1239 is_causal=is_causal) 1240 else: -> 1241 attn_output, attn_output_weights = F.multi_head_attention_forward( 1242 query, key, value, self.embed_dim, self.num_heads, 1243 self.in_proj_weight, self.in_proj_bias,

/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights, is_causal) 5404 else: 5405 attn_output_weights = torch.bmm(q_scaled, k.transpose(-2, -1)) -> 5406 attn_output_weights = softmax(attn_output_weights, dim=-1) 5407 if dropout_p > 0.0: 5408 attn_output_weights = dropout(attn_output_weights, p=dropout_p)

/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in softmax(input, dim, _stacklevel, dtype) 1854 dim = _get_softmax_dim("softmax", input.dim(), _stacklevel) 1855 if dtype is None: -> 1856 ret = input.softmax(dim) 1857 else: 1858 ret = input.softmax(dim, dtype=dtype)

OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacty of 15.77 GiB of which 32.38 MiB is free. Process 16217 has 15.74 GiB memory in use. Of the allocated memory 15.24 GiB is allocated by PyTorch, and 127.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

open-mmlab / mmsegmentation

Colab GPU RAM (16 GB) overflow while inferencing using Segformer B4 pretrained model with Cityscape dataset #3507