open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

High CPU usage during evaluation #1451

Open BerenChou opened 2 years ago

BerenChou commented 2 years ago

I use the UNet config in fcn_unet_s5-d16_ce-1.0-dice-3.0_128x128_40k_chase-db1.py with small but necessary changes, such as img_scale, to train on my customized dataset (very similar to Chase_db1). My training script is tools/train.py, and I train the UNet on one RTX 3090 (my machine has two RTX 3090s in total) with CUDA 11.3 and the latest version of mmseg. During training everything is fine, but during evaluation (EvalHook) the 52-core CPU reaches very high usage. See the image below:

[screenshot: CPU usage during evaluation]

Is this normal, or did I do something wrong? For comparison, here is the CPU usage during training:

[screenshot: CPU usage during training]

Config

2022-04-06 20:13:26,083 - mmseg - INFO - Config:
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(type='UNet', norm_cfg=dict(type='BN', requires_grad=True)),
    decode_head=dict(
        type='FCNHead',
        in_channels=64,
        channels=64,
        num_classes=2,
        norm_cfg=dict(type='BN', requires_grad=True),
        in_index=4,
        loss_decode=[
            dict(type='DiceLoss', loss_weight=3.0),
            dict(type='CrossEntropyLoss', loss_weight=2.0)
        ],
        num_convs=0,
        concat_input=False),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
dataset_type = 'DTSDataset'
data_root = 'data/DTSDataset'
img_norm_cfg = dict(
    mean=[118.709, 118.709, 118.709],
    std=[93.575, 93.575, 93.575],
    to_rgb=True)
img_scale = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(512, 512), ratio_range=(1.0, 1.0)),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[118.709, 118.709, 118.709],
        std=[93.575, 93.575, 93.575],
        to_rgb=True),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[118.709, 118.709, 118.709],
                std=[93.575, 93.575, 93.575],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=40000,
        dataset=dict(
            type='DTSDataset',
            data_root='data/DTSDataset',
            img_dir='images/training',
            ann_dir='annotations/training',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(
                    type='Resize',
                    img_scale=(512, 512),
                    ratio_range=(1.0, 1.0)),
                dict(type='RandomFlip', prob=0.5),
                dict(type='PhotoMetricDistortion'),
                dict(
                    type='Normalize',
                    mean=[118.709, 118.709, 118.709],
                    std=[93.575, 93.575, 93.575],
                    to_rgb=True),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_semantic_seg'])
            ])),
    val=dict(
        type='DTSDataset',
        data_root='data/DTSDataset',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(512, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[118.709, 118.709, 118.709],
                        std=[93.575, 93.575, 93.575],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='DTSDataset',
        data_root='data/DTSDataset',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(512, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[118.709, 118.709, 118.709],
                        std=[93.575, 93.575, 93.575],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=40000)
checkpoint_config = dict(by_epoch=False, interval=4000)
evaluation = dict(
    interval=200,
    metric=['mDice', 'mFscore', 'mIoU'],
    pre_eval=True,
    save_best='Dice.tumour',
    rule='greater')
work_dir = '../work_dir'
gpu_ids = [0]
auto_resume = False

Dataset

from .builder import DATASETS
from .custom import CustomDataset

@DATASETS.register_module()
class DTSDataset(CustomDataset):

    CLASSES = ('background', 'tumour')
    PALETTE = [[120, 120, 120], [6, 230, 230]]

    def __init__(self, **kwargs):
        super(DTSDataset, self).__init__(img_suffix='.png', seg_map_suffix='.png', **kwargs)
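
For reference, registering the class this way is what lets the type='DTSDataset' entries in the config above be resolved by mmseg's dataset registry. A minimal sketch of how it is consumed (the config path here is just an assumed placeholder for a file containing the config above):

# Minimal sketch: the registry lookup resolves type='DTSDataset' to the class above.
from mmcv import Config
from mmseg.datasets import build_dataset

cfg = Config.fromfile('config/total_cfg.py')   # placeholder path, assumed to hold the config above
train_dataset = build_dataset(cfg.data.train)  # a RepeatDataset wrapping DTSDataset
print(train_dataset.CLASSES)                   # ('background', 'tumour')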

Environment

sys.platform: linux
Python: 3.9.12 (main, Apr  5 2022, 06:56:58) [GCC 7.5.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.12.0
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMSegmentation: 0.23.0+
linfangjian01 commented 2 years ago

Hi, I think this is a normal phenomenon. The CPU is used to load the checkpoint and collect results by default during inference.
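
If the load itself is a problem, one thing that may help (just a sketch, not an official fix; it assumes the extra usage comes from OpenCV/OpenMP/MKL thread pools spinning up during inference) is to cap the CPU thread pools at the very top of the training script:

# Assumption: the spike comes from generic CPU thread pools, not from mmseg itself.
# The environment variables must be set before numpy/torch create their pools.
import os
os.environ.setdefault('OMP_NUM_THREADS', '1')
os.environ.setdefault('MKL_NUM_THREADS', '1')

import cv2
import torch

cv2.setNumThreads(0)      # disable OpenCV's internal threading
torch.set_num_threads(1)  # limit PyTorch intra-op CPU parallelism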

BerenChou commented 2 years ago

Before I updated mmseg to the latest version, I was using 0.20.2 together with the training script from MMSegmentation_Tutorial.ipynb. Here is my script:

from mmcv import Config
from mmseg.apis import set_random_seed
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor
import mmcv
import os.path as osp

cfg = Config.fromfile('config/total_cfg.py')
cfg.gpu_ids = [0]
cfg.seed = 0

set_random_seed(0, deterministic=False)

# Build the training dataset and the segmentor from the config
datasets = [build_dataset(cfg.data.train)]

model = build_segmentor(cfg.model)

# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_segmentor(model, datasets, cfg, distributed=False, validate=True, meta=dict())

With 0.20.2 and this training script, the CPU usage during evaluation was normal, the same as during training, without many cores sitting at almost 100% usage.

And the strangest thing is: when I edit mmseg/models/backbones/unet.py and put some code in the __init__() function of the UNet class like this:

import torch  # torch.nn is already imported as nn at the top of unet.py
self.pe = nn.Parameter(torch.zeros(1, 196, 768))

the CPU usage during evaluation again becomes very high. I only create self.pe in __init__(); I do not even use it in forward(). This really confuses me, and I do not think it is normal.
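
For reference, a small way to put numbers on this (a sketch only; it assumes psutil is installed, and the PID placeholder is the tools/train.py process id):

# Run from a separate shell while training/evaluation is going on:
#   python monitor_cpu.py <PID-of-tools/train.py>
import sys
import psutil

proc = psutil.Process(int(sys.argv[1]))
while True:
    # cpu_percent() can exceed 100% when many cores are busy (up to ~5200% on 52 cores)
    print(f'cpu={proc.cpu_percent(interval=1.0):.0f}%  threads={proc.num_threads()}')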

linfangjian01 commented 2 years ago

Thank you very much for your question; we will take some time to check this bug. If it is convenient, could you provide some tests for other models?

BerenChou commented 2 years ago

> Thank you very much for your question; we will take some time to check this bug. If it is convenient, could you provide some tests for other models?

OK, I think I can provide some tests for other models, but not before 9 April; I have some other things at hand.

gautampawnesh commented 2 years ago

I am also facing a similar issue. I am training DeepLabV3+ from the provided configuration, and during evaluation it uses the CPU heavily.