Open BerenChou opened 2 years ago
Hi, I think this should be a normal phenomenon. The CPU is used to load the checkpoint and collect results by default during inference.
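If the high CPU usage is a problem in practice, a common framework-level workaround (not specific to mmseg) is to cap the number of CPU threads before evaluation. A minimal sketch; the thread count of 4 is an arbitrary example:

```python
# Minimal sketch of a common workaround: cap CPU thread usage before
# importing torch / running evaluation. OMP_NUM_THREADS, MKL_NUM_THREADS
# and torch.set_num_threads are standard OpenMP/MKL/PyTorch knobs,
# not mmseg-specific settings; the value 4 is an arbitrary example.
import os
os.environ.setdefault('OMP_NUM_THREADS', '4')
os.environ.setdefault('MKL_NUM_THREADS', '4')

import torch
torch.set_num_threads(4)  # limit intra-op CPU parallelism in PyTorch
```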
Before I updated mmseg to the latest version I was using 0.20.2, with the training script from MMSegmentation_Tutorial.ipynb. Here is my script:
```python
from mmcv import Config
from mmseg.apis import set_random_seed, train_segmentor
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
import mmcv
import os.path as osp

cfg = Config.fromfile('config/total_cfg.py')
cfg.gpu_ids = [0]
cfg.seed = 0
set_random_seed(0, deterministic=False)

# Build the training dataset and the segmentor from the config
datasets = [build_dataset(cfg.data.train)]
model = build_segmentor(cfg.model)
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES
# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))

train_segmentor(model, datasets, cfg, distributed=False, validate=True, meta=dict())
```
With 0.20.2 and this training script, the CPU usage during evaluation is normal, about the same as during training; there are not many cores pinned at almost 100% usage.
And the strangest thing is: when I edit mmseg/models/backbones/unet.py and put some code in the __init__() method of the UNet class, like this:
```python
import torch
self.pe = nn.Parameter(torch.zeros(1, 196, 768))
```
the CPU usage during evaluation becomes very high again. The self.pe I create in __init__() is not even used in forward(); I only create it in __init__(). This really confuses me, and I don't think it is normal.
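For reference, the edit can be reproduced outside mmseg with a toy module. The following is a minimal, self-contained sketch where TinyBackbone is a hypothetical stand-in for UNet; only the self.pe line mirrors the change described above:

```python
# Standalone sketch of the edit described above: an nn.Parameter registered
# in __init__ but never referenced in forward(). TinyBackbone is a
# hypothetical stand-in for mmseg's UNet, used only for illustration.
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        # The unused parameter, mirroring the line added to UNet.__init__
        self.pe = nn.Parameter(torch.zeros(1, 196, 768))

    def forward(self, x):
        # self.pe is intentionally not used here
        return self.conv(x)


if __name__ == '__main__':
    model = TinyBackbone()
    out = model(torch.randn(1, 3, 128, 128))
    print(out.shape, sum(p.numel() for p in model.parameters()))
```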
Thank you very much for your question; we will take some time to check this bug. If it is convenient, could you provide some tests with other models?
OK, I think I can provide some tests for other models, but not before 9 April, as I have some other things at hand.
I am also facing a similar issue. I am training DeepLabV3+ from the provided configuration, and during evaluation it uses the CPU heavily.
I use the UNet config in fcn_unet_s5-d16_ce-1.0-dice-3.0_128x128_40k_chase-db1.py with small but necessary changes, such as img_scale, to train on my customized dataset (very similar to CHASE_DB1). My training script is tools/train.py, and I train the UNet on one RTX 3090 (there are two RTX 3090s in my machine) with CUDA 11.3 and the latest version of mmseg. During training everything is fine, but during evaluation (EvalHook) all 52 CPU cores reach very high usage; see the image below. Is this normal, or did I do something wrong? For comparison, here is the CPU usage during training:
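For context, the kind of child config described above might look roughly like the sketch below, using mmseg's _base_ inheritance. The paths and values are placeholders, and in a real config the train/test pipelines that reference img_scale usually need to be redefined as well, since the base config captures that value at definition time:

```python
# Hypothetical sketch of a child config inheriting the base UNet config.
# All paths and sizes here are placeholders, not the author's actual values.
_base_ = ['./fcn_unet_s5-d16_ce-1.0-dice-3.0_128x128_40k_chase-db1.py']

# Placeholder settings for a custom CHASE_DB1-like dataset
data_root = 'data/my_dataset'
data = dict(
    samples_per_gpu=4,
    train=dict(data_root=data_root),
    val=dict(data_root=data_root),
    test=dict(data_root=data_root),
)
```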