During training, we encountered the following problem:
`/home/buaa/anaconda3/envs/vit/bin/python3.6 /snap/pycharm-community/302/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 45947 --file /home/buaa/songyue/lawin-master/tools/train.py
Connected to pydev debugger (build 222.4345.23)
fatal: not a git repository (or any of the parent directories): .git
2022-10-21 17:23:32,633 - mmseg - INFO - Environment info:
2022-10-21 17:23:32,633 - mmseg - INFO - Distributed training: True
INFO:mmseg:Distributed training: True
2022-10-21 17:23:33,165 - mmseg - INFO - Config:
norm_cfg = dict(type='SyncBN', requires_grad=True)
find_unused_parameters = True
...
2022-10-21 17:23:34,101 - mmseg - INFO - Loaded 4750 images
INFO:mmseg:Loaded 4750 images
fatal: not a git repository (or any of the parent directories): .git
2022-10-21 17:23:36,849 - mmseg - INFO - Loaded 1188 images
INFO:mmseg:Loaded 1188 images
2022-10-21 17:23:36,850 - mmseg - INFO - Start running, host: buaa@buaa-System-Product-Name, work_dir: /home/buaa/songyue/lawin-master/workdir
INFO:mmseg:Start running, host: buaa@buaa-System-Product-Name, work_dir: /home/buaa/songyue/lawin-master/workdir
2022-10-21 17:23:36,850 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
INFO:mmseg:workflow: [('train', 1)], max: 160000 iters
Traceback (most recent call last):
  File "/snap/pycharm-community/302/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-community/302/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/buaa/songyue/lawin-master/tools/train.py", line 174, in <module>
    main()
  File "/home/buaa/songyue/lawin-master/tools/train.py", line 170, in main
    meta=meta)
  File "/home/buaa/songyue/lawin-master/mmseg/apis/train.py", line 115, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/encoder_decoder.py", line 158, in forward_train
    gt_semantic_seg)
  File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/encoder_decoder.py", line 102, in _decode_head_forward_train
    self.train_cfg)
  File "/home/buaa/songyue/lawin-master/mmseg/models/decode_heads/decode_head.py", line 188, in forward_train
    seg_logits = self.forward(inputs)
  File "/home/buaa/songyue/lawin-master/mmseg/models/decode_heads/lawin_head.py", line 328, in forward
    abc = self.image_pool(_c)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/cnn/bricks/conv_module.py", line 195, in forward
    x = self.norm(x)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 539, in forward
    bn_training, exponential_average_factor, self.eps)
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/functional.py", line 2147, in batch_norm
    _verify_batch_size(input.size())
  File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/functional.py", line 2114, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
python-BaseException
Backend TkAgg is interactive backend. Turning interactive mode on.`
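The failure can be reproduced outside the repo with a minimal standalone snippet that mimics the `image_pool` branch in the traceback (a sketch, not the actual lawin code; the 512 channels are taken from the error message). Global average pooling collapses the feature map to 1×1, so with batch size 1 the BatchNorm layer receives a (1, 512, 1, 1) tensor, i.e. exactly one value per channel, and training-mode BatchNorm cannot compute batch statistics:

```python
import torch
import torch.nn as nn

# Mimic the image_pool branch: global average pool to 1x1, then a
# conv + BatchNorm. With batch size 1, the norm sees one value per
# channel and training-mode BatchNorm raises ValueError.
pool = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(512, 512, kernel_size=1),
    nn.BatchNorm2d(512),
)
pool.train()  # training mode, as during runner.run

x = torch.randn(1, 512, 32, 32)  # batch size 1, as in the failing run
try:
    pool(x)
except ValueError as e:
    print(e)
    # prints: Expected more than 1 value per channel when training,
    #         got input size torch.Size([1, 512, 1, 1])
```

With a batch size of 2 or more, the same module runs without error.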
We found that this appears to be caused by the batch size being 1, but we don't know where to make the change. We suspect there is something in the configuration that we are not aware of. Could you give us some suggestions?

Looking forward to your reply!
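For reference, the only place we know of that controls the per-GPU batch size in an mmsegmentation-style config is the `data` section; a fragment like the one below (values illustrative, dataset settings elided) is where we would expect the fix to go:

```python
# Illustrative mmseg-style config fragment (field names follow
# mmsegmentation's convention; dataset settings elided).
# samples_per_gpu is the per-GPU batch size: with a value of 1,
# BatchNorm layers operating on a 1x1 pooled map get one value per
# channel and raise the ValueError above.
data = dict(
    samples_per_gpu=2,  # raising this above 1 is the usual workaround
    workers_per_gpu=2,
)
```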
For reference, the environment info from the same log:

`sys.platform: linux
Python: 3.6.2
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
TorchVision: 0.9.1+cu111
OpenCV: 4.6.0
MMCV: 1.2.7
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.3
MMSegmentation: 0.11.0+`