Closed: sainivedh19pt closed this issue 2 years ago.
I have two questions or comments.
(1) Could you try training the models that hit this bug on our provided datasets, such as Cityscapes and ADE20K, to check whether the CUDA out-of-memory error still occurs?
(2) Before your issue, I had run into similar problems myself because I missed 'RandomCrop' and 'Pad', as described here:
https://github.com/open-mmlab/mmsegmentation/pull/955#issuecomment-1005385112
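For reference, a training pipeline that includes both transforms might look roughly like the sketch below; the crop_size, img_scale, and img_norm_cfg values are placeholders, not taken from that PR or from this issue:

```python
# Rough sketch of a train_pipeline with both RandomCrop and Pad.
crop_size = (512, 512)  # placeholder value
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    # Without RandomCrop/Pad, images keep their full (and varying) sizes,
    # which is a common source of memory and batching problems.
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
```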
I hope my experience helps you locate the problem.
Best,
Hi @MengzhangLI ,
Thanks for the response.
The error I am facing is not CUDA out of memory. Here is the extended stack trace for better insight:
File "C:\Users\Sai_Nivedh\Projects\mmsegmentation\mmseg\apis\train.py", line 174, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\runner\iter_based_runner.py", line 134, in run
iter_runner(iter_loaders[i], **kwargs)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\runner\iter_based_runner.py", line 61, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\data_parallel.py", line 74, in train_step
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\data_parallel.py", line 53, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 51, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim) if inputs else []
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 44, in scatter
return scatter_map(inputs)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 29, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 34, in scatter_map
out = list(map(type(obj), zip(*map(scatter_map, obj.items()))))
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 29, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\scatter_gather.py", line 27, in scatter_map
return Scatter.forward(target_gpus, obj.data)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 71, in forward
outputs = scatter(input, target_gpus, streams)
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 15, in scatter
[streams[i // chunk_size]]) for i in range(len(input))
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 15, in <listcomp>
[streams[i // chunk_size]]) for i in range(len(input))
File "c:\users\sai_nivedh\projects\mmcv\mmcv\parallel\_functions.py", line 24, in scatter
output = output.cuda(devices[0], non_blocking=True)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
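For reference, the CUDA_LAUNCH_BLOCKING hint from the message can be applied by setting the variable before anything touches CUDA, for example at the very top of the training entry point. A small sketch, not part of the original report:

```python
import os

# Make CUDA kernel launches synchronous so the illegal access is reported at
# the call that actually triggered it, instead of at a later API call such as
# the .cuda() inside scatter(). Must run before the first CUDA operation.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
```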
It is usually caused by a wrong num_classes in your config: it should be n = number of foreground classes + background (the background is usually label 0). For example, if you have only one kind of foreground, it should be num_classes=2.
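As a concrete illustration, a minimal sketch of the relevant part of a config is below; the head type and other fields are placeholders, not taken from this issue's config:

```python
# Minimal sketch: one foreground class + background  =>  num_classes=2.
# If num_classes is smaller than the largest label value in the masks, the
# loss computation can index out of bounds on the GPU, which can surface as
# "an illegal memory access was encountered".
model = dict(
    decode_head=dict(
        type='FCNHead',        # example head, not from this issue
        num_classes=2),
    auxiliary_head=dict(
        type='FCNHead',
        num_classes=2))
```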
Checklist
Describe the bug
Reproduction
What command or script did you run?
Environment
Run python mmseg/utils/collect_env.py to collect the necessary environment information and paste it here.
TorchVision: 0.11.3
OpenCV: 4.5.5
MMCV: 1.4.4
MMCV Compiler: MSVC 193030709
MMCV CUDA Compiler: 11.6
MMSegmentation: 0.21.1+b163101
Whole Config
Bug fix