I want to run the segmentation training code for cityscapes as follow with 1 GPU :
python tools/train.py local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py --gpus 1
However, I obtain this error :
Traceback (most recent call last):
File "tools/train.py", line 166, in <module>
main()
File "tools/train.py", line 155, in main
train_segmentor(
File "/gpfs_new/scratch/users/agolebiewski/SegFormer/mmseg/apis/train.py", line 115, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
iter_runner(iter_loaders[i], **kwargs)
File "/data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/gpfs_new/scratch/users/agolebiewski/SegFormer/mmseg/models/segmentors/base.py", line 153, in train_step
loss, log_vars = self._parse_losses(losses)
File "/gpfs_new/scratch/users/agolebiewski/SegFormer/mmseg/models/segmentors/base.py", line 204, in _parse_losses
log_vars[loss_name] = loss_value.item()
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1595629395347/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7fa9dae8377d in /data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xb5d (0x7fa9db0d3d9d in /data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fa9dae6fb1d in /data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x53956b (0x7faa189e156b in /data/users/agolebiewski/conda-envs/segformer/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #21: __libc_start_main + 0xf5 (0x7faa45c2a555 in /lib64/libc.so.6)
Aborted
I attempted to follow the others issues comments.
But I always obtain this CUDA error: an illegal memory access was encountered ...
Hello,
I want to run the segmentation training code for cityscapes as follow with 1 GPU :
python tools/train.py local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py --gpus 1
However, I obtain this error :
I attempted to follow the others issues comments. But I always obtain this
CUDA error: an illegal memory access was encountered
...