@gengenkai, please check this issue
terminate called after throwing an instance of 'c10::Error' what(): CUDA error: device-side assert triggered
Please use `CUDA_LAUNCH_BLOCKING=1 python tools/train.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py --work-dir work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d --validate --seed 0 --deterministic` to localize the error more precisely. Usually, this error indicates an index that is out of bounds.
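A quick way to rule out bad labels before training is to scan the annotation file directly. Below is a minimal sketch, assuming the NTU60 annotations are a pickled list of dicts with a `label` field (the path and layout are assumptions based on the mmaction2 skeleton data format; adjust to your file):

```python
import pickle

# Hypothetical path and layout; adjust to your annotation file.
ANN_FILE = 'data/ntu60_3d.pkl'
NUM_CLASSES = 60  # NTU RGB+D 60 has 60 action classes

with open(ANN_FILE, 'rb') as f:
    annos = pickle.load(f)

# Flag every label outside the valid range [0, NUM_CLASSES).
bad = [(i, a['label']) for i, a in enumerate(annos)
       if not 0 <= a['label'] < NUM_CLASSES]

print(f'checked {len(annos)} samples, found {len(bad)} out-of-range labels')
for idx, label in bad[:10]:  # show the first few offenders
    print(f'  sample {idx}: label {label}')
```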
I changed the command to
CUDA_LAUNCH_BLOCKING=1 python tools/train.py configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py --work-dir work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d --validate --seed 0 --deterministic
The errors are
2022-02-17 15:16:24,803 - mmaction - INFO - workflow: [('train', 1)], max: 80 epochs
2022-02-17 15:16:24,803 - mmaction - INFO - Checkpoints will be saved to /home/sysadmin/Nyan/mmaction2/work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d by HardDiskBackend.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=115 error=710 : device-side assert triggered
Traceback (most recent call last):
File "tools/train.py", line 205, in <module>
main()
File "tools/train.py", line 201, in main
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 152, in train_step
losses = self(skeletons, label, return_loss=True)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 106, in forward
return self.forward_train(keypoint, label, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/skeletongcn.py", line 18, in forward_train
loss = self.cls_head.loss(output, gt_labels)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/heads/base.py", line 102, in loss
loss_cls = self.loss_cls(cls_score, labels, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/base.py", line 38, in forward
ret = self._forward(*args, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/cross_entropy_loss.py", line 81, in _forward
loss_cls = F.cross_entropy(cls_score, label, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 2264, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:115
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f69084548b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f69086a6982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f690843fb7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7f694578eb7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7f694578ec26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #23: __libc_start_main + 0xf5 (0x7f69703803d5 in /lib64/libc.so.6)
Aborted (core dumped)
@gengenkai any progress?
Hi, I have tried this config and the training process did not hit this bug. Maybe you could check the labels of your data and make sure every label is less than the total number of classes in your data.
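One way to confirm that a label is the culprit is to run the loss on the CPU, where an out-of-range target fails with an ordinary Python exception that names the bad value instead of the opaque device-side assert. A minimal sketch (the shapes and label values are illustrative, not taken from the model):

```python
import torch
import torch.nn.functional as F

num_classes = 60
cls_score = torch.randn(4, num_classes)   # fake logits for a batch of 4
labels = torch.tensor([3, 59, 60, -1])    # 60 and -1 are out of range

# On CPU, the out-of-range target raises an ordinary Python exception
# (IndexError or RuntimeError, depending on the PyTorch version) that
# points at the offending value, instead of a device-side assert.
loss = F.cross_entropy(cls_score, labels)
```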
Have tested it, no errors.
Weird! I keep on getting the same errors. This time the test produced the following errors.
2022-02-19 13:44:14,937 - mmaction - INFO - workflow: [('train', 1)], max: 80 epochs
2022-02-19 13:44:14,937 - mmaction - INFO - Checkpoints will be saved to /home/sysadmin/Nyan/mmaction2/work_dirs/2sagcn_80e_ntu60_xsub_keypoint_3d by HardDiskBackend.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729006826/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
File "./tools/train.py", line 205, in <module>
main()
File "./tools/train.py", line 201, in main
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
Traceback (most recent call last):
Traceback (most recent call last):
File "./tools/train.py", line 205, in <module>
File "./tools/train.py", line 205, in <module>
main()
File "./tools/train.py", line 201, in main
main()
File "./tools/train.py", line 201, in main
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
output = self.module.train_step(*inputs[0], **kwargs[0])
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 152, in train_step
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 152, in train_step
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 152, in train_step
losses = self(skeletons, label, return_loss=True)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
losses = self(skeletons, label, return_loss=True)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
losses = self(skeletons, label, return_loss=True)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 106, in forward
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 106, in forward
return self.forward_train(keypoint, label, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/skeletongcn.py", line 18, in forward_train
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 106, in forward
return self.forward_train(keypoint, label, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/skeletongcn.py", line 18, in forward_train
loss = self.cls_head.loss(output, gt_labels)
loss = self.cls_head.loss(output, gt_labels)
return self.forward_train(keypoint, label, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/heads/base.py", line 102, in loss
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/skeletongcn.py", line 18, in forward_train
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/heads/base.py", line 102, in loss
loss = self.cls_head.loss(output, gt_labels)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/heads/base.py", line 102, in loss
Traceback (most recent call last):
File "./tools/train.py", line 205, in <module>
main()
File "./tools/train.py", line 201, in main
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 152, in train_step
losses = self(skeletons, label, return_loss=True)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 106, in forward
return self.forward_train(keypoint, label, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/skeletongcn.py", line 18, in forward_train
loss = self.cls_head.loss(output, gt_labels)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/heads/base.py", line 102, in loss
loss_cls = self.loss_cls(cls_score, labels, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
loss_cls = self.loss_cls(cls_score, labels, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
loss_cls = self.loss_cls(cls_score, labels, **kwargs)
loss_cls = self.loss_cls(cls_score, labels, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/base.py", line 44, in forward
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/base.py", line 44, in forward
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/base.py", line 44, in forward
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/base.py", line 44, in forward
ret *= self.loss_weight
RuntimeError: CUDA error: device-side assert triggered
ret *= self.loss_weight
ret *= self.loss_weight
ret *= self.loss_weight
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: CUDA error: device-side assert triggered
Traceback (most recent call last):
File "./tools/train.py", line 205, in <module>
main()
File "./tools/train.py", line 201, in main
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 152, in train_step
losses = self(skeletons, label, return_loss=True)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 106, in forward
return self.forward_train(keypoint, label, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/skeletongcn.py", line 18, in forward_train
loss = self.cls_head.loss(output, gt_labels)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/heads/base.py", line 102, in loss
loss_cls = self.loss_cls(cls_score, labels, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/losses/base.py", line 44, in forward
ret *= self.loss_weight
RuntimeError: CUDA error: device-side assert triggered
Traceback (most recent call last):
File "./tools/train.py", line 205, in <module>
main()
File "./tools/train.py", line 201, in main
meta=meta)
File "/home/sysadmin/Nyan/mmaction2/mmaction/apis/train.py", line 204, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 154, in train_step
loss, log_vars = self._parse_losses(losses)
File "/home/sysadmin/Nyan/mmaction2/mmaction/models/skeleton_gcn/base.py", line 97, in _parse_losses
log_vars[loss_name] = loss_value.item()
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2f1dfd68b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f2f1e228982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f2f1dfc1b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7f2f5b310b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7f2f5b310c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1817da (0x5624dd9737da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #6: <unknown function> + 0xfbfa9 (0x5624dd8edfa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfa8c8 (0x5624dd8ec8c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x5624dd8ec8c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa2d8 (0x5624dd8ec2d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfad68 (0x5624dd8ecd68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad7c (0x5624dd8ecd7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x5624dd8ecd7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x5624dd8ecd7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x5624dd8ecd7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x5624dd8ecd7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x5624dd8ecd7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0x12b327 (0x5624dd91d327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #18: PyDict_SetItemString + 0x89 (0x5624dd929e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0xab (0x5624dd99ed0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x64 (0x5624dda13304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #21: <unknown function> + 0x232960 (0x5624dda24960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #22: _Py_UnixMain + 0x3c (0x5624dda24ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #23: __libc_start_main + 0xf5 (0x7f2f85f023d5 in /lib64/libc.so.6)
frame #24: <unknown function> + 0x1d7555 (0x5624dd9c9555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fc22fd528b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fc22ffa4982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fc22fd3db7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7fc26d08cb7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7fc26d08cc26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1817da (0x55bb791577da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #6: <unknown function> + 0xfbfa9 (0x55bb790d1fa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfa8c8 (0x55bb790d08c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x55bb790d08c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa2d8 (0x55bb790d02d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfad68 (0x55bb790d0d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad7c (0x55bb790d0d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x55bb790d0d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x55bb790d0d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x55bb790d0d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x55bb790d0d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x55bb790d0d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0x12b327 (0x55bb79101327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #18: PyDict_SetItemString + 0x89 (0x55bb7910de59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0xab (0x55bb79182d0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x64 (0x55bb791f7304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #21: <unknown function> + 0x232960 (0x55bb79208960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #22: _Py_UnixMain + 0x3c (0x55bb79208ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #23: __libc_start_main + 0xf5 (0x7fc297c7e3d5 in /lib64/libc.so.6)
frame #24: <unknown function> + 0x1d7555 (0x55bb791ad555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f652ef1e8b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f652f170982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f652ef09b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7f656c258b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7f656c258c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1817da (0x56253327f7da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #6: <unknown function> + 0xfbfa9 (0x5625331f9fa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfa8c8 (0x5625331f88c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x5625331f88c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa2d8 (0x5625331f82d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfad68 (0x5625331f8d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0x12b327 (0x562533229327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #18: PyDict_SetItemString + 0x89 (0x562533235e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0xab (0x5625332aad0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x64 (0x56253331f304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #21: <unknown function> + 0x232960 (0x562533330960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #22: _Py_UnixMain + 0x3c (0x562533330ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #23: __libc_start_main + 0xf5 (0x7f6596e4a3d5 in /lib64/libc.so.6)
frame #24: <unknown function> + 0x1d7555 (0x5625332d5555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f716a00d8b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f716a25f982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f7169ff8b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7f71a7347b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7f71a7347c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1817da (0x556ac77bd7da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #6: <unknown function> + 0xfbfa9 (0x556ac7737fa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfa8c8 (0x556ac77368c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x556ac77368c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa2d8 (0x556ac77362d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfad68 (0x556ac7736d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0x12b327 (0x556ac7767327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #18: PyDict_SetItemString + 0x89 (0x556ac7773e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0xab (0x556ac77e8d0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x64 (0x556ac785d304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #21: <unknown function> + 0x232960 (0x556ac786e960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #22: _Py_UnixMain + 0x3c (0x556ac786eccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #23: __libc_start_main + 0xf5 (0x7f71d1f393d5 in /lib64/libc.so.6)
frame #24: <unknown function> + 0x1d7555 (0x556ac7813555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f611a0768b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f611a2c8982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f611a061b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7f61573b0b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7f61573b0c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1817da (0x557f69ef57da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #6: <unknown function> + 0xfbfa9 (0x557f69e6ffa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfa8c8 (0x557f69e6e8c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x557f69e6e8c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa2d8 (0x557f69e6e2d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfad68 (0x557f69e6ed68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0x12b327 (0x557f69e9f327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #18: PyDict_SetItemString + 0x89 (0x557f69eabe59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0xab (0x557f69f20d0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x64 (0x557f69f95304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #21: <unknown function> + 0x232960 (0x557f69fa6960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #22: _Py_UnixMain + 0x3c (0x557f69fa6ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #23: __libc_start_main + 0xf5 (0x7f6181fa23d5 in /lib64/libc.so.6)
frame #24: <unknown function> + 0x1d7555 (0x557f69f4b555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f52f3b1b8b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f52f3d6d982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f52f3b06b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5fbb7a (0x7f5330e55b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5fbc26 (0x7f5330e55c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1817da (0x55e644b6f7da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #6: <unknown function> + 0xfa2d8 (0x55e644ae82d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfad68 (0x55e644ae8d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0x12b327 (0x55e644b19327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #17: PyDict_SetItemString + 0x89 (0x55e644b25e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #18: PyImport_Cleanup + 0xab (0x55e644b9ad0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #19: Py_FinalizeEx + 0x64 (0x55e644c0f304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #20: <unknown function> + 0x232960 (0x55e644c20960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #21: _Py_UnixMain + 0x3c (0x55e644c20ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
frame #22: __libc_start_main + 0xf5 (0x7f535ba473d5 in /lib64/libc.so.6)
frame #23: <unknown function> + 0x1d7555 (0x55e644bc5555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python)
Traceback (most recent call last):
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
main()
File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sysadmin/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=7', 'configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py', '--launcher', 'pytorch', '--validate', '--seed', '0', '--deterministic']' died with <Signals.SIGABRT: 6>.
(0x55bb7910de59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #19: PyImport_Cleanup + 0xab (0x55bb79182d0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #20: Py_FinalizeEx + 0x64 (0x55bb791f7304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #21: <unknown function> + 0x232960 (0x55bb79208960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #22: _Py_UnixMain + 0x3c (0x55bb79208ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #23: __libc_start_main + 0xf5 (0x7fc297c7e3d5 in /lib64/libc.so.6) frame #24: <unknown function> + 0x1d7555 (0x55bb791ad555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) terminate called after throwing an instance of 'c10::Error' what(): CUDA error: device-side assert triggered Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f652ef1e8b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f652f170982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f652ef09b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #3: <unknown function> + 0x5fbb7a (0x7f656c258b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #4: <unknown function> + 0x5fbc26 (0x7f656c258c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #5: <unknown function> + 0x1817da (0x56253327f7da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #6: <unknown function> + 0xfbfa9 (0x5625331f9fa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #7: <unknown function> + 0xfa8c8 (0x5625331f88c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #8: <unknown function> + 0xfa8c8 (0x5625331f88c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #9: <unknown function> + 0xfa2d8 (0x5625331f82d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #10: <unknown function> + 0xfad68 (0x5625331f8d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #11: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #12: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #13: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #14: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #15: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #16: <unknown function> + 0xfad7c (0x5625331f8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #17: <unknown function> + 0x12b327 (0x562533229327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #18: PyDict_SetItemString + 0x89 (0x562533235e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #19: PyImport_Cleanup + 0xab (0x5625332aad0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #20: Py_FinalizeEx + 0x64 (0x56253331f304 in 
/home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #21: <unknown function> + 0x232960 (0x562533330960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #22: _Py_UnixMain + 0x3c (0x562533330ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #23: __libc_start_main + 0xf5 (0x7f6596e4a3d5 in /lib64/libc.so.6) frame #24: <unknown function> + 0x1d7555 (0x5625332d5555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) terminate called after throwing an instance of 'c10::Error' what(): CUDA error: device-side assert triggered Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f716a00d8b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f716a25f982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f7169ff8b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #3: <unknown function> + 0x5fbb7a (0x7f71a7347b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #4: <unknown function> + 0x5fbc26 (0x7f71a7347c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #5: <unknown function> + 0x1817da (0x556ac77bd7da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #6: <unknown function> + 0xfbfa9 (0x556ac7737fa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #7: <unknown function> + 0xfa8c8 (0x556ac77368c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #8: <unknown function> + 0xfa8c8 (0x556ac77368c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #9: <unknown function> + 0xfa2d8 (0x556ac77362d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #10: <unknown function> + 0xfad68 (0x556ac7736d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #11: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #12: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #13: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #14: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #15: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #16: <unknown function> + 0xfad7c (0x556ac7736d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #17: <unknown function> + 0x12b327 (0x556ac7767327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #18: PyDict_SetItemString + 0x89 (0x556ac7773e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #19: PyImport_Cleanup + 0xab (0x556ac77e8d0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #20: Py_FinalizeEx + 0x64 (0x556ac785d304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #21: <unknown function> + 0x232960 (0x556ac786e960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #22: _Py_UnixMain + 0x3c (0x556ac786eccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) 
frame #23: __libc_start_main + 0xf5 (0x7f71d1f393d5 in /lib64/libc.so.6) frame #24: <unknown function> + 0x1d7555 (0x556ac7813555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) terminate called after throwing an instance of 'c10::Error' what(): CUDA error: device-side assert triggered Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f611a0768b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f611a2c8982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f611a061b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #3: <unknown function> + 0x5fbb7a (0x7f61573b0b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #4: <unknown function> + 0x5fbc26 (0x7f61573b0c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #5: <unknown function> + 0x1817da (0x557f69ef57da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #6: <unknown function> + 0xfbfa9 (0x557f69e6ffa9 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #7: <unknown function> + 0xfa8c8 (0x557f69e6e8c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #8: <unknown function> + 0xfa8c8 (0x557f69e6e8c8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #9: <unknown function> + 0xfa2d8 (0x557f69e6e2d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #10: <unknown function> + 0xfad68 (0x557f69e6ed68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #11: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #12: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #13: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #14: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #15: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #16: <unknown function> + 0xfad7c (0x557f69e6ed7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #17: <unknown function> + 0x12b327 (0x557f69e9f327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #18: PyDict_SetItemString + 0x89 (0x557f69eabe59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #19: PyImport_Cleanup + 0xab (0x557f69f20d0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #20: Py_FinalizeEx + 0x64 (0x557f69f95304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #21: <unknown function> + 0x232960 (0x557f69fa6960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #22: _Py_UnixMain + 0x3c (0x557f69fa6ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #23: __libc_start_main + 0xf5 (0x7f6181fa23d5 in /lib64/libc.so.6) frame #24: <unknown function> + 0x1d7555 (0x557f69f4b555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) terminate called after throwing an instance of 'c10::Error' what(): CUDA error: 
device-side assert triggered Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1603729006826/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f52f3b1b8b2 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f52f3d6d982 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f52f3b06b7d in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so) frame #3: <unknown function> + 0x5fbb7a (0x7f5330e55b7a in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #4: <unknown function> + 0x5fbc26 (0x7f5330e55c26 in /home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so) frame #5: <unknown function> + 0x1817da (0x55e644b6f7da in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #6: <unknown function> + 0xfa2d8 (0x55e644ae82d8 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #7: <unknown function> + 0xfad68 (0x55e644ae8d68 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #8: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #9: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #10: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #11: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #12: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #13: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #14: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #15: <unknown function> + 0xfad7c (0x55e644ae8d7c in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #16: <unknown function> + 0x12b327 (0x55e644b19327 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #17: PyDict_SetItemString + 0x89 (0x55e644b25e59 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #18: PyImport_Cleanup + 0xab (0x55e644b9ad0b in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #19: Py_FinalizeEx + 0x64 (0x55e644c0f304 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #20: <unknown function> + 0x232960 (0x55e644c20960 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #21: _Py_UnixMain + 0x3c (0x55e644c20ccc in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) frame #22: __libc_start_main + 0xf5 (0x7f535ba473d5 in /lib64/libc.so.6) frame #23: <unknown function> + 0x1d7555 (0x55e644bc5555 in /home/sysadmin/anaconda3/envs/open-mmlab/bin/python) Traceback (most recent call last): File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module> main() File 
"/home/sysadmin/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main cmd=cmd) subprocess.CalledProcessError: Command '['/home/sysadmin/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=7', 'configs/skeleton/2s-agcn/2sagcn_80e_ntu60_xsub_keypoint_3d.py', '--launcher', 'pytorch', '--validate', '--seed', '0', '--deterministic']' died with <Signals.SIGABRT: 6>. (open-mmlab) [sysadmin@traininglab mmaction2]$
If you are using ntu60 xsub for training, please first check the 'label' values in your data (each should be less than 60).
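For reference, a minimal sketch of such a check, assuming the pickles generated by gen_ntu_rgbd_raw.py hold a list of annotation dicts with a 'label' field; the data paths below are only examples and should be adjusted to your output folder:

```python
# Sketch of a label-range check over the generated pickles.
# Assumption: each pkl stores a list of dicts with a 'label' key;
# the paths are examples, not the required layout.
import pickle

for split in ('xsub/train.pkl', 'xsub/val.pkl'):
    with open(f'data/ntu60/{split}', 'rb') as f:
        annos = pickle.load(f)
    labels = [anno['label'] for anno in annos]
    print(split, 'label range:', min(labels), '-', max(labels))
    # Any label >= 60 will trigger the ClassNLLCriterion device-side assert
    # when the head is configured with num_classes=60.
    assert all(0 <= lb < 60 for lb in labels), f'{split} has labels outside [0, 60)'
```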
In this folder, there is only a label file for ntu120 (label_map_ntu120.txt). In the configuration file, there is no path to a label file.
I used this config file, which has num_classes=60.
Where can I check that the 'label' values of the data are less than 60?
Please refer to our README.md in tools/data/skeleton. Use this command: python gen_ntu_rgbd_raw.py --data-path your_raw_nturgbd60_skeleton_path --ignored-sample-path NTU_RGBD_samples_with_missing_skeletons.txt --out-folder your_nturgbd60_output_path --task ntu60. Then you will get the correct data format for the ntu60 dataset.
Yes, I did that. Please see my original post; it was reported there as follows:
I am training 2s-agcn.
Raw skeleton data are downloaded from [here](https://github.com/shahroudy/NTURGB-D).
Converted to mmaction2 format using gen_ntu_rgbd_raw.py, so I have two folders, xsub and xview, after conversion.
I used that command, python gen_ntu_rgbd_raw.py --data-path your_raw_nturgbd60_skeleton_path --ignored-sample-path NTU_RGBD_samples_with_missing_skeletons.txt --out-folder your_nturgbd60_output_path --task ntu60. I have two folders after the process, and each has train.pkl and val.pkl.
I remember something: I used all the files inside both nturgbd_skeletons_s001_to_s017.zip and nturgbd_skeletons_s018_to_s032.zip. Is ntu60 only from the first one, nturgbd_skeletons_s001_to_s017.zip? Let me do it again.
Yes. ntu60 is the data from the first zip, while ntu120 uses both.
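If you are unsure which archives your raw folder was built from, a quick sanity check is possible from the standard NTU RGB+D file naming (e.g. S001C001P001R001A001.skeleton), since the first zip only contains setups S001-S017; the raw_dir value below is a placeholder matching the your_raw_nturgbd60_skeleton_path argument above:

```python
# Sanity check before regenerating ntu60: the first archive covers only
# setups S001-S017, so any higher setup number means ntu120-only files
# slipped in. raw_dir is a placeholder path; the S-number parsing assumes
# the standard NTU RGB+D naming (SxxxCxxxPxxxRxxxAxxx.skeleton).
import os

raw_dir = 'your_raw_nturgbd60_skeleton_path'
setups = {int(name[1:4]) for name in os.listdir(raw_dir) if name.endswith('.skeleton')}
extra = sorted(s for s in setups if s > 17)
if extra:
    print('Setups beyond S017 found (from the second zip):', extra)
else:
    print('Only S001-S017 present; the folder is suitable for --task ntu60.')
```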
Yeah, my bad. I used both. Now it works.
Good.