Open saniazahan opened 4 years ago
I am getting this error total_feat_size 800 [09.25.20|00:13:48] Parameters: {'work_dir': 'work_dir', 'config': 'config/st_gcn/ntu-xview/train.yaml', 'phase': 'train', 'save_result': False, 'start_epoch': 0, 'num_epoch': 80, 'use_gpu': True, 'device': [0], 'log_interval': 100, 'save_interval': 10, 'eval_interval': 5, 'save_log': True, 'print_log': True, 'pavi_log': False, 'feeder': 'feeder.feeder.Feeder', 'num_worker': 4, 'train_feeder_args': {'data_path': './data/NTU-RGB-D/xview/train_data.npy', 'label_path': './data/NTU-RGB-D/xview/train_label.pkl', 'debug': False}, 'test_feeder_args': {'data_path': './data/NTU-RGB-D/xview/val_data.npy', 'label_path': './data/NTU-RGB-D/xview/val_label.pkl'}, 'batch_size': 64, 'test_batch_size': 64, 'debug': False, 'model': 'net.st_gcn.Model', 'model_args': {'in_channels': 3, 'num_class': 60, 'dropout': 0.5, 'edge_importance_weighting': True, 'graph_args': {'layout': 'ntu-rgb+d', 'strategy': 'spatial'}}, 'weights': None, 'ignore_weights': [], 'show_topk': [1, 5], 'base_lr': 0.1, 'step': [10, 50], 'optimizer': 'SGD', 'nesterov': True, 'weight_decay': 0.0001}
[09.25.20|00:13:48] Training epoch: 0
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [19,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [20,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [21,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [22,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [23,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [24,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [27,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [28,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes
failed.
/opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes
failed.
Traceback (most recent call last):
File "main.py", line 31, in cublasCreate(handle)
Exception raised from createCublasHandle at /opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f56c6ecd77d in /home/uniwa/students3/students/22905553/linux/anaconda3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1:
It maybe you packing the *.skeleton files in to npy via 120 classes rather than 60 classes. This is my case.
Hi it seems you code for training the model uses 4 gpus but I have only one. I tried using one but it shows error while calculating back propagation. Is there a to train on a single gpu device. Thanks