kchengiva / DecoupleGCN-DropGraph

The implementation for "Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition" (ECCV2020).

RuntimeError: copy_if failed to synchronize: device-side assert triggered #5

Open erinchen824 opened 3 years ago

erinchen824 commented 3 years ago

Hi there, I ran into this problem while using a modified adjacency matrix (it has the same dimensions as the original A; nothing is changed except the contents of A). Could you please help me out? Thanks!

/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/Distributions.cu:290: lambda [](int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto::operator()(int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto: block: [0,0,0], thread: [79,0,0] Assertion `0 <= p4 && p4 <= 1` failed.
 54%|██████████████████████████████████████████████████████████████████████████████████████████▏ | 34/63 [00:38<00:32, 1.12s/it]
Traceback (most recent call last):
  File "main-tf.py", line 671, in <module>
    processor.start()
  File "main-tf.py", line 609, in start
    self.train(epoch, save_model=save_model)
  File "main-tf.py", line 427, in train
    output = self.model(data, keep_prob)
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/decouple_gcn_hp_regulation.py", line 283, in forward
    x = self.l7(x, keep_prob)
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/decouple_gcn_hp_regulation.py", line 223, in forward
    x = self.tcn1(self.gcn1(x), keep_prob, self.A) + self.dropT_skip(
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/decouple_gcn_hp_regulation.py", line 56, in forward
    x = self.dropT(self.dropS(x, keep_prob, A), keep_prob)
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/dropSke.py", line 35, in forward
    M[M > 0.001] = 1.0
RuntimeError: copy_if failed to synchronize: device-side assert triggered
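Note: the `copy_if failed to synchronize` error on `M[M > 0.001] = 1.0` is only where the asynchronous CUDA failure surfaces; the actual trigger is the `0 <= p4 && p4 <= 1` assertion from `torch.bernoulli`, which fires when any sampling probability leaves [0, 1]. A modified A can change the scale of the probabilities that the drop mask is derived from. The following is a minimal, standalone sketch (not the repository's dropSke.py; `num_nodes`, `keep_prob`, and the way `probs` is computed here are assumptions) showing how to check and clamp the probabilities before sampling:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-ins: 25 skeleton joints and a modified adjacency matrix
# whose entries are no longer normalized the way the original A was.
num_nodes = 25
keep_prob = 0.9
A = torch.rand(num_nodes, num_nodes, device=device) * 3.0  # assumption: rescaled A

# DropGraph-style per-node drop probabilities derived from A.  If the modified
# A changes their scale, some entries can end up outside [0, 1].
node_weight = A.sum(dim=1)
node_weight = node_weight / node_weight.sum() * num_nodes
probs = node_weight * (1.0 - keep_prob)

# torch.bernoulli fires the device-side assert as soon as any probability is
# negative or greater than 1, so validate (and clamp) before sampling.
if probs.min() < 0 or probs.max() > 1:
    print("out-of-range Bernoulli probabilities:",
          probs.min().item(), probs.max().item())
probs = probs.clamp(0.0, 1.0)

M_seed = torch.bernoulli(probs)   # per-node drop seed, shape: [num_nodes]
M = torch.matmul(M_seed, A)       # propagate the drop mask over the graph
M[M > 0.001] = 1.0                # same thresholding step as in the trace above
```

If the probabilities turn out to be in range, the other usual culprit for this assert is NaN or Inf entries introduced by the modified A, since non-finite values also fail the `0 <= p4 <= 1` check.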

erinchen824 commented 3 years ago

Hi there, I am running into another problem. The dimensions and data types of M_seed and A seem to be correct when this error occurs:

/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/Distributions.cu:290: lambda [](int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto::operator()(int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto: block: [0,0,0], thread: [156,0,0] Assertion `0 <= p4 && p4 <= 1` failed.
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/Distributions.cu:290: lambda [](int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto::operator()(int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto: block: [0,0,0], thread: [157,0,0] Assertion `0 <= p4 && p4 <= 1` failed.
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/Distributions.cu:290: lambda [](int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto::operator()(int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto: block: [0,0,0], thread: [158,0,0] Assertion `0 <= p4 && p4 <= 1` failed.
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/Distributions.cu:290: lambda [](int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto::operator()(int, float &, float &, float &, float &, const float &, const float &, const float &, const float &)->auto: block: [0,0,0], thread: [159,0,0] Assertion `0 <= p4 && p4 <= 1` failed.
torch.Size([32, 25]) torch.Size([25, 25])
 83%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 5/6 [00:07<00:01, 1.49s/it]
Traceback (most recent call last):
  File "main-tf.py", line 700, in <module>
    processor.start()
  File "main-tf.py", line 629, in start
    self.train(epoch, save_model=save_model)
  File "main-tf.py", line 440, in train
    output = self.model(data, keep_prob)
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/decouple_gcn_hp_regulation.py", line 295, in forward
    x = self.l7(x, keep_prob)
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/decouple_gcn_hp_regulation.py", line 234, in forward
    x = self.tcn1(self.gcn1(x), keep_prob, self.A) + self.dropT_skip(
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/decouple_gcn_hp_regulation.py", line 56, in forward
    x = self.dropT(self.dropS(x, keep_prob, A), keep_prob)
  File "/home/user/anaconda3/envs/dropgraph/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/CYJ/Skeleton-AR/DecoupleGCN-DropGraph-master/model/dropSke.py", line 34, in forward
    M = torch.matmul(M_seed, A)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Could you please help me with this one? Thanks!
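
Note: CUDA kernel launches are asynchronous, so the `CUBLAS_STATUS_EXECUTION_FAILED` raised at `torch.matmul(M_seed, A)` is most likely a downstream symptom of the same `0 <= p4 && p4 <= 1` Bernoulli assert printed just above it, rather than a problem with the matmul itself. A sketch of how one might localize it follows; the helper name and the specific checks are assumptions, not part of the repository:

```python
# Run with synchronous kernel launches so the Python stack trace points at the
# kernel that actually failed instead of the next CUDA call (here cublasSgemm):
#
#   CUDA_LAUNCH_BLOCKING=1 python main-tf.py ...

import torch

def check_dropgraph_inputs(probs: torch.Tensor,
                           M_seed: torch.Tensor,
                           A: torch.Tensor) -> None:
    """Hypothetical sanity checks for the tensors feeding the DropGraph step.

    In the log above M_seed is torch.Size([32, 25]) and A is torch.Size([25, 25]),
    so the matmul shapes are fine; the checks below target the Bernoulli input.
    """
    assert torch.isfinite(probs).all(), "non-finite Bernoulli probabilities"
    assert probs.min() >= 0 and probs.max() <= 1, "Bernoulli probabilities outside [0, 1]"
    assert torch.isfinite(A).all(), "non-finite entries in the adjacency matrix A"
    assert M_seed.shape[-1] == A.shape[0], "M_seed / A shape mismatch for matmul"
```

Calling a check like this right before the `torch.bernoulli` sampling (and before the matmul) in the training loop should turn the delayed CUBLAS error back into an immediate, interpretable assertion.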