jiaweihe1996 / GMTracker

Official PyTorch implementation of "Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking" (CVPR 2021).
GNU General Public License v3.0

RuntimeError: CUDA error: invalid configuration argument (with python trainGMMOT.py) #13

Closed 1359347500cwc closed 2 years ago

1359347500cwc commented 2 years ago

When I train the model on a GTX TITAN (12 GB) with CUDA 10.2, it reports this error: RuntimeError: CUDA error: invalid configuration argument

Here is the full traceback:

(GMTracker) cwc@imc-Z9PE-D8-WS:~/GMTracker$ python trainGMMOT.py MOT17-04 210
/home/cwc/anaconda3/envs/GMTracker/lib/python3.6/site-packages/torch_geometric/deprecation.py:13: UserWarning: 'data.DataLoader' is deprecated, use 'loader.DataLoader' instead
  warnings.warn(out)
Start training...
Epoch 0/1
lr = 1.00e-05

Traceback (most recent call last):
  File "trainGMMOT.py", line 183, in scheduler
  File "trainGMMOT.py", line 104, in train_model
    loss.backward()
  File "/home/cwc/anaconda3/envs/GMTracker/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/cwc/anaconda3/envs/GMTracker/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/cwc/anaconda3/envs/GMTracker/lib/python3.6/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/cwc/GMTracker/qpth/qpth/qp.py", line 144, in backward
    ctx.Q_LU, ctx.S_LU, ctx.R = pdipm_b.pre_factor_kkt(Q, G, A)
  File "/home/cwc/GMTracker/qpth/qpth/solvers/pdipm/batch.py", line 395, in pre_factor_kkt
    G_invQ_GT = torch.bmm(G, G.transpose(1, 2).lu_solve(*Q_LU))
RuntimeError: CUDA error: invalid configuration argument

jiaweihe1996 commented 2 years ago

What is your PyTorch version? In my experiments, PyTorch 1.4.0 reports a warning about a large matrix in MAGMA rather than a CUDA error, so the problem may be specific to later PyTorch versions. See https://github.com/locuslab/qpth/issues/37 and https://github.com/pytorch/pytorch/pull/61815. A practical solution is to use PyTorch 1.4.0, or a PyTorch build that includes commit 6d21e36f210b5e377941c98568099c819aaaea01.
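For anyone answering the version question above, the following is a minimal sketch of how the relevant details could be collected in the active environment. The helper name `describe_torch_env` is hypothetical and not part of GMTracker or qpth; it uses only `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.cuda.get_device_name()`, and it returns empty values if PyTorch is not installed.

```python
import importlib


def describe_torch_env():
    """Collect the PyTorch/CUDA details relevant to this issue.

    Hypothetical helper for illustration; not part of GMTracker or qpth.
    """
    info = {
        "torch": None,            # PyTorch version string, e.g. "1.4.0"
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a CUDA device is usable
        "device": None,           # name of GPU 0, if any
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

Running this inside the same conda environment used for training (here, `GMTracker`) helps confirm that the installed version is the one actually being imported.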

1359347500cwc commented 2 years ago

My PyTorch version is 1.4.0, but it still reports this error.

jiaweihe1996 commented 2 years ago

How about referring to https://github.com/pytorch/pytorch/pull/61815?