researchmm / TracKit

[ECCV'20] Ocean: Object-aware Anchor-Free Tracking
MIT License
610 stars 98 forks

Ocean training problem #81

Open tm9161 opened 3 years ago

tm9161 commented 3 years ago

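Hello, I have a question. When I follow the tutorial and run python tracking/onekey.py (running train_ocean.py on its own gives the same error), I get the error below. What could be causing it?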

Traceback (most recent call last):
  File "./tracking/train_ocean.py", line 259, in <module>
    main()
  File "./tracking/train_ocean.py", line 250, in main
    model, writer_dict = ocean_train(train_loader, model, optimizer, epoch + 1, curLR, config, writer_dict, logger, device=device)
  File "/data/code/siam/TracKit/tracking/../lib/core/function.py", line 54, in ocean_train
    loss.backward()
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/home/tm/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 210, in wrapper
    outputs = fn(ctx, *args)
  File "/data/code/siam/TracKit/tracking/../lib/models/dcn/deform_conv.py", line 85, in backward
    deform_conv_cuda.deform_conv_backward_parameters_cuda(
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
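For readers unfamiliar with this RuntimeError: in PyTorch it is raised when .view() is called on a tensor whose memory layout is not contiguous. The sketch below reproduces the same message in plain Python on a toy tensor. It only illustrates the general cause. In this issue the call sits inside the compiled deform_conv_cuda extension, so whether the same pattern is responsible there is an assumption, and the names try_view, x, and xt are made up for the example.

```python
import torch

# A transposed tensor shares storage with the original but is not contiguous.
x = torch.arange(6).reshape(2, 3)
xt = x.t()  # shape (3, 2), strides (1, 3)


def try_view(t):
    """Return the flattened tensor, or the error message if .view() fails."""
    try:
        return t.view(-1)
    except RuntimeError as err:
        return str(err)


# .view() cannot flatten the non-contiguous tensor, so this is an error string.
result = try_view(xt)

# Two ways to get the flattened data anyway:
flat_reshape = xt.reshape(-1)            # reshape() copies when a view is impossible
flat_contig = xt.contiguous().view(-1)   # make an explicit contiguous copy, then view

print(result)
print(flat_reshape.tolist())
```

Both workarounds return the same values. The difference is that reshape() decides on its own whether a copy is needed, while contiguous().view() makes the copy explicit.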

JudasDie commented 3 years ago

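It looks like deform conv was not compiled successfully. Check whether your environment matches install.sh. Alternatively, remove align and train the model without it.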

tm9161 commented 3 years ago

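I recompiled deform_conv, but it still fails. I looked at the configuration in install.sh. Because I am using a 30-series GPU, my setup is CUDA 11.1 with torch 1.8.1, and mpi4py failed to install. Everything else was installed as in install.sh. I don't know whether any of that is related.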

I'll keep looking into it myself. Thanks for the reply.
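When comparing a local setup against install.sh, it can help to print the versions that matter for building CUDA extensions. The following is a minimal sketch, not part of TracKit. The helper name environment_report is hypothetical, and it uses only standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.*). It does not check mpi4py or the deform_conv build itself.

```python
import importlib.util
import platform


def environment_report():
    """Collect version info relevant to building CUDA extensions."""
    report = {"python": platform.python_version()}

    # Degrade gracefully if PyTorch is not installed in this interpreter.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None
        return report

    import torch

    report["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The printed values can then be compared by hand with the versions that install.sh sets up.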

Jee-King commented 3 years ago

@tm9161 Hello, have you solved this problem? I have run into the same issue.

Jee-King commented 3 years ago

@JudasDie Hello, setting the align parameter to False makes this problem go away. Does this parameter have a large impact on performance?

tm9161 commented 3 years ago

No. I also just set it to False.

l-sf commented 2 years ago

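@tm9161 Hello, I am using a 3080 Ti + cudatoolkit 11.1 + torch 1.8, and compilation already fails at the python setup.py develop step. I suspect the CUDA version is too new. Did you run into this problem? How did you solve it?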

JudasDie commented 2 years ago

Please refer to the new repo, JudasDie/SOTS. Thx.

-- From: Zhang Zhipeng Institution: National Laboratory of Pattern Recognition Address: 95 Zhongguancun East Road, 100190, BEIJING, CHINA Email: @.***

Best Wishes