yihongXU / TransCenter

This is the official implementation of TransCenter (TPAMI). The code and pretrained models are now available here: https://gitlab.inria.fr/yixu/TransCenter_official.
https://team.inria.fr/robotlearn/transcenter-transformers-with-dense-queriesfor-multiple-object-tracking/

Does any of the provided images work with V100 GPUs? #8

Closed by Atom-101 2 years ago

Atom-101 commented 2 years ago

I am getting the following error in the deformable convolution package.

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/pdb.py", line 1699, in main
    pdb._runscript(mainpyfile)
  File "/opt/conda/lib/python3.7/pdb.py", line 1568, in _runscript
    self.run(statement)
  File "/opt/conda/lib/python3.7/bdb.py", line 578, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/net/coxfs01/srv/export/coxfs01/pfister_lab2/share_root/Lab/abanerjee/VideoProjects/TransCenter_official/training/transcenter/main_mot17_tracking.py", line 32, in <module>
    import os
  File "/net/coxfs01/srv/export/coxfs01/pfister_lab2/share_root/Lab/abanerjee/VideoProjects/TransCenter_official/training/transcenter/main_mot17_tracking.py", line 412, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm, adaptive_clip=args.adaptive_clip)
  File "/net/coxfs01/srv/export/coxfs01/pfister_lab2/share_root/Lab/abanerjee/VideoProjects/TransCenter_official/training/transcenter/engine.py", line 65, in train_one_epoch
    outputs = model(samples, pre_samples=pre_samples, pre_hm=pre_hm)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/net/coxfs01/srv/export/coxfs01/pfister_lab2/share_root/Lab/abanerjee/VideoProjects/TransCenter_official/training/transcenter/models/deformable_detr.py", line 287, in forward
    hs[layer_lvl] = self.ida_up[0](hs[layer_lvl], 0, len(hs[layer_lvl]))[-1]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/net/coxfs01/srv/export/coxfs01/pfister_lab2/share_root/Lab/abanerjee/VideoProjects/TransCenter_official/training/transcenter/models/dla.py", line 98, in forward
    layers[startp] = node(layers[startp])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/net/coxfs01/srv/export/coxfs01/pfister_lab2/share_root/Lab/abanerjee/VideoProjects/TransCenter_official/training/transcenter/models/dla.py", line 48, in forward
    x = self.conv(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/DCNv2/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/opt/DCNv2/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1591914880026/work/aten/src/THC/THCBlas.cu:335

I am using the pytorch1-5cuda10-1_RTX.sif image with a V100 GPU. Is there a way to make it work?

Thanks

yihongXU commented 2 years ago

Hi, thanks for your interest in our project.

It seems that the DCNv2 module in that image was compiled on RTX GPUs, so it is not compatible with the V100 GPU.

I suggest:

Try pytorch1-5cuda10-1.sif: https://drive.google.com/file/d/1MDNwMzJnculxEvEs3KN_rDE6XoYojx1H/view

Or create your own environment using Option 2 (compile DCNv2 on the V100).
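
For anyone hitting the same "no kernel image is available" message, here is a rough sketch of why a binary built for an RTX card can fail on a V100. It applies the general CUDA rule that compiled code for compute capability X.y runs only on devices X.z with z >= y, and it ignores PTX just-in-time fallback. The helper names (`parse_arch`, `binary_runs_on`, `arch_list_value`) are made up for illustration, and `sm_75` as the build target of the RTX image is an assumption, not something stated in this thread. The V100's compute capability is 7.0.

```python
"""Sketch: would a CUDA binary built for one GPU architecture run on another?"""


def parse_arch(arch):
    """Turn an architecture tag such as 'sm_75' into the tuple (7, 5)."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def binary_runs_on(device_cap, compiled_archs):
    """Apply the rule: code built for X.y runs on devices X.z with z >= y.

    PTX just-in-time fallback is deliberately ignored in this sketch.
    """
    dev_major, dev_minor = device_cap
    for arch in compiled_archs:
        major, minor = parse_arch(arch)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


def arch_list_value(device_cap):
    """Format a capability the way TORCH_CUDA_ARCH_LIST is written, e.g. '7.0'."""
    return "{}.{}".format(*device_cap)


V100 = (7, 0)              # Tesla V100 compute capability
ASSUMED_BUILD = ["sm_75"]  # assumption: the RTX image targets Turing only

print(binary_runs_on(V100, ASSUMED_BUILD))  # False
print(binary_runs_on((7, 5), ["sm_70"]))    # True
print(arch_list_value(V100))                # 7.0

# Optional: query the real device when PyTorch and a GPU are present.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), "->", arch_list_value(cap))
```

If you rebuild DCNv2 yourself, exporting `TORCH_CUDA_ARCH_LIST` with the value printed for your device before compiling may help; whether the build honors that variable depends on the PyTorch version and on the extension's setup script, so treat it as something to try rather than a guaranteed fix.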