DCN on Jetson TX2

MauroPfister commented 4 years ago

Hi

I am trying to use the deformable convolutions (DCN) from this repo on a Jetson TX2. Compilation was successful and I can call them from Python. However, every call to the DCN produces the following error:

error in deformable_im2col: too many resources requested for launch

Are there any settings in the .cu files that I can change to fix this error?

Minimal reproducible example

# Execute from parent directory of ops folder

import torch
from ops.dcn import DeformConvPack

device = torch.device('cuda')
dcn = DeformConvPack(in_channels=256,
                     out_channels=256,
                     kernel_size=(3, 3),
                     padding=1).to(device)
# Random input of shape (N, C, H, W); torch.Tensor(...) would leave the memory uninitialized
x = torch.randn(16, 256, 26, 20, device=device)
output = dcn(x)

Environment

Jetson TX2 with JetPack 4.3
Python 3.6.9
PyTorch 1.4
torchvision 0.5

Since I only wanted to install DCNs instead of the whole repo, I used a reduced setup.py (copied from this repo):

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='mmdet',
    ext_modules=[
        CUDAExtension('deform_conv_cuda', [
            'src/deform_conv_cuda.cpp',
            'src/deform_conv_cuda_kernel.cu'
        ]),
        CUDAExtension('deform_pool_cuda', [
            'src/deform_pool_cuda.cpp',
            'src/deform_pool_cuda_kernel.cu'
        ])
    ],
    cmdclass={
        'build_ext': BuildExtension
    })

Bug fix

After a quick search on Google I found this PyTorch issue, which seems related. Unfortunately I have no experience with CUDA at all, so I am not sure if this helps.

MauroPfister commented 4 years ago

I was able to solve the issue by replacing CUDA_NUM_THREADS = 1024 with CUDA_NUM_THREADS = 512 and recompiling: https://github.com/open-mmlab/mmdetection/blob/2b6f6616f804beaca3dbf071fa398c586243db13/mmdet/ops/dcn/src/cuda/deform_conv_cuda_kernel.cu#L76

PyTorch's regular convolutions do not seem to have this problem. Maybe the CUDA_NUM_THREADS constant could be set according to the architecture the DCNs are built for?
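
One possible explanation for why 512 works where 1024 does not (an assumption, not something confirmed in this thread): "too many resources requested for launch" is typically reported when a thread block needs more registers than the device allows per block, and NVIDIA's compute capability tables list a smaller per-block register limit (32K) for compute capability 6.2, the TX2's GPU, than the 64K of most desktop GPUs. The sketch below estimates the largest block size that fits a given register budget. The function name and the figure of 40 registers per thread are hypothetical; the real per-thread count for the kernel can be read from the output of nvcc -Xptxas -v. The estimate ignores the hardware's register allocation granularity, so it should be treated as an upper bound.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def max_threads_for_registers(regs_per_thread, regs_per_block, hw_max_threads=1024):
    """Estimate the largest block size whose registers fit in one block.

    regs_per_thread: registers the kernel uses per thread (hypothetical input here).
    regs_per_block:  per-block register limit of the target device.
    hw_max_threads:  hard upper limit on threads per block.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    fit = min(regs_per_block // regs_per_thread, hw_max_threads)
    # Round down to a whole number of warps.
    return (fit // WARP_SIZE) * WARP_SIZE


# Hypothetical kernel that uses 40 registers per thread.
tx2_limit = max_threads_for_registers(40, 32 * 1024)      # 32K registers per block
desktop_limit = max_threads_for_registers(40, 64 * 1024)  # 64K registers per block
print(tx2_limit, desktop_limit)  # prints "800 1024"
```

Under these assumed numbers a 1024-thread block would not launch on the TX2 while a 512-thread block would, which matches the behaviour described above.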

hellock commented 4 years ago

Thanks for reporting! It is a known issue that setting CUDA_NUM_THREADS to 1024 causes this failure on some older or lightweight devices. We have not found a good way to set it according to the GPU architecture. PRs are welcome if you have any ideas.
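
One way to pick the value per architecture, sketched under assumptions: if the kernel source were changed to take CUDA_NUM_THREADS from a preprocessor macro (it is currently a hard-coded constant), setup.py could choose the value from the compute capability of the build machine. The helper name, the capability table and the macro-based override below are all hypothetical; only the value 512 on the TX2 (compute capability 6.2) was confirmed in this thread.

```python
# Compute capabilities assumed to need the smaller block size.
# (6, 2) is the Jetson TX2 from this thread; (5, 3) is an untested guess.
REDUCED_BLOCK_ARCHS = {(5, 3), (6, 2)}


def nvcc_flags_for(capability, default_threads=1024, reduced_threads=512):
    """Return nvcc flags that define CUDA_NUM_THREADS for a (major, minor) capability."""
    if tuple(capability) in REDUCED_BLOCK_ARCHS:
        threads = reduced_threads
    else:
        threads = default_threads
    return ["-DCUDA_NUM_THREADS={}".format(threads)]


print(nvcc_flags_for((6, 2)))  # prints "['-DCUDA_NUM_THREADS=512']"
print(nvcc_flags_for((7, 5)))  # prints "['-DCUDA_NUM_THREADS=1024']"
```

In setup.py the capability could come from torch.cuda.get_device_capability(), and the flags could be passed to CUDAExtension as extra_compile_args={'cxx': [], 'nvcc': flags}. This only helps when the extension is built on the target device; a cross-compiled build would need the capability supplied some other way.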

MauroPfister commented 4 years ago

I don't have any experience with PyTorch CUDA extensions, so I can't help with a PR unfortunately. But maybe just mention it in a README somewhere? That way people could easily fix the issue themselves.

jshilong commented 2 years ago

Thanks for reporting! I will add it to the FAQ to help people locate the problem faster.

open-mmlab / mmdetection

DCN on Jetson TX2 #3041