xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:
MIT License
7.2k stars, 1.92k forks

Jetson TX2 #387

Open oggyfaker opened 4 years ago

oggyfaker commented 4 years ago

Hi Xingyizhou! I am having trouble running your real-time detection on my Jetson TX2. My TX2 can only install CUDA 9.0, and its cuDNN version is lower than the one your project requires. Could you recommend another way to use your code without CUDA 10.0? Thank you so much!

rabbitegg commented 4 years ago

Hi oggyfaker! I also run CenterNet on a TX2 and ran into trouble as well: "ERROR: too much resources requested for launch". Have you met this problem? @oggyfaker thank you ~

lucky2046 commented 4 years ago

Any progress?

oggyfaker commented 4 years ago

Absolutely not. I am having trouble with the PyTorch version for CenterNet on the TX2. CenterNet works with CUDA 10, but the TX2 only supports up to CUDA 9. I think we should either port CenterNet to C++ or another framework that does not need CUDA 10, or find a way to run it on CUDA 9. I hope someone will solve this problem.

faheuer commented 4 years ago

Hi, I had the same issue on a Jetson Nano; it is caused by the Aarch64 chip architecture. You need to modify the code in the CUDA file (it should be src/lib/models/networks/DCNv2/src/cuda/dcn_v2_im2col_cuda.cu) and set:

const int CUDA_NUM_THREADS = 512;

Then compile DCNv2 again. For some reason Aarch64 cannot handle 1024.
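To make the effect of this change concrete, here is a small Python sketch of the launch arithmetic. It assumes the kernels are launched in the common Caffe-style pattern, where the number of blocks is the element count divided by CUDA_NUM_THREADS, rounded up. The helper name get_blocks and the workload size are illustrative assumptions, not copied from the DCNv2 source.

```python
def get_blocks(n_elements: int, threads_per_block: int) -> int:
    """Ceiling division: blocks needed so that every element gets one thread."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return (n_elements + threads_per_block - 1) // threads_per_block


# Hypothetical workload size, chosen only for illustration.
n_elements = 128 * 128 * 64

for threads in (1024, 512):
    blocks = get_blocks(n_elements, threads)
    print(f"{threads} threads/block -> {blocks} blocks")
```

Halving the block size doubles the number of blocks but halves what each block asks of the GPU (registers, for example). A "too many resources requested for launch" error typically means a single block is asking for more than the device can provide.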

abedMeto commented 4 years ago

Hi faheuer,

Did you manage to run CenterNet on the Jetson Nano? What PyTorch version do you use?

faheuer commented 4 years ago

Hi abedMeto,

It works on the Jetson Nano. I believe it ran under PyTorch 0.4.1 or possibly 1.3. It was too long ago to be sure.

abedMeto commented 4 years ago

The following link is NVIDIA's pre-built wheel installation for PyTorch 1.x: https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-5-0-now-available/72048

Did you install it a different way? Can you share, please?

faheuer commented 4 years ago

I believe I built it myself and it was PyTorch 0.4.1. I would suggest trying the prebuilt 1.3 first and, if that doesn't work, building it yourself. It is certainly an older version, since CenterNet is already fairly old; maybe you can find the precise version in the how-to here. Unfortunately, I don't have my Jetson with me right now to check.

jhnan-cs commented 4 years ago

Hi @faheuer, I want to run CenterNet on a Jetson TX2 with PyTorch 0.4.1, and I also get a runtime error. The error message is: RuntimeError:cuda runtime error (7) : too many resources requested for launch at /home/nvidia/pytorch/aten/src/THC/THCTensorSort.cu:61. I applied your fix: I modified the code in the CUDA file and compiled DCNv2 again. In addition, I modified the PyTorch source code and recompiled PyTorch to limit the number of CUDA threads:

THCTensorSort.cu: int64_t maxThreads = 512; // was: THCState_getCurrentDeviceProperties(state)->maxThreadsPerBlock;

But the above problem remains. Can you give me some advice? Thank you.

faheuer commented 4 years ago

Hi,

I checked my Jetson Nano and got some package versions for you. I used JetPack 4.3 (CUDA 10.0) and PyTorch 1.3. I am pretty sure that the "too many resources" error is linked to the THCState parameter, though, so make sure to completely recompile DCN after changing the value. Maybe an even lower thread count is necessary, but since the Jetson TX2 has more cores than the Nano, that seems unlikely. On Windows, DCNv1 (see the respective thread) worked instead of DCNv2; you can also try looking into that.

abedMeto commented 4 years ago

Hi,

Thanks, faheuer, for the valuable information. I was trying hard to get it running with JetPack 4.4 and had a feeling that something was wrong with it. I am now building a local version of PyTorch 1.3.1. If I succeed, I will share it; if not, I will try JetPack 4.3.

jhnan-cs, when you recompile DCNv2 you must first delete all the files generated by the previous compilation, e.g. the 'build' directory and the .egg and .so files.
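The clean-up described above could be scripted roughly as follows. This is a minimal sketch: the function name clean_build_artifacts is hypothetical, and the set of patterns (build/, *.egg, *.egg-info, *.so) is an assumption about what a typical setuptools build of the extension leaves behind, so check what it would delete before running it on your checkout.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(ext_dir: Path) -> list:
    """Delete build/, *.egg / *.egg-info entries, and compiled *.so files under ext_dir."""
    # Collect every candidate first, so deleting one does not disturb the scan.
    targets = [ext_dir / "build"]
    targets += ext_dir.glob("*.egg-info")
    targets += ext_dir.glob("*.egg")
    targets += ext_dir.rglob("*.so")

    removed = []
    for path in targets:
        if path.is_dir():
            shutil.rmtree(path)
        elif path.exists():
            path.unlink()
        else:
            continue  # missing, or already removed along with a parent directory
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Directory taken from the path mentioned earlier in this thread; adjust to your checkout.
    dcn_dir = Path("src/lib/models/networks/DCNv2")
    if dcn_dir.is_dir():
        for p in clean_build_artifacts(dcn_dir):
            print("removed", p)
```

After cleaning, rebuild DCNv2 the same way you built it the first time, so that the new CUDA_NUM_THREADS value is actually compiled in.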

abedMeto commented 4 years ago

Thank you, faheuer. JP 4.3 + PyTorch 1.3 + maxThreads = 512 worked for me; the frame delay was 0.8 seconds.

abedMeto commented 4 years ago

Hi,

I also managed to run CenterNet using JP 4.4 (CUDA 10.2) + PyTorch 1.3 + maxThreads = 512, but the frame delay was 1.6+ seconds. It seems that CUDA 10.0 performs better than CUDA 10.2 on the Jetson Nano.
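For anyone who wants to compare setups in the same way, here is a rough sketch of how a per-frame delay can be measured. The names mean_frame_delay, delay_to_fps, and process_frame are hypothetical; process_frame stands in for whatever runs detection on one frame, and nothing here is taken from the CenterNet code.

```python
import time


def mean_frame_delay(process_frame, frames, warmup=1):
    """Return the mean wall-clock seconds per frame, skipping warm-up frames."""
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up runs are not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        # For GPU inference, process_frame should wait for the GPU to finish
        # (e.g. torch.cuda.synchronize()) before returning, because CUDA
        # kernels run asynchronously.
        process_frame(frame)
    return (time.perf_counter() - start) / len(timed)


def delay_to_fps(delay_seconds):
    """Convert a per-frame delay in seconds to frames per second."""
    return 1.0 / delay_seconds


# The two delays reported in this thread, expressed as frame rates.
for label, delay in (("JP 4.3 / CUDA 10.0", 0.8), ("JP 4.4 / CUDA 10.2", 1.6)):
    print(f"{label}: {delay} s/frame = {delay_to_fps(delay):.3f} fps")
```

Skipping the first frame matters on Jetson boards, since the first inference usually includes one-off initialization costs that would otherwise skew the average.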