Open neobarney opened 7 years ago
With a Titan X Pascal, running crfasrnn_demo.py takes 0.4 seconds per image (500x500) in GPU mode.
@bittnt Thanks. Are you using crfasrnn_demo.py or a custom script? Are you running on Docker?
Did you mean 500x500?
500x500 is the image resolution. I was using crfasrnn_demo.py script.
@bittnt Compilation works without cuDNN but fails with it. My previous install was without cuDNN, which explains the slow speed. So I'm trying to compile the modified version of Caffe with CUDA 8 and cuDNN 6. Which versions did you use in your setup?
@bittnt I was finally able to compile it and run the test successfully, but only with CUDA 7 and cuDNN 3 (all newer versions fail). The speed is still really low compared to yours (4.2 s for the default image). How did you achieve such speed? Would you mind sharing your CUDA and cuDNN versions? Thanks :)
I have tested this under CUDA8, CUDA7.5, CUDA6.5, and CUDA6.
@bittnt Great to know :) Do you compile with cuDNN or not? Yesterday I managed to compile with cuDNN 3 and CUDA 7; on a K80 the speed was the same (4.2 s) as without cuDNN. I can't get below 4 s on either a K80 or a P100. Any idea where the problem might come from?
I am not sure what the problem is. The speed you reported sounds like the whole FCN-8s+crfasrnn is running on the CPU rather than the GPU. Also, check which version of the code you are using.
I tested the code on a K80 before, on both AWS and Google Cloud. It should take less than 1 second, at least on an image with resolution 500x500.
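For anyone comparing numbers in this thread, here is a minimal timing sketch in Python. It is not part of crfasrnn_demo.py. `time_call` and `fake_forward` are hypothetical names made up for illustration, and the stand-in workload only exists so the snippet runs without Caffe installed. The idea is to exclude the first (warm-up) call, which usually includes one-time setup, and average the rest.

```python
import time


def time_call(fn, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (one-time setup such as allocation)
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def fake_forward():
    # Stand-in workload (assumption): replace with your real forward pass.
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    print("mean seconds per call: %.4f" % time_call(fake_forward))
```

With pycaffe you would call `caffe.set_mode_gpu()` and `caffe.set_device(0)` before loading the net, then pass something like `lambda: net.forward()` (where `net` is your own loaded network) instead of `fake_forward`. If the averaged time stays around 4 s after warm-up, that points at the forward pass itself rather than at model loading.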
@bittnt You wrote - "I have tested this under CUDA8, CUDA7.5, CUDA6.5, and CUDA6"
Can you please tell me which version of cuDNN you used with CUDA8? I am asking because only CUDA 7 + cuDNN 3 worked for me. All other combinations gave one error or another.
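When comparing setups it helps to report the exact cuDNN version that is installed. Here is a small Python sketch, assuming the cuDNN header defines the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros (newer cuDNN releases move them to `cudnn_version.h`). `parse_cudnn_version` is a hypothetical helper, and the sample header text is made up for the demo.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from cuDNN header text, or None if absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE
        )
        if match is None:
            return None  # macro not found, so do not guess a version
        parts.append(int(match.group(1)))
    return tuple(parts)


# Illustrative header fragment (assumption), not copied from a real install.
sample = """
#define CUDNN_MAJOR 6
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 21
"""

if __name__ == "__main__":
    print(parse_cudnn_version(sample))
```

To use it on a real machine, read the header from wherever cuDNN is installed on your system (often under the CUDA include directory, but that path is an assumption) and pass its text to the function.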
@akashdexati I think the error should be resolved if you use the crfasrnn branch (https://github.com/torrvision/caffe/tree/crfrnn) of the code rather than master. You do need to change the prototxt a bit to use the new branch. The new branch merges the CRFasRNN layer with the latest Caffe, which supports CUDA8 and the latest cuDNN.
@bittnt I do not see anything like a crfasrnn branch. Can you please help me with that branch?
@bittnt I worked on the crfrnn branch, but still nothing worked with CUDA8 + cuDNN (5/5.1/6/7). If it worked for you, can you please share the cudnn.hpp file that worked for CUDA8 + cuDNN(x)?
Hello, while running crfasrnn_demo.py, I'm not able to go below 4 s on a K80 or P100. I'm sure it's running on the GPU (I checked memory allocations).
That seems pretty slow to me. Does anyone have any idea how I could speed up the classification? Thanks