torrvision / crfasrnn

This repository contains the source code for the semantic image segmentation method described in the ICCV 2015 paper: Conditional Random Fields as Recurrent Neural Networks. http://crfasrnn.torr.vision/

classification speed #134

Open neobarney opened 7 years ago

neobarney commented 7 years ago

Hello, while running crfasrnn_demo.py, I can't get below 4 s per image on a K80 or P100. I'm sure it's running on the GPU (I checked the memory allocations).

That seems pretty slow to me. Does anyone have an idea how I could speed up the classification? Thanks

bittnt commented 7 years ago

With a Titan X Pascal, running crfasrnn_demo.py takes 0.4 seconds per image (500x500) in GPU mode.

neobarney commented 7 years ago

@bittnt Thanks. Are you using crfasrnn_demo.py or a custom script? Are you running in Docker?

neobarney commented 7 years ago

You meant 500x500?

bittnt commented 7 years ago

500x500 is the image resolution. I was using crfasrnn_demo.py script.

neobarney commented 7 years ago

@bittnt Compilation works without cuDNN but fails with it. My previous install was built without cuDNN, which explains the slow speed. So I'm trying to compile the modified version of Caffe with CUDA 8 and cuDNN 6. Which versions did you use in your setup?
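For anyone checking which build options were actually active: a minimal sketch, assuming the build was configured through Caffe's Makefile.config, where `CPU_ONLY := 1` and `USE_CUDNN := 1` are enabled by uncommenting them. The function name `enabled_flags` and the sample text are hypothetical, for illustration only.

```python
import re


def enabled_flags(makefile_text, names=("CPU_ONLY", "USE_CUDNN")):
    """Return {flag: bool}, True when the flag is set to 1 on an uncommented line."""
    result = {name: False for name in names}
    for line in makefile_text.splitlines():
        stripped = line.strip()
        # Lines starting with '#' are commented out, so the flag is not active.
        if stripped.startswith("#"):
            continue
        match = re.match(r"^(\w+)\s*:?=\s*1\b", stripped)
        if match and match.group(1) in result:
            result[match.group(1)] = True
    return result


# Hypothetical excerpt of a Makefile.config, not taken from this thread.
sample = """
# CPU_ONLY := 1
USE_CUDNN := 1
"""
print(enabled_flags(sample))
```

In practice the text would come from reading the Makefile.config used for the build. A CMake build keeps these options elsewhere, so this sketch would not apply there.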

neobarney commented 7 years ago

@bittnt I was finally able to compile it and run the test successfully, but only with CUDA 7 and cuDNN 3 (all newer versions fail). The speed is still very low compared to yours: 4.2 s for the default image. How did you achieve such speed? Would you mind sharing your CUDA and cuDNN versions? Thanks :)

bittnt commented 7 years ago

I have tested this under CUDA8, CUDA7.5, CUDA6.5, and CUDA6.

neobarney commented 7 years ago

@bittnt Great to know :) Do you compile with cuDNN or not? Yesterday I managed to compile with cuDNN 3 and CUDA 7; on the K80 the speed was the same (4.2 s) as without cuDNN. I can't get below 4 s on either the K80 or the P100. Any idea where the problem might come from?

bittnt commented 7 years ago

I am not sure what the problem is. The speed you reported sounds like the whole FCN-8s+crfasrnn is running on the CPU rather than the GPU. Also, check which version of the code you are using.

I have tested the code on a K80 before, on both AWS and Google Cloud. It should take less than 1 second, at least on an image with resolution 500x500.
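To compare numbers like these fairly, it may help to time only the forward pass, after a warm-up run, so that model loading and one-time GPU initialization are not counted. A minimal sketch, assuming Python 3; `time_forward` and `fake_forward` are hypothetical names, and the stand-in would be replaced by whatever call runs the network (for example a function wrapping `net.forward()` in pycaffe).

```python
import time


def time_forward(forward, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of `forward`, after warm-up."""
    # Warm-up calls are not timed: the first pass often pays one-time setup costs.
    for _ in range(warmup):
        forward()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        forward()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def fake_forward():
    # Stand-in for the real network call (assumption: swap in your own function).
    time.sleep(0.01)


print("mean seconds per call: %.3f" % time_forward(fake_forward))
```

If the mean stays near 4 s even after the warm-up run, the slowdown is in the forward pass itself rather than in startup.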

akashdexati commented 7 years ago

@bittnt You wrote - "I have tested this under CUDA8, CUDA7.5, CUDA6.5, and CUDA6"

Can you please tell me which version of cuDNN you used with CUDA8? I am asking because only CUDA 7 + cuDNN 3 worked for me. All the other combinations gave one error or another.
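When reporting version combinations, the installed cuDNN version can be read from its header. A minimal sketch, assuming the header defines the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros; the function name `cudnn_version` and the sample header text are hypothetical, and the header's location depends on the install.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            # The macro is missing, so the version cannot be determined from this text.
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical header excerpt for illustration, not taken from this thread.
sample = """
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
"""
print(cudnn_version(sample))
```

The real input would be the contents of the cudnn.h file that the Caffe build actually includes, which is worth confirming when several cuDNN versions are installed side by side.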

bittnt commented 7 years ago

@akashdexati I think the error should be resolved if you use the crfasrnn branch (https://github.com/torrvision/caffe/tree/crfrnn) of the code rather than master. You do need to change the prototxt a bit to use the new branch. The new branch merges the CRFasRNN layer with the latest Caffe, which supports CUDA8 and the latest cuDNN.

akashdexati commented 7 years ago

@bittnt I do not see anything like a crfasrnn branch. Can you please help me find that branch?

bittnt commented 7 years ago

https://github.com/torrvision/caffe/tree/crfrnn

akashdexati commented 7 years ago

@bittnt I worked on the crfrnn branch, but still none of the CUDA8 + cuDNN (5/5.1/6/7) combinations worked. If it worked for you, can you please share the cudnn.hpp file that worked for CUDA8 + cuDNN(x)?