deshanadesai opened this issue 8 years ago
Are you sure the GPU is being used? What I see in "time cnn" looks more like CPU time... On my GTX 980, 21112 evaluated nodes take ~20 s at most.
See for example:

```
Total Nodes 57510  Nodes evaluated 9786  Nodes inherited 43824  Nodes filtered 297  Nodes hashed 3603

Time loading model 71.5355 s.
Time full algorithm 17.3426 s.
time mser 1.0871 s.
time reg feat 1.6125 s.
time clustering 1.8225 s.
time cnn 11.5037 s.
time sr 0.0030 s.
time nms 0.0008 s.
```
I am having a similar problem. I verified that the GPU is being used.
```
FINAL ([44 x 24 from (405, 325)] 0.9967 3 sosa

Total Nodes 36250  Nodes evaluated 10669  Nodes inherited 22669  Nodes filtered 77  Nodes hashed 2835

Time loading model 19.4706 s.
Time full algorithm 408.6304 s.
time mser 0.1621 s.
time reg feat 0.1399 s.
time clustering 0.9715 s.
time cnn 407.0412 s.
time sr 0.0026 s.
time nms 0.0002 s.
```
```
Fri Feb 24 14:18:19 2017
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:05:00.0     Off |                    0 |
| 26%   52C    P0   187W / 235W |  2511MiB / 11519MiB  |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7071    C   ./img2hierarchy_cnn                           2481MiB |
+-----------------------------------------------------------------------------+
```
This is going to be difficult to debug, because I cannot reproduce the problem on my machine.
I would start by trying to isolate the problem. All the time under "time cnn" is spent in successive calls to the Classifier class methods; see lines 347 to 469 in main_cnn.cpp: first we build a batch of images and then send it to the classifier (line 414). Can you please try to modify the main function so that it makes a single call to the Classifier::Classify() method, and measure the time it takes?
You can also try to modify the batch size on line 186.
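The measurement itself can use the same OpenCV tick counters the rest of the code uses. A minimal fragment of that isolation test, assuming a `classifier` object and a preprocessed word crop `proposal` set up as in main_cnn.cpp (this is a sketch, not the exact code in the repo):

```cpp
// Sketch: time a single call to the classifier.
// Assumes `classifier` (Classifier) and `proposal` (cv::Mat) are already
// prepared as in main_cnn.cpp.
double t = (double)getTickCount();
vector<Prediction> predictions = classifier.Classify(proposal);
t = ((double)getTickCount() - t) / getTickFrequency();
cout << "Single Classify() call: " << t << " s." << endl;
```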
Changing the batch size to 1 did not help. I am not sure I have enough understanding of the code to make a single call to the Classify() function as recommended; I will give it a try later. Thanks for the help.
Can you please try to change the main function in main_cnn.cpp as follows:
```cpp
int main( int argc, char** argv ) {

    ::google::InitGoogleLogging(argv[0]);

    string model_file   = string("dictnet_vgg_deploy.prototxt");
    string trained_file = string("dictnet_vgg.caffemodel");
    string label_file   = string("lex.txt");
    int batch_size      = 128;

    double t_cnn_load = (double)getTickCount();
    Classifier classifier(model_file, trained_file, label_file, batch_size);
    t_cnn_load = ((double)getTickCount() - t_cnn_load) / getTickFrequency();
    cout << " Time loading model " << t_cnn_load << " s." << endl;

    string in_imagename = string(argv[1]);

    Mat img, proposal;
    img = imread(in_imagename);
    cvtColor(img, proposal, COLOR_RGB2GRAY);
    resize(proposal, proposal, classifier.getInputSize());

    // image normalization as in Jaderberg et al.
    Scalar mean, std;
    proposal.convertTo(proposal, CV_32FC1);
    meanStdDev(proposal, mean, std);
    proposal = (proposal - mean[0]) / ((std[0] + 0.0001) / 128);

    double t_cnn = (double)getTickCount();
    vector<Prediction> predictions = classifier.Classify(proposal);
    t_cnn = ((double)getTickCount() - t_cnn) / getTickFrequency();
    cout << "CNN time (batch) " << t_cnn << endl;

    cout << "Prediction " << predictions[0].first << " " << predictions[0].second << endl;
}
```
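(Assuming the modified main_cnn.cpp still builds into the img2hierarchy_cnn binary shown in the nvidia-smi listing above, the invocation would look like `./img2hierarchy_cnn word_crop.jpg`, where the image name is just a placeholder for any cropped word image.)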
Then run the code for a single cropped word. On my machine it shows:

```
CNN time (batch) 0.968857
```
which means my GPU processes 128 images in less than a second.
Thank you for the help. I made the suggested changes and the output is:

```
Time loading model 29.005 s.
CNN time (batch) 2.79471
Prediction comprehensibility 0.00826504
```
Note: I had to make the following changes, based on the original main_cnn.cpp, to get it to compile:
vector
Thanks.
Hello,
I am currently using the default parameters for the diversification strategies.
For one image, the full algorithm takes 436 seconds (~7 minutes). I am using an NVIDIA Titan X GPU with 12 GB of memory. The breakdown of the time taken for one image:
Are there ways to decrease the amount of time taken for each image without having to change the diversification strategies?
Please let me know.
Thanks, Deshana