lluisgomez / TextProposals

Implementation of the method proposed in the papers "TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild" and "Object Proposals for Text Extraction in the Wild" (Gomez & Karatzas, 2016 and 2015 respectively).
https://github.com/lluisgomez/TextProposals

Ways to speed up the end-to-end module. #11

Open deshanadesai opened 8 years ago

deshanadesai commented 8 years ago

Hello,

I am currently using the default parameters for diversification strategies:

#define PYRAMIDS 1 // Use spatial pyramids
#define CUE_D 1 // Use Diameter grouping cue
#define CUE_FGI 1 // Use ForeGround Intensity grouping cue
#define CUE_BGI 1 // Use BackGround Intensity grouping cue
#define CUE_G 1 // Use Gradient magnitude grouping cue
#define CUE_S 1 // Use Stroke width grouping cue
#define CHANNEL_I 0 // Use Intensity color channel
#define CHANNEL_R 1 // Use Red color channel
#define CHANNEL_G 1 // Use Green color channel
#define CHANNEL_B 1 // Use Blue color channel

For one image, the time taken by the full algorithm is 436 seconds (~7 minutes). I am using an NVIDIA Titan X GPU with 12 GB of memory. The breakdown for one image:

 Total Nodes         62160
Nodes evaluated     21112
Nodes inherited     30036
Nodes filtered      913
Nodes hashed        10099

Time loading model      15.9949 s.
Time full algorithm     436.4314 s.
     time mser          0.0568 s.
     time reg feat      0.1766 s.
     time clustering    1.8652 s.
     time cnn           434.1018 s.
     time sr            0.0028 s.
     time nms           0.0080 s.

Are there ways to decrease the amount of time taken for each image without having to change the diversification strategies?

Please let me know.

Thanks, Deshana

lluisgomez commented 7 years ago

Are you sure the GPU is being used? What I see in "time cnn" looks more like CPU time... On my GTX980, 21112 evaluated nodes take ~20 s. at most.

See for example:

Total Nodes         57510
Nodes evaluated     9786
Nodes inherited     43824
Nodes filtered      297
Nodes hashed        3603

Time loading model      71.5355 s.
Time full algorithm     17.3426 s.
     time mser          1.0871 s.
     time reg feat      1.6125 s.
     time clustering    1.8225 s.
     time cnn           11.5037 s.
     time sr            0.0030 s.
     time nms           0.0008 s.

sxs4337 commented 7 years ago

I am having a similar problem. I verified that the GPU is being used.

FINAL ([44 x 24 from (405, 325)] 0.9967 3 sosa

Total Nodes         36250
Nodes evaluated     10669
Nodes inherited     22669
Nodes filtered      77
Nodes hashed        2835

Time loading model      19.4706 s.
Time full algorithm     408.6304 s.
     time mser          0.1621 s.
     time reg feat      0.1399 s.
     time clustering    0.9715 s.
     time cnn           407.0412 s.
     time sr            0.0026 s.
     time nms           0.0002 s.

Fri Feb 24 14:18:19 2017
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:05:00.0     Off |                    0 |
| 26%   52C    P0   187W / 235W |  2511MiB / 11519MiB  |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7071     C  ./img2hierarchy_cnn                           2481MiB |
+-----------------------------------------------------------------------------+

lluisgomez commented 7 years ago

This is going to be difficult to debug, because I cannot reproduce the problem in my machine.

I would start by trying to isolate the problem. Actually, all the time under "time cnn" is spent in successive calls to the Classifier class methods. See lines 347 to 469 in main_cnn.cpp: first we build a batch of images and then send it to the classifier (line 414). Can you please try to modify the main function to make just a single call to the Classifier::Classify() method and measure the time it takes?

You can also try to modify the batch size on line 186.

sxs4337 commented 7 years ago

Changing the batch size to 1 did not help. I am not sure I have enough understanding of the code to make a single call to the Classify() function as recommended. I will give it a try later. Thanks for the help.

lluisgomez commented 7 years ago

Can you please try to change the main function in main_cnn.cpp as follows:

int main( int argc, char** argv ) {

    ::google::InitGoogleLogging(argv[0]);

    string model_file   = string("dictnet_vgg_deploy.prototxt");
    string trained_file = string("dictnet_vgg.caffemodel");
    string label_file   = string("lex.txt");
    int batch_size = 128;
    double t_cnn_load = (double)getTickCount();
    Classifier classifier(model_file, trained_file, label_file, batch_size);
    t_cnn_load = ((double)getTickCount() - t_cnn_load) / getTickFrequency();
    cout << " Time loading model " << t_cnn_load << " s." << endl;

    string in_imagename = string(argv[1]);

    Mat img, proposal;

    img = imread(in_imagename);
    cvtColor(img, proposal, COLOR_RGB2GRAY);

    resize(proposal, proposal, classifier.getInputSize());

    // image normalization as in Jaderberg etal.
    Scalar mean, std;
    proposal.convertTo(proposal, CV_32FC1);
    meanStdDev(proposal, mean, std);
    proposal = (proposal - mean[0]) / ((std[0] + 0.0001) / 128);

    vector<Mat> batch;
    for (int i=0; i<batch_size; i++)
        batch.push_back(proposal);
    double t_cnn_eval = (double)getTickCount();
    std::vector<Prediction> predictions = classifier.Classify(batch);
    t_cnn_eval = ((double)getTickCount() - t_cnn_eval) / getTickFrequency();
    cout << " CNN time (batch) " << t_cnn_eval << endl;

    cout << "Prediction " << predictions[0].first << " " << predictions[0].second << endl;

}

Then run the code on a single cropped word. On my machine it shows:

CNN time (batch) 0.968857

which means my GPU processes 128 images in less than a second.

sxs4337 commented 7 years ago

Thank you for the help. I made the suggested changes and the output is-

Time loading model 29.005 s.
CNN time (batch) 2.79471
Prediction comprehensibility 0.00826504

Note: I had to restore the template arguments (stripped by the issue formatting) based on the original main_cnn.cpp to get it compiled: vector<Mat> batch; and std::vector<Prediction> predictions = classifier.Classify(batch);

Thanks.