intel / caffe

This fork of BVLC/Caffe is dedicated to improving performance of this deep learning framework when running on CPU, in particular Intel® Xeon processors.

Result not correct when applying it to SSD (Single Shot MultiBox Detector) #10

Closed: heagoo closed this issue 7 years ago

heagoo commented 7 years ago

I merged the code of SSD (https://github.com/weiliu89/caffe/tree/ssd) into Intel Caffe, but when I run ssd_detect with SSD300, the result is incorrect if MKL2017_AS_DEFAULT_ENGINE is enabled. After the merge, I have confirmed the following two things: 1) with only MKL, ssd_detect shows the correct result; 2) with MKL2017_AS_DEFAULT_ENGINE enabled, cpp_classification works correctly with VGG16 (the classification result is correct). My guess is that SSD300's different input shape (3x300x300 instead of 3x224x224) makes the convolution results incorrect. Could you please help look into it? Thanks!
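A quick way to see that the two inputs exercise different shapes is to trace the spatial size through VGG16's first four pooling layers. This is a minimal sketch, assuming 3x3 convolutions with pad 1 and stride 1 (which preserve the size) and 2x2 max pooling with stride 2. `pool_out` mirrors the round-up rule used by BVLC Caffe's PoolingLayer. It only illustrates the shape difference; it does not locate the bug.

```python
import math


def pool_out(size, kernel=2, stride=2, pad=0):
    # Caffe's PoolingLayer rounds up: ceil((size + 2*pad - kernel) / stride) + 1
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


def vgg_spatial_sizes(input_size, num_pools=4):
    # Padded 3x3 convolutions keep the spatial size unchanged,
    # so only the 2x2 / stride-2 pooling layers shrink it.
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(pool_out(sizes[-1]))
    return sizes


for s in (224, 300):
    print(s, vgg_spatial_sizes(s))
```

For 224 this gives [224, 112, 56, 28, 14], and for 300 it gives [300, 150, 75, 38, 19]. The 75 -> 38 step is the only one where rounding up differs from rounding down (floor would give 37), so a 300x300 input exercises the round-up case, which a 224x224 input never does.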

nickltj86 commented 7 years ago

Hi @heagoo, would you be willing to share your work on the merge with only MKL? It would be interesting to see the results and performance with Intel Caffe. Do you have rough performance numbers per frame?

heagoo commented 7 years ago

@nickltj86 The performance is not so good (262 ms per frame with 32 cores). I think it could be much better after enabling MKL2017_AS_DEFAULT_ENGINE, but unfortunately, the result is not correct yet.

nickltj86 commented 7 years ago

Thanks, good to know the performance. I think even after enabling MKL2017, the performance may not increase much. From the benchmark, the new layers that are not optimized with MKL or MKL2017, such as the mbox layers, cause most of the delay. The common CNN layers are highly optimized and seem to be about as fast as those running on CUDA; the other, non-optimized layers are a different story.

Do you have a fork with the merge? I think the Intel folks will need that to evaluate it as well.
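The point that unoptimized layers dominate can be made concrete with Amdahl's law: if only a fraction p of the frame time is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch follows; the 70% fraction is a made-up illustration, not a measurement of SSD.

```python
def amdahl_speedup(optimized_fraction, layer_speedup):
    """Overall speedup when only `optimized_fraction` of run time is sped up by `layer_speedup`."""
    if not 0.0 <= optimized_fraction <= 1.0:
        raise ValueError("optimized_fraction must be in [0, 1]")
    if layer_speedup <= 0:
        raise ValueError("layer_speedup must be positive")
    return 1.0 / ((1.0 - optimized_fraction) + optimized_fraction / layer_speedup)


# Hypothetical: 70% of frame time spent in MKL-accelerated CNN layers.
for s in (2, 4, 10, 1000):
    print(f"layer speedup {s:>4}x -> overall {amdahl_speedup(0.7, s):.2f}x")
```

With p = 0.7, even an unbounded layer speedup caps the overall gain at 1 / 0.3, about 3.3x, so the remaining non-MKL layers (such as the mbox layers) would end up setting the limit.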

heagoo commented 7 years ago

Shared at: https://github.com/heagoo/intel_caffe_ssd/

jdukat commented 7 years ago

Could you please try with the MKL2017 engine enabled, but explicitly select the CAFFE engine for the layers that cause problems? For reference, this file shows how to select a different engine for a given layer: https://github.com/intel/caffe/blob/master/models/mkl2017_googlenet_v1_knl/train_val.prototxt

We are working on fixes to simplify this process and avoid manual engine selection.
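For illustration, here is a sketch that prints a Convolution layer with an explicit per-layer engine. It assumes the `engine` field inside `convolution_param`, as defined by BVLC Caffe's ConvolutionParameter (an enum, so `CAFFE` is written unquoted). The helper `conv_layer_prototxt` and the layer names are hypothetical, and the exact syntax Intel Caffe expects for MKL2017 layers should be checked against the train_val.prototxt linked above.

```python
def conv_layer_prototxt(name, bottom, top, num_output, kernel_size=3, pad=1, engine=None):
    """Return a prototxt Convolution layer; `engine` (e.g. "CAFFE") is optional."""
    lines = [
        "layer {",
        f'  name: "{name}"',
        '  type: "Convolution"',
        f'  bottom: "{bottom}"',
        f'  top: "{top}"',
        "  convolution_param {",
        f"    num_output: {num_output}",
        f"    pad: {pad}",
        f"    kernel_size: {kernel_size}",
    ]
    if engine is not None:
        # Engine is an enum value, so it is emitted without quotes.
        lines.append(f"    engine: {engine}")
    lines += ["  }", "}"]
    return "\n".join(lines)


# Hypothetical: force the reference CAFFE engine for one problematic layer only.
print(conv_layer_prototxt("conv4_3", "conv4_2", "conv4_3", 512, engine="CAFFE"))
```

Leaving `engine` unset keeps the layer on whatever default engine the build selects, so only the layers suspected of producing wrong results need to be overridden.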

heagoo commented 7 years ago

@jdukat Yes, I've already done this, but the result is very different from using only MKL and completely wrong. I tried: 1) adding the MKL2017 engine only to the first conv layer, which works (though it still seems to call the im2col implementation); 2) adding MKL2017 to any other conv layer, which does NOT work. It may be related to the DNN functions. Is there any detailed description of the MKL DNN functions? The strange thing is that VGG16 works; do you think the input size matters? (VGG16 is 224x224, and SSD is 300x300 or 500x500.)
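The layer-by-layer trial above can be automated by dumping every layer's output from both builds and finding the first one that diverges. In pycaffe, such dumps could come from `net.blobs[name].data` after `net.forward()`. In this self-contained sketch, plain Python lists stand in for those arrays, the values are made up, and `first_divergent_layer` and its tolerance are hypothetical choices.

```python
def first_divergent_layer(reference, candidate, tol=1e-3):
    """Return (name, max_abs_diff) for the first blob differing by more than tol, else None.

    Both arguments map blob names to flat lists of floats in network order
    (dicts preserve insertion order).
    """
    for name, ref_values in reference.items():
        cand_values = candidate[name]  # raises KeyError if the candidate lacks this blob
        if len(ref_values) != len(cand_values):
            return name, float("inf")  # a shape mismatch counts as divergence
        diff = max((abs(a - b) for a, b in zip(ref_values, cand_values)), default=0.0)
        if diff > tol:
            return name, diff
    return None


# Made-up values: "conv1_1" matches, "conv1_2" is the first to diverge.
mkl_only = {"conv1_1": [0.1, 0.2], "conv1_2": [0.5, 0.5], "conv2_1": [1.0, 2.0]}
mkl2017 = {"conv1_1": [0.1, 0.2], "conv1_2": [0.5, 0.9], "conv2_1": [3.0, 2.0]}
print(first_divergent_layer(mkl_only, mkl2017))
```

Stopping at the first divergent blob matters because every later layer inherits the error, so later mismatches say little about where the fault is.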

jdukat commented 7 years ago

Input size certainly matters for performance with MKL2017, but it should not matter for correctness, so this looks like a bug. We should take a closer look at it, but at the moment I can't make it a top priority.

I'll see what I can do about documentation for MKL DNN functions.

jdukat commented 7 years ago

MKL DNN API documentation

heagoo commented 7 years ago

Thanks! Confirmed, it's a bug in MKL. We can solve the issue by downloading https://github.com/intel-mxnet/mkl-release/releases/download/self_containted_MKLGOLD/mklml_lnx_2017.0.1.20161005.tgz, and now performance improves to 88 ms per frame under the same setting. Great!
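To put the two timings from this thread side by side (262 ms per frame before the MKL update and 88 ms after, both under the same setting as reported), here is a small sketch that converts them to frames per second and a speedup ratio:

```python
def frames_per_second(ms_per_frame):
    # 1000 ms in a second divided by the time spent on one frame
    return 1000.0 / ms_per_frame


before_ms, after_ms = 262.0, 88.0  # per-frame timings reported in this thread
print(f"before:  {frames_per_second(before_ms):.1f} fps")
print(f"after:   {frames_per_second(after_ms):.1f} fps")
print(f"speedup: {before_ms / after_ms:.2f}x")
```

This works out to roughly 3.8 fps versus 11.4 fps, a speedup of about 2.98x.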

jdukat commented 7 years ago

Great, the same MKL version will be released for Caffe within the next few days. @heagoo, do you plan to submit a pull request with your changes for Intel Caffe?

heagoo commented 7 years ago

Sure, will do it. @jdukat

ngaloppo commented 7 years ago

Is it confirmed that this bug exists in the current master? If so, can we reopen this issue to track the bug until it is fixed?

heagoo commented 7 years ago

This bug no longer exists in the latest update, which uses MKL 2017 Update 1.

amoussawi commented 7 years ago (Nov 22, 2016)

Hey @heagoo, could you please tell me which Intel processor you are using, if you don't mind? Thanks :)

heagoo commented 7 years ago

I am using an E5-2699 v4, the most powerful one. :)
