Closed fsword73 closed 8 years ago
Well... that's a good question :-) So, how about we focus on the convolutional layer? That is the main bottleneck. And rather than optimizing the conv layer in each library one by one, how about we switch all the libraries to use @naibaf7's greentea convolutional layer? Are you coming mostly from a 'for fun' angle? Or for a university/research project? Or as part of your job? If it is for academic research, or as part of your job, you might be able to get a very quick win by doing:
It will be easier to migrate cltorch to use greentea convolutions than deepcl, since deepcl needs to run on Windows and be compilable with MSVC 2008 and MSVC 2010, so that it can work with Python 2.7 and Python 3.4.
Alternatively, you can just look at making very fast convolutions in greentea, and I can handle migrating cltorch, and possibly also deepcl, onto greentea convolutional layer.
I think we might as well all work on optimizing the same convolutional library. Though, I don't know, it depends somewhat on your own goals; how do you see it? I suppose another option is that you could create an entirely new convolutional implementation, but I guess that would be a lot of work. Maybe better to, e.g., pick one particular piece of hardware (AMD R9 Fury?) and one particular geometry (3x3 kernels with about the appropriate dimensions for residual learning?), make those go super fast, and fit that into greentea somehow?
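For the "one geometry, make it super fast" approach, a scalar reference implementation is useful as a correctness baseline to validate any fast GPU kernel against. Below is a minimal NumPy sketch of a direct 3x3 'valid' convolution; the function name and data layout are my own illustration, not greentea's actual API:

```python
import numpy as np

def conv3x3_forward(x, w):
    """Naive direct 3x3 convolution (single image, 'valid' padding).

    x: input of shape (C_in, H, W)
    w: weights of shape (C_out, C_in, 3, 3)
    returns: output of shape (C_out, H-2, W-2)
    """
    c_in, h, width = x.shape
    c_out = w.shape[0]
    out = np.zeros((c_out, h - 2, width - 2), dtype=x.dtype)
    for co in range(c_out):
        for ci in range(c_in):
            for kh in range(3):
                for kw in range(3):
                    # each filter tap contributes a shifted copy of the input
                    out[co] += w[co, ci, kh, kw] * x[ci, kh:kh + h - 2,
                                                     kw:kw + width - 2]
    return out

# smoke test: an identity kernel (1 at the centre) reproduces the input interior
x = np.arange(25, dtype=np.float64).reshape(1, 5, 5)
w = np.zeros((1, 1, 3, 3))
w[0, 0, 1, 1] = 1.0
y = conv3x3_forward(x, w)
assert np.allclose(y[0], x[0, 1:4, 1:4])
```

A tuned OpenCL kernel for one fixed geometry can then be checked element-wise against this reference on random inputs.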
What are your own goals? How do you see your own work fitting into the existing libraries and frameworks?
I plan to have 1-2 interns from university. I hope to get approval from my boss's boss in the next few weeks.
From the convnet benchmarks, greentea's convolutional layer has a performance gap compared with cuDNN. Your suggestion is great. The base hardware can be the AMD R9 Fury. I have done some basic testing on MNIST. The major issue is backprop.
It is a good point to optimize one convolutional library only. It will save a lot of time.
Which OS version do you recommend for running these benchmarks?
I have profiled deepcl with MNIST on Windows 7.
My initial plan is to rewrite the filters, from 3x3 up to 21x21. From profiling on MNIST, backprop of the weights is the major performance loss. We can begin with the top 5 kernels; after that, we can rewrite the top 10. I agree that the convolutional layer is the most important one.
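To make the backprop-weights bottleneck concrete: for a 'valid' 3x3 convolution, each weight's gradient is a correlation of the input with the output gradient, so the kernel re-reads large input slabs once per filter tap and per channel pair. A NumPy sketch of the math (my own illustrative layout, not DeepCL's actual kernel):

```python
import numpy as np

def conv3x3_weight_grad(x, grad_out):
    """Weight gradient for a 'valid' 3x3 convolution (single image).

    x:        input, shape (C_in, H, W)
    grad_out: gradient w.r.t. the output, shape (C_out, H-2, W-2)
    returns:  gradient w.r.t. the weights, shape (C_out, C_in, 3, 3)
    """
    c_in, h, width = x.shape
    c_out = grad_out.shape[0]
    gw = np.zeros((c_out, c_in, 3, 3), dtype=x.dtype)
    for co in range(c_out):
        for ci in range(c_in):
            for kh in range(3):
                for kw in range(3):
                    # each weight's gradient sums a shifted input window
                    # against the whole output gradient
                    gw[co, ci, kh, kw] = np.sum(
                        x[ci, kh:kh + h - 2, kw:kw + width - 2]
                        * grad_out[co])
    return gw
```

The reduction over all output positions for every (C_out, C_in, kh, kw) tuple is what makes this kernel memory-bound and a natural first target for tuning.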
> I plan to have 1-2 interns from university. I hope to get approval from my boss's boss in the next few weeks.
Ok, sounds good :-)
> From the convnet benchmarks, greentea's convolutional layer has a performance gap compared with cuDNN. Your suggestion is great.
Ok :-)
> The base hardware can be the AMD R9 Fury.
Sounds good to me. Furies are about as fast as the W9100, but significantly cheaper. The only other option is the R9-390X, which has slightly more memory, but lower flops. It will be easier to get informal support from AMD for optimizing on the Fury than on the R9-390X.
> It is a good point to optimize one convolutional library only. It will save a lot of time.
Ok
> Which OS version do you recommend for running these benchmarks?
Personally I use Ubuntu 14.04 and 16.04. I believe that Torch users are mostly using Ubuntu 14.04. Windows is good too though :-)
> I have profiled deepcl with MNIST on Windows 7.
MNIST is kind of a toy, really. A recent state-of-the-art network is Microsoft's residual network. Ideally, you want to target ImageNet. You can use CIFAR as a playground, e.g. cifar-torch.
Or, if you want to target MNIST-sized things, I guess you could target AlphaGo.
Just some ideas :-)
(Well... you might consider targeting soumith's convnet-benchmarks. Note that these certainly run on Ubuntu though.)
Soumith's convnet-benchmarks is small and a much less time-consuming place to start performance tuning.
ImageNet sounds like a big one, or Microsoft's residual networks. Now I almost have a clear enough goal to apply for interns.
Thanks for your time!
Hi, Hugh. If I want to spend 6 months or more doing performance profiling/tuning, can you give a priority ordering of the following:
* clTorch
* cl-Caffe
* deepCL
* clNN
* other frameworks for Deep Convolutional Neural Networks