Merge request: select zero-time instance as best right away for Auto-optimizing layers

kikaxa commented 7 years ago

Hi @hughperkins,

When I use DeepCL on Intel GPUs, I often get "0ms" timings when auto-optimizing layers.

This patch directs deepcl to use the first instance with "0ms" timing it encounters as best.

It will end up as best anyway, but after this patch, we won't spend time, power and memory on trying other, potentially very slow, kernels.

For me, this patch removes 1.7s of "startup" time out of 5s run-time.
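
To illustrate the idea outside of DeepCL, here is a minimal, self-contained sketch of the early-exit selection. `KernelTiming` and `selectKernel` are hypothetical names made up for this example; they are not part of DeepCL, and the real Auto classes time one candidate per call rather than scanning a precomputed list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one timed candidate kernel; not a DeepCL type.
struct KernelTiming {
    bool usable;       // false if the kernel "cannot be used"
    int milliseconds;  // measured time in whole milliseconds
};

// Returns the index of the chosen kernel, or -1 if no kernel is usable.
// Scanning stops as soon as a usable kernel reports 0 ms, because no later
// candidate can report a smaller whole-millisecond time.
// If `examined` is non-null, it receives the number of candidates looked at.
int selectKernel(const std::vector<KernelTiming> &timings,
                 std::size_t *examined = nullptr) {
    int best = -1;
    std::size_t i = 0;
    for (; i < timings.size(); ++i) {
        if (!timings[i].usable) {
            continue;  // skip kernels that could not be run
        }
        if (best == -1 || timings[i].milliseconds < timings[best].milliseconds) {
            best = static_cast<int>(i);
        }
        if (timings[i].milliseconds == 0) {
            ++i;  // count this candidate, then stop early
            break;
        }
    }
    if (examined != nullptr) {
        *examined = i;
    }
    return best;
}
```

With timings like those of the first layer in the log below (kernel 0 unusable, kernel 1 at 0ms), the sketch returns index 1 after looking at only two candidates instead of all eight.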

Example log:

ForwardAuto: kernel 7 219ms
   forward kernel 0: cannot be used
   forward kernel 1 time: 0ms
   forward kernel 2 time: 32ms
   forward kernel 3 time: 109ms
   forward kernel 4 time: 733ms
   forward kernel 5 time: 94ms
   forward kernel 6 time: 15ms
   forward kernel 7 time: 265ms
   forward layer selected kernel 1
   forward kernel 0: cannot be used
   forward kernel 1 time: 0ms
   forward kernel 2 time: 0ms
   forward kernel 3 time: 0ms
   forward kernel 4 time: 16ms
   forward kernel 5 time: 0ms
   forward kernel 6 time: 0ms
   forward kernel 7 time: 219ms
   forward layer selected kernel 1

Patches:

--- D:/ForwardAuto - Copy.cpp   Fri Sep 30 10:09:42 2016
+++ D:/ForwardAuto.cpp  Thu Nov 10 17:40:19 2016
@@ -71,6 +71,10 @@
                     candidate->forward(batchSize, dataWrapper, weightsWrapper, biasWrapper, outputWrapper);
                     milliseconds[thisIndex] = (int)timer.lap();
                     cout << StatefulTimer::instance()->prefix << "ForwardAuto: kernel " << thisIndex << " " << milliseconds[thisIndex] << "ms" << endl;
+                    if (milliseconds[thisIndex] == 0) { // a time of 0ms cannot be beaten, so select this instance
+                        cout << "   forward layer selected kernel with zero time: " << thisIndex << endl;
+                        this->chosenIndex = thisIndex;
+                    }
                     return;
                 } catch(runtime_error &e) {
                     cout << StatefulTimer::instance()->prefix << "ForwardAuto: kernel " << thisIndex << " this instance cant be used: " << e.what() << endl;

--- D:/BackwardAuto - Copy.cpp  Fri Sep 30 10:09:42 2016
+++ D:/BackwardAuto.cpp Thu Nov 10 17:40:23 2016
@@ -70,6 +70,10 @@
                     candidate->backward(batchSize, inputDataWrapper, gradOutput, weightsWrapper, gradInput);
                     milliseconds[thisIndex] = (int)timer.lap();
                     cout << StatefulTimer::instance()->prefix << "BackwardAuto: kernel " << thisIndex << " " << milliseconds[thisIndex] << "ms" << endl;
+                    if (milliseconds[thisIndex] == 0) { // a time of 0ms cannot be beaten, so select this instance
+                        cout << "   backward layer selected kernel with zero time: " << thisIndex << endl;
+                        this->chosenIndex = thisIndex;
+                    }
                     return;
                 } catch(runtime_error &e) {
                     cout << StatefulTimer::instance()->prefix << "BackwardAuto: kernel " << thisIndex << " this instance cant be used: " << e.what() << endl;

--- D:/BackpropWeightsAuto - Copy.cpp   Fri Sep 30 10:09:42 2016
+++ D:/BackpropWeightsAuto.cpp  Thu Nov 10 17:40:35 2016
@@ -70,6 +70,10 @@
                     candidate->calcGradWeights(batchSize, inputDataWrapper, gradOutput, weightsWrapper, gradInput);
                     milliseconds[thisIndex] = (int)timer.lap();
                     cout << StatefulTimer::instance()->prefix << "BackpropWeightsAuto: kernel " << thisIndex << " " << milliseconds[thisIndex] << "ms" << endl;
+                    if (milliseconds[thisIndex] == 0) { // a time of 0ms cannot be beaten, so select this instance
+                        cout << "   calcGradWeights layer selected kernel with zero time: " << thisIndex << endl;
+                        this->chosenIndex = thisIndex;
+                    }
                     return;
                 } catch(runtime_error &e) {
                     cout << StatefulTimer::instance()->prefix << "BackpropWeightsAuto: kernel " << thisIndex << " this instance cant be used: " << e.what() << endl;
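
A note on what "0ms" means here: the patched code compares `(int)timer.lap()` against zero, so the check works on whole milliseconds. The sketch below shows the effect of such a cast, assuming the timer reports fractional milliseconds (an assumption, I have not checked the return type of `Timer::lap()`). `toWholeMilliseconds` is a hypothetical helper written for this example, not a DeepCL function.

```cpp
#include <chrono>

// Truncates an elapsed time to whole milliseconds, mirroring an (int) cast
// applied to a fractional millisecond value.
int toWholeMilliseconds(std::chrono::duration<double, std::milli> elapsed) {
    return static_cast<int>(elapsed.count());
}
```

Under that assumption, any run shorter than one millisecond is recorded as 0ms, so the early exit picks the first kernel that finishes in under a millisecond.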

hughperkins commented 7 years ago

> it will end up as best anyway, but after this patch, we won't spend time, power and memory on trying other, potentially very slow, kernels.

Fair point.

hughperkins commented 7 years ago

Can you submit as a branch and/or pull request?

kikaxa commented 7 years ago

Sorry, I don't use git anywhere at the moment, but I can do this for you if you would like.

hughperkins / DeepCL

Merge request: select zero-time instance as best right away for Auto-optimizing layers #102