karpathy / convnetjs

Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
MIT License

Speed-up of ConvLayer.forward #11

Closed: mdda closed this issue 10 years ago

mdda commented 10 years ago

I noticed that ConvLayer.forward is being benchmarked by convnet-benchmarks, and I thought I'd have a go at some optimisation. With a little type-hinting here and there (plus some slight loop-order modifications, and constant extraction), I think I've got at least a 2x speed-up (YMMV, of course). It's functionally identical (AFAICT).

Here's the run-down of the benchmark timings

      // Orig   #5 iteration : 4880ms (original)
      // Dupe   #5 iteration : 5067ms (+1 console.log!)
      // oxoy   #5 iteration : 4155ms (move oy,ox calc outside of inner loop)
      // xy|0   #5 iteration : 4155ms (type hint on x and y)
      // xyst|0 #5 iteration : 2607ms (type hint on stride_x and stride_y)
      // more|0 #5 iteration : 2662ms (type hint on f>depth - WORSE)
      // hint|0 #5 iteration : 2591ms (type hint on constructors)
      // ox out #5 iteration : 2586ms (move ox variable setting outside y loop (splitting 'if' makes it WORSE, though))
      // xy->yx #5 iteration : 2398ms (switch loop order, so that faster moving indices inside (better cache perf))
      // contru #5 iteration : 2366ms (type-hinting into constructor of A)
      // VolSet #5 iteration : 2322ms (type-hinting into Vol.set())
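
To make the type-hinting concrete, here's a minimal sketch of the |0 trick on an inner accumulation loop (illustrative only, not the exact patch - the real code lives in ConvLayer.forward, and dotAt is a made-up helper name):

    // Sketch: x|0 coerces a value to a 32-bit integer, which hints the JIT
    // to keep it in an integer register rather than a boxed double.
    // Vol.get(x, y, d) is the existing convnetjs accessor.
    function dotAt(V, w, ox, oy, fsx, fsy, depth) {
      var x0 = ox | 0, y0 = oy | 0, d = depth | 0; // integer hints
      var a = 0.0;
      for (var fy = 0; fy < fsy; fy++) {
        var yf = (y0 + fy) | 0;
        for (var fx = 0; fx < fsx; fx++) {
          var xf = (x0 + fx) | 0;
          for (var fd = 0; fd < d; fd++) {
            a += V.get(xf, yf, fd) * w.get(fx, fy, fd);
          }
        }
      }
      return a;
    }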

One issue with submitting the patch, though, is that my build/convnet.js is also updated, which seems wasteful. OTOH, since you want the concatenated, minified version in the repo, I don't see how to get away from including it...

In addition, the current state of play has both forward_orig and the new forward in it - as well as some commentary about things that don't work, etc. Would you like me to clean them up before submitting?

All the Best Martin :-)

karpathy commented 10 years ago

This is awesome, thanks a lot for looking into it - I wasn't aware of type hinting.

I included build/convnet.js and the minified version for convenience, but I agree that it's a little messy with it there. Do you know what people usually do in these cases? I'd like to release the latest built version here on GitHub.

And we should definitely fold this into convnetjs, replacing the old code. I'll first wait to see if you happen to have suggestions on what we should do with the built files. As a side note, I might also look into eventually converting the backprop pass, and the fully-connected and pooling layers, using the same tricks.

Lastly, I have a WebGL version of ConvNetJS in my local repo and it's almost nicely wired in. It's a forward_GPU function, and it is extremely fast compared to this code.

Thanks, Andrej

mdda commented 10 years ago

Looks like the right way to do binary/machine-generated assets is with "releases". First, git rm ./build/convnet.*js, then add them to .gitignore.

Then, use a script like https://pypi.python.org/pypi/ghrelease/0.1.2 to upload the convnet.min.js asset individually.

Perhaps the version number could be (for instance) 2014.08.31, so as to avoid version increment anxiety...

It's not ideal, though, is it?

mdda commented 10 years ago

The type-hinting (particularly on stride) was a big win (and that should be placeable higher up the food chain, so that it works more generally too). The other win was re-ordering the loops, so that they execute more in order of memory placement (row-wise, rather than column-wise), since that apparently makes the code more cache-friendly. Surprisingly, factoring out in-loop constants (like the array arithmetic) didn't help much.
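
For reference, a minimal sketch of the loop re-ordering (illustrative only, assuming the Vol backing-array layout w[((sx*y + x)*depth) + d] and a made-up sumVol helper):

    // Iterate so the fastest-moving index matches memory layout: d varies
    // fastest in the backing array, then x, then y - so d goes innermost.
    function sumVol(V) {
      var W = V.sx, H = V.sy, D = V.depth, w = V.w;
      var sum = 0.0;
      for (var y = 0; y < H; y++) {          // slowest-moving index outermost
        for (var x = 0; x < W; x++) {
          var base = ((W * y + x) * D) | 0;  // hoisted index arithmetic
          for (var d = 0; d < D; d++) {      // contiguous reads: cache-friendly
            sum += w[base + d];
          }
        }
      }
      return sum;
    }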

I'd love to have a look at the WebGL stuff, since that was already solidly in my plan for using convnet.js (FWIW, I'm one of the contributors to https://github.com/stackgl/shader-school). My main goal was to implement the back-prop step on the GPU, to reduce training time... I know there are other capable convnet modules out there, but I particularly want to retain client-side compatibility for the trained network.

If I can 'get in' on the WebGL side, I'd be happy to contribute to doing the (more basic) JS-side optimisations too.

karpathy commented 10 years ago

Thanks, I thought releases might be the preferred way. I noticed a few repos using them for this purpose but never fully read up on it.

My WebGL implementation is essentially a wrapper around some core functions inside jpcnn (a really nice library from Pete Warden). Among other things he implemented gemm ("General Matrix Multiply", as seen in BLAS) in WebGL. This can be used to do very fast convolutions in a straightforward way: you reshape all patches into rows, take filters as columns, matrix-multiply, and then reshape the result back into the correct output dimensions. It's a little wasteful in terms of space (because you have to more than duplicate all image pixels, with overlaps, in their reshaped form), but it's an often-used strategy (in very early CNNs by LeCun and also, for example, in Caffe).
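
To illustrate the reshaping step, here's a rough sketch of the general im2col idea (not jpcnn's actual code; assumes no padding and that the stride tiles the input evenly):

    // Lay each receptive field out as one row of a matrix M; multiplying M by
    // a (k*k*D x numFilters) weight matrix then computes all convolutions at once.
    function im2col(input, W, H, D, k, stride) {
      var outW = ((W - k) / stride + 1) | 0;
      var outH = ((H - k) / stride + 1) | 0;
      var cols = k * k * D;
      var M = new Float32Array(outW * outH * cols);
      var r = 0;
      for (var oy = 0; oy < outH; oy++) {
        for (var ox = 0; ox < outW; ox++) {
          var c = 0;
          for (var fy = 0; fy < k; fy++) {
            for (var fx = 0; fx < k; fx++) {
              for (var d = 0; d < D; d++) {
                var x = ox * stride + fx, y = oy * stride + fy;
                // overlapping patches duplicate pixels - the space cost noted above
                M[r * cols + c++] = input[((W * y + x) * D) + d];
              }
            }
          }
          r++;
        }
      }
      return M; // each row is one flattened patch
    }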

However, jpcnn only implements the forward pass, not backprop.

I'll look at cleaning all of this mess up today, and maybe fold in some of my preliminary WebGL functionality.

mdda commented 10 years ago

Great!

Thanks for pointing me towards Pete Warden, whose blog (http://petewarden.com/) is super-interesting & in-depth. And, in case anyone else is looking for the WebGL behind the jpcnn library, have a look at https://github.com/jetpacapp/DeepBeliefSDK/blob/gh-pages/JavascriptLibrary/jpcnn.js#L1954

I'll check back in a day or two, and see how 'mergeable' my patch will be by then.

All the Best Martin :-)

karpathy commented 10 years ago

Ok, I removed the built library from the repo and created a release instead. I also folded your optimizations into ConvLayer (slightly modified), and tweaked the backward pass so that the same optimizations are applied to backprop for the Conv layer.

Now moving on to incorporating the GPU code. It's a little tricky because it relies on jpcnn, which in turn relies on underscore. I'm not sure what the cleanest way to add these is. Should I include them in the compile along with all the other convnetjs files? It would make the entire library quite a bit larger, but I'm not sure there is any other way.

karpathy commented 10 years ago

Thanks Martin, end result of this issue: the ConvLayer is now twice as fast (both the forward and backward passes). For future (dramatic) improvements we are moving towards WebGL. Closing the issue.