pluskid / Mocha.jl

Deep Learning framework for Julia

Neurons very slow on CPUBackend #182

Closed uschmidt83 closed 8 years ago

uschmidt83 commented 8 years ago

Hi, I found that neuron computations are very slow on the CPUBackend, which seems to be caused by a lack of type annotations in the code.
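For context, this is the classic abstract-field problem: `Blob.data` is not concretely typed, so a loop over it inside `forward` is dynamically dispatched and boxes on every element. The fix below uses a "function barrier": hand the array to an inner function, which gets compiled for the concrete array type. A minimal standalone sketch of the pattern (hypothetical types, not Mocha code, written in current Julia syntax rather than the 0.4 syntax used below):

```julia
# Hypothetical container with an abstractly typed field, analogous to a Blob
# whose `data` field the compiler cannot concretely infer.
struct SlowBlob
    data::AbstractArray  # abstract field type: accesses are not specialized
end

# Slow: the loop touches `blob.data` directly, so every element access and
# `max` call is dynamically dispatched and allocates a box.
function relu_slow!(blob::SlowBlob)
    for i in eachindex(blob.data)
        blob.data[i] = max(0, blob.data[i])
    end
    return blob
end

# Fast: pass the array through an inner function (a "function barrier");
# the helper is compiled for the concrete array type it receives, so the
# loop body is fully specialized.
function _relu!(a::AbstractArray{T}) where {T}
    @inbounds @simd for i in eachindex(a)
        a[i] = max(zero(T), a[i])
    end
    return a
end
relu_fast!(blob::SlowBlob) = (_relu!(blob.data); blob)
```

Both functions compute the same in-place ReLU; only the fast version lets the compiler see the array's concrete type inside the hot loop.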

Using ReLU as an example, here's code to reproduce the issue and a simple fix:

using Mocha
import Mocha: forward, backward

backend = CPUBackend()
init(backend)

sz   = (256,256,1,5)
data = Array[randn(sz),randn(sz)]

function setup_net(neuron)
  layers = Vector{Layer}()
  push!(layers, MemoryDataLayer(data=data, batch_size=sz[4]))
  push!(layers, ConvolutionLayer(name="conv1", bottoms=[:data],  tops=[:conv1], n_filter=16, neuron=neuron, kernel=(3,3), pad=(1,1)))
  push!(layers, ConvolutionLayer(name="conv2", bottoms=[:conv1], tops=[:conv2], n_filter=1,  neuron=neuron, kernel=(1,1), pad=(0,0)))
  push!(layers, SquareLossLayer(bottoms=[:conv2, :label]))
  Net("test", backend, layers)
end

type MyReLU <: ActivationFunction end
function forward(backend :: CPUBackend, neuron :: MyReLU, output :: Blob)
  # Function barrier: the inner function is specialized on the concrete
  # array type, so the loop is type-stable.
  function _forward{T}(output::AbstractArray{T})
    @simd for i in eachindex(output)
      @inbounds output[i] = max(zero(T), output[i])
    end
  end
  _forward(output.data)
end
function backward(backend :: CPUBackend, neuron :: MyReLU, output :: Blob, gradient :: Blob)
  # Same pattern for the gradient: zero it wherever the output was clamped,
  # inside a specialized inner function.
  function _backward{T,N}(gradient::AbstractArray{T,N}, output::AbstractArray{T,N})
    @simd for i in eachindex(gradient,output)
      @inbounds gradient[i] *= (output[i] > 0)
    end
  end
  _backward(gradient.data,output.data)
end

net1 = setup_net(Neurons.ReLU())
net2 = setup_net(MyReLU())
map(init,(net1,net2))

map(forward,(net1,net2))
@time forward(net1)
@time forward(net2)

map(backward,(net1,net2))
@time backward(net1)
@time backward(net2)

map(destroy,(net1,net2))
shutdown(backend)

I observe these @time results on my machine (Julia 0.4.3, Mocha master):

  0.781356 seconds (27.85 M allocations: 432.488 MB, 3.64% gc time)   # forward,  Neurons.ReLU
  0.064131 seconds (419 allocations: 7.512 MB)                        # forward,  MyReLU
  1.316355 seconds (38.99 M allocations: 594.959 MB, 3.04% gc time)   # backward, Neurons.ReLU
  0.068223 seconds (413 allocations: 11.531 KB)                       # backward, MyReLU
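The tens of millions of tiny allocations in the slow runs are the signature of per-element boxing from dynamic dispatch. That effect can be checked in isolation with `@allocated` (hypothetical names, not Mocha code, current Julia syntax):

```julia
# A container whose field type is abstract, so loops over it are not
# specialized; compare allocations with and without a function barrier.
struct Holder
    data::AbstractArray  # abstract field: loop values get boxed
end

# Sums the field directly: type-unstable, allocates on every iteration.
function sum_direct(h::Holder)
    s = 0.0
    for x in h.data
        s += x
    end
    return s
end

# Same loop behind a function barrier: specialized on the concrete array
# type, essentially allocation-free.
_sum(a) = (s = 0.0; for x in a; s += x; end; s)
sum_barrier(h::Holder) = _sum(h.data)

h = Holder(randn(10^5))
sum_direct(h); sum_barrier(h)            # run once to compile both paths
direct_bytes  = @allocated sum_direct(h)   # large: one box per iteration
barrier_bytes = @allocated sum_barrier(h)  # near zero
```

Both functions return the same sum; only the allocation profiles differ, mirroring the `@time` numbers above.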

Best, Uwe

pluskid commented 8 years ago

@uschmidt83 Thanks a lot! This is a nice catch. Do you have time to create a PR to fix some commonly used layers? Otherwise, I will try to fix some of them over the weekend.

uschmidt83 commented 8 years ago

@pluskid I can do it; it shouldn't be too much work.

uschmidt83 commented 8 years ago

@pluskid Is there a problem with my PR, or do you want to solve it in a different way?

pluskid commented 8 years ago

Sorry, I somehow missed that one! It looks good and I have merged it. Thanks for the PR and for double-checking!

uschmidt83 commented 8 years ago

@pluskid No problem. Thanks for maintaining the module :)