Closed: janchorowski closed this issue 8 years ago.
@nouiz Thoughts?
This is such a common function that maybe Theano should encode the best practice way of doing a rectifier in tensor.nnet.
This is pretty complicated. We added theano.tensor.nnet.relu() to Theano; it encodes what we recommend.
The timings should not include transfers; that makes them less meaningful. The fastest variant was not the same on the CPU and the GPU, and the forward pass and the gradient did not have the same fastest implementation either... The one in tensor.nnet.relu() is the best compromise we found.
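For reference, a minimal sketch of the formulation tensor.nnet.relu encodes, written from memory of the Theano source (the exact handling of the alpha argument may differ):

```python
def relu_like(x, alpha=0.0):
    # Rectifier written with abs() rather than a switch(); this works on
    # Theano tensor variables and on plain numpy arrays alike.
    if alpha == 0:
        return 0.5 * (x + abs(x))
    # Leaky variant: slope 1 for x > 0, slope alpha for x < 0.
    return 0.5 * ((1 + alpha) * x + (1 - alpha) * abs(x))
```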
Also, the best choice depends on what surrounds the relu in the graph. If there are other elemwise ops, like the addition of the bias, this triggers even more cases that can change which variant is fastest.
In any case, why spend more time optimizing this? What percentage of the runtime do you actually spend in relu? I expect it is very small, so I don't think it is worth optimizing further.
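One quick way to check that percentage is Theano's built-in profiler; a minimal sketch, assuming the profile keyword and the profile.summary() call behave as I remember them:

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
# profile=True attaches Theano's profiler to this function.
f = theano.function([x], T.nnet.relu(x), profile=True)

data = np.random.randn(1024, 1024).astype(theano.config.floatX)
for _ in range(100):
    f(data)

# Prints per-Op timings, including the elemwise op that implements relu.
f.profile.summary()
```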
Also, cuDNN has a relu operation. If you really want to optimize it on the GPU, someone should wrap it, but I don't think it will make a big difference.
In this case we should use tensor.nnet.relu in our brick. It should be an easy task for CCW.
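Roughly the one-line change in question; the brick skeleton and import paths below are paraphrased from memory of blocks/bricks/simple.py and may not match the file exactly:

```python
from theano import tensor
from blocks.bricks.base import application
from blocks.bricks.simple import Activation

class Rectifier(Activation):
    @application(inputs=['input_'], outputs=['output'])
    def apply(self, input_):
        # was: return tensor.switch(input_ > 0, input_, 0)
        return tensor.nnet.relu(input_)
```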
Isn't it enough to change one single line? That's too much for a CCW ticket!
Oops, I meant too little.
Well, one needs to benchmark it, so there is some work involved.
I can do it, since I actually had a pathological net with ACDC layers that spent 10% of its time in memory copies induced by tensor.switch.
That would be cool, but in this case please do not make it another long-standing ticket that we have to remember about for three months :)
@janchorowski, ping.
Okay, this ticket really needs to be addressed soon. This is getting a little ridiculous. We should switch to tensor.nnet.relu and push any discussion of speed upstream. This is what both Lasagne and Keras are doing (between them they have 10x our users, if we use GitHub stars as a proxy), so it's probably a pretty reasonable thing to do.
The Rectifier brick uses tensor.switch (https://github.com/mila-udem/blocks/blob/master/blocks/bricks/simple.py#L302); according to a simple timing, this is considerably slower than tensor.maximum. What is the rationale for this? Maybe we should switch to the faster tensor.maximum?

Microbenchmark (TitanX):
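A minimal sketch of what such a microbenchmark might look like (the tensor shape, the exact set of variants, and their correspondence to the three timings below are assumptions):

```python
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
data = np.random.randn(4096, 4096).astype(theano.config.floatX)

f_switch  = theano.function([x], T.switch(x > 0, x, 0))
f_maximum = theano.function([x], T.maximum(x, 0))
f_relu    = theano.function([x], T.nnet.relu(x))

# In IPython:
# %timeit f_switch(data)
# %timeit f_maximum(data)
# %timeit f_relu(data)
```

Note that each call here includes the host-to-device transfer, which, as pointed out above, makes the absolute numbers less meaningful.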
Gives:
10 loops, best of 3: 18.7 ms per loop
100 loops, best of 3: 7.1 ms per loop
100 loops, best of 3: 10.9 ms per loop