philipperemy / tensorflow-phased-lstm

Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences (NIPS 2016) - Tensorflow 1.0
MIT License

Optimize the way Phi is computed. #4

Closed. victorcampos7 closed this issue 7 years ago

victorcampos7 commented 7 years ago

Hi Philippe,

Thank you very much for your implementation; it is really helpful. I was going through the code and found that your computation of Phi differs from the equation in the paper, since you compute tf.mod twice.

Current code: def phi(times, s, tau): return tf.div(tf.mod(tf.mod(times - s, tau) + tau, tau), tau)

Implementation following Equation 6 in the paper: def phi(times, s, tau): return tf.div(tf.mod(times - s, tau), tau)
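
For reference, Equation 6 in the paper defines the phase (up to notation) as:

\phi_t = \frac{(t - s_j) \bmod \tau_j}{\tau_j}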

Is there a reason for this modification, or is it an error?

Thanks, Victor

philipperemy commented 7 years ago

Hi Victor,

Thank you very much for your feedback. The two formulas are actually exactly the same (at least from a mathematical point of view), so it's not a bug! Run this snippet if you are not convinced:

import numpy as np
import tensorflow as tf

def phi(times, s, tau):
    # Phase as in Equation 6 of the paper: ((t - s) mod tau) / tau.
    return tf.div(tf.mod(times - s, tau), tau)

def phi2(times, s, tau):
    # Same phase, with an extra wrap around tau before the division.
    return tf.div(tf.mod(tf.mod(times - s, tau) + tau, tau), tau)

def main():
    tf.InteractiveSession()  # registers itself as the default session for eval()
    times = tf.constant(np.array(list(range(10)), dtype='float32'))
    s = 2 * tf.ones(shape=[10])    # shift
    tau = 3 * tf.ones(shape=[10])  # period

    p = phi(times, s, tau)
    p2 = phi2(times, s, tau)

    # Both variants produce exactly the same values.
    np.testing.assert_almost_equal(p.eval(), p2.eval())
    print(p.eval())
    print(p2.eval())

if __name__ == '__main__':
    main()
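
For the record, a quick way to see why they match (a sketch, assuming tau > 0): tf.mod appears to follow floor-mod semantics here, like NumPy and Python's %, so the inner mod already returns a value in [0, tau), and the extra "+ tau ... mod tau" wrap (the usual trick for C-style remainders, which can be negative) changes nothing:

import numpy as np

x = np.array([-5., -2., 0., 4.])           # plays the role of times - s, possibly negative
tau = 3.                                   # period, assumed positive

once = np.mod(x, tau)                      # floor-mod: already in [0, tau)
twice = np.mod(np.mod(x, tau) + tau, tau)  # extra wrap is a no-op here

print(once)   # [1. 1. 0. 1.]
print(twice)  # [1. 1. 0. 1.]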

Concerning the utility of that extra mod, I cannot remember exactly why I did it. I think it was related to numerical stability, but it seems we don't need it anymore.

philipperemy commented 7 years ago

I'll do some tests, then update it accordingly. Thanks!

victorcampos7 commented 7 years ago

Great, thanks!

My main concern was speed. Although the expressions give the same numerical results, the additional call to tf.mod makes the code slower. This is especially noticeable when running on GPU, since tf.mod does not have a GPU implementation.
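
In case it helps, one way to check where the op actually runs (a rough sketch, not code from the repo) is to enable device placement logging and look at where the Mod op lands:

import tensorflow as tf

times = tf.constant([1.0, 2.0, 3.0])
s = tf.constant(2.0)
tau = tf.constant(3.0)
phi = tf.div(tf.mod(times - s, tau), tau)

# log_device_placement prints, for every op in the graph (including Mod),
# whether it was placed on the GPU or fell back to the CPU.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(phi))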

philipperemy commented 7 years ago

@victorcampos7 fully agree!

philipperemy commented 7 years ago

Just a benchmark I did very quickly:

def phi(times, s, tau): return tf.div(tf.mod(tf.mod(times - s, tau) + tau, tau), tau)

Forward-pass takes 1.19 seconds on average on the MNIST example.

def phi(times, s, tau): return tf.div(tf.mod(times - s, tau), tau)

Forward-pass takes 1.11 seconds on average on the MNIST example.

So that's a 6.7% reduction in forward-pass time (roughly a 7% speedup), running on a TITAN X Pascal.
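
(For the curious: the timings above were averaged over several forward passes with a simple harness along these lines; this is a hypothetical sketch, not the exact script, and the names sess, outputs and feed_dict are placeholders.)

import time

def average_forward_pass_time(sess, outputs, feed_dict, n_runs=10):
    # Warm-up run so graph setup / GPU initialization is not included in the timing.
    sess.run(outputs, feed_dict=feed_dict)
    start = time.time()
    for _ in range(n_runs):
        sess.run(outputs, feed_dict=feed_dict)
    return (time.time() - start) / n_runs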

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:02:00.0      On |                  N/A |
| 34%   60C    P2    60W / 250W |  11657MiB / 12189MiB |     33%      Default |
+-------------------------------+----------------------+----------------------+

philipperemy commented 7 years ago

Solved with cbb9c65b423d0bde66daa9ec5128912fd28c4f16