PeadarOhAodha opened 8 years ago
CUDA backend for Theano: if your graphics card is not listed at https://developer.nvidia.com/cuda-gpus, you will not be able to use CUDA. That is the case for me (Intel HD 6000), so I have to use the
GPUarray (OpenCL) backend for Theano: we first need the libgpuarray library. This page gives very simple installation instructions: http://deeplearning.net/software/libgpuarray/installation.html
Once that's done, you can check whether your GPU is actually used by running this code in Python:
```python
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())

t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()

print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))

# If any elementwise op in the compiled graph is not a GPU op,
# the computation ran on the CPU.
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
```
The crucial line is

```python
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
```

The first argument of the `shared` constructor is the value (here, a NumPy array of random numbers). Note that `config.floatX` is not an argument of `shared` at all: it is the second argument of `numpy.asarray`, i.e. the dtype of the array. Casting to `config.floatX` matters for the GPU because the CUDA backend historically only computes in float32: with `floatX = float32`, the shared variable is stored as float32 and can live on the GPU, whereas a float64 array would stay on the CPU. (This was not clear to me at first, since the `config.floatX` documentation only says it "sets the default theano bit width for arguments passed as Python floating-point numbers".)
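The dtype effect can be seen with plain NumPy alone (a sketch: `floatX` is hard-coded here to `'float32'`, which is an assumption about the typical GPU configuration, not read from Theano's config):

```python
import numpy

# Assumed value: in a typical GPU setup, config.floatX is 'float32'.
floatX = 'float32'

rng = numpy.random.RandomState(22)
values = rng.rand(4)                 # numpy.random.rand returns float64
arr = numpy.asarray(values, floatX)  # second asarray argument is the dtype

print(values.dtype)  # float64
print(arr.dtype)     # float32
```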
Here are some cool facts about how Theano allocates memory (as far as I have understood from reading http://deeplearning.net/software/theano/tutorial/aliasing.html#borrowing-when-creating-shared-variables):

Using CPU:

Whenever a `theano.shared` variable is constructed, it gets a copy of the `value` argument by default.
Example:

```python
import numpy
import theano

np_array = numpy.ones(2, dtype='float32')

s_default = theano.shared(np_array)              # the constructor gets a copy of np_array
s_false = theano.shared(np_array, borrow=False)  # the constructor gets a copy of np_array
s_true = theano.shared(np_array, borrow=True)    # the constructor gets a pointer to np_array
```
Changes to `np_array` affect `s_true` but neither `s_default` nor `s_false`. Using the `borrow=True` flag can speed things up considerably when the values passed to the shared-variable constructor are very large, since the copy is skipped. However, the `borrow=True` flag only has this effect when the device is the CPU.
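The copy-versus-alias semantics that `borrow` controls can be illustrated with plain NumPy (an analogy for the behaviour described above, not Theano itself):

```python
import numpy

np_array = numpy.ones(2, dtype='float32')

copied = np_array.copy()   # like borrow=False: independent storage
aliased = np_array         # like borrow=True: shares storage with np_array

np_array[0] = 5.0
print(copied[0])   # 1.0 -- the copy is unaffected
print(aliased[0])  # 5.0 -- the alias sees the change
```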
Using GPU:

This aliasing between `s_true` and `np_array` cannot occur on the GPU, because there Theano manages its own memory space and therefore cannot share the internal representation of `np_array` (check the validity of this argument please). Where we can harvest speed on a GPU is the `borrow=True` flag in `theano.function`:
```python
import theano
import theano.tensor as T

x = T.vector('x')
y = T.exp(x)
f = theano.function([theano.In(x, borrow=True)],
                    theano.Out(y, borrow=True))
```
Here, Theano does not create new temporary storage for `x` but reuses the input as a buffer; that is, `x` may be overwritten if the function modifies the input before returning the output. The same is true for the output: Theano reuses the output storage as a buffer each time the function is called. The speed advantage becomes immediately apparent when a long loop recomputes `f` for different inputs (it makes a huge difference).
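The buffer-reuse idea can be sketched with NumPy's `out=` argument, which likewise avoids allocating a fresh output array on every iteration of a loop (an analogy for the output borrowing above, not Theano code):

```python
import numpy

x = numpy.linspace(0.0, 1.0, 100_000, dtype='float32')
buf = numpy.empty_like(x)

# numpy.exp(x) would allocate a new result array on every call;
# out=buf writes into the preallocated buffer instead, analogous to
# theano.Out(y, borrow=True) reusing the output storage on each call.
for _ in range(100):
    numpy.exp(x, out=buf)

print(buf[0], buf[-1])
```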
This is it for now; please correct me if I misunderstood something, and read http://deeplearning.net/software/theano/tutorial/aliasing.html#borrowing-when-creating-shared-variables

Document for the group what's required (CUDA etc.?) for running Theano with a GPU.