Hi
I think this ticket is about changing something like `np.any(np.isnan(X))` to `np.isnan(np.min(X))`, to speed things up, right?
After tracing the method that NanGuardMode uses, I find that it already uses the faster check, in pylearn2/utils/general.py.
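For reference, a minimal sketch of that faster check (the actual helper in pylearn2/utils/general.py may differ in name and details):

```python
import numpy as np

def contains_nan(arr):
    # Faster than np.any(np.isnan(arr)): np.min propagates NaN, so a
    # single scalar test replaces a full boolean temporary array.
    return np.isnan(np.min(arr))
```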
Actually, this ticket is about not performing NumPy operations (such as `np.min` and `np.isnan`) when `X` is on the GPU (i.e., it is a `CudaNdarray`, not an `ndarray`), because in that case the whole array is first copied to CPU memory as an `ndarray`, and only then are `np.min` and `np.isnan` called.
What we want is for at least the `min` operation to run directly on the GPU, so that we transfer only one number to the CPU; see the sketch below.
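To make the slow path concrete, here is a hypothetical illustration of what effectively happens today (`contains_nan_slow` is not a real pylearn2 function):

```python
import numpy as np

def contains_nan_slow(cuda_ndarray):
    # np.asarray transfers every element from GPU to CPU memory,
    # only to reduce it all to a single boolean afterwards.
    host_copy = np.asarray(cuda_ndarray)
    return np.isnan(np.min(host_copy))
```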
@hantek Any update on this?
Hi David
Can we find some code that uses the NanGuardMode class? I am still not sure how to use that mode in a sample computational graph. It seems that the constructor doesn't need information about the nodes, but in the `__init__()` function it needs "node" and "fn" as the parameters for nan_check().
And what is a "thunk" in Theano?
You can try this out by compiling a Theano function with `mode=NanGuardMode()`. Just create a function that intentionally produces NaNs somehow.
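For example, something like this minimal sketch should trigger it (I'm assuming the import path `pylearn2.devtools.nanguardmode` and the three constructor flags; check your version):

```python
import numpy as np
import theano
import theano.tensor as T
from pylearn2.devtools.nanguardmode import NanGuardMode

x = T.vector('x')
# 0 / 0 evaluates to NaN, so calling f on zeros should make
# NanGuardMode raise an error when it inspects the output.
f = theano.function([x], x / x,
                    mode=NanGuardMode(nan_is_error=True,
                                      inf_is_error=True,
                                      big_is_error=True))
f(np.zeros(3, dtype=theano.config.floatX))
```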
OK. Thanks a lot!
Could you tell me which part of the code transfers data from the GPU to the CPU? I looked at the `inputs` and `outputs` properties of `fn`, but they are already `numpy.ndarray` objects, and among the other properties of `fn` I didn't find any in `CudaNdarray` format. Where can I find the `CudaNdarray`?
I have found one with a more complex computation graph.
In the nan_check() function, `node.out` is a `CudaNdarraySharedVariable`, which can be used to compile a Theano function, and the elements in `fn.inputs` (or `fn.outputs`) can be either `numpy.ndarray` or `CudaNdarray`.
If I use `node.out` to compile the Theano function, the compiled function requires inputs, which would actually cause data transfer between CPU and GPU. So, is there a way to use the elements in `fn.outputs` directly? I find that they already hold values, and they are stored on the GPU.
Hi, just to note that I more or less know how to do it now, after talking with Pascal, so there will be a PR shortly.
This can be done as in gh-1054: do the reduction on the GPU, and then much less data will be transferred.
The `CudaNdarray` object does not support many reductions, but we can compile a Theano function that takes a GPU object, does the reduction, and returns the result on the CPU so we can inspect it.
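For instance, something like this minimal sketch (the names `guard_input`, `f_gpumin`, and `contains_nan_gpu` are illustrative, and it assumes the old `theano.sandbox.cuda` backend):

```python
import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.cuda.type import CudaNdarrayType

# A symbolic float32 vector that lives on the GPU.
guard_input = CudaNdarrayType(broadcastable=(False,))('nan_guard')

# Under FAST_RUN optimizations the min reduction runs on the GPU, so
# only the resulting scalar crosses over to the CPU.
f_gpumin = theano.function([guard_input], T.min(guard_input),
                           mode='FAST_RUN')

def contains_nan_gpu(arr):
    # Flatten so any shape fits the vector-typed input, then use the
    # isnan(min(x)) trick: min propagates NaN through the reduction.
    return np.isnan(f_gpumin(arr.reshape(arr.size)))
```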