Hi
I think this ticket is about changing something like `np.any(np.isnan(X))` to `np.isnan(np.min(X))`, to speed things up, right?
After tracing the method that NanGuardMode uses, I find that it already uses the faster check, in pylearn2/utils/general.py.
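For reference, a minimal sketch of that faster check (the actual helper in pylearn2/utils/general.py may differ in name and details):

```python
import numpy as np

def contains_nan(arr):
    # Faster than np.any(np.isnan(arr)): np.min propagates NaN, so a
    # single scalar test replaces a full boolean temporary array.
    return np.isnan(np.min(arr))
```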
Actually, this ticket is about not performing NumPy operations (such as `np.min` and `np.isnan`) when `X` is on the GPU (i.e., it is a `CudaNdarray`, not an `ndarray`), because in that case the whole array is first copied to CPU memory as an `ndarray`, and only then are `np.min` and `np.isnan` called.
What we want is for at least the `min` operation to run directly on the GPU, so that we transfer only one number to the CPU; see the sketch below.
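To make the slow path concrete, here is a hypothetical illustration of what effectively happens today (`contains_nan_slow` is not a real pylearn2 function):

```python
import numpy as np

def contains_nan_slow(cuda_ndarray):
    # np.asarray transfers every element from GPU to CPU memory,
    # only to reduce it all to a single boolean afterwards.
    host_copy = np.asarray(cuda_ndarray)
    return np.isnan(np.min(host_copy))
```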
@hantek Any update on this?
Hi David
Can we find some code that uses the NanGuardMode class? I am still not sure how to use that mode in a sample computational graph. It seems that the constructor doesn't need information about the nodes, but in the `__init__()` function it needs "node" and "fn" as the parameters for nan_check().
And what is a "thunk" in Theano?
You can try this out by compiling a Theano function with `mode=NanGuardMode()`. Just create a function that intentionally produces NaNs somehow.
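For example, something like this minimal sketch should trigger it (I'm assuming the import path `pylearn2.devtools.nanguardmode` and the three constructor flags; check your version):

```python
import numpy as np
import theano
import theano.tensor as T
from pylearn2.devtools.nanguardmode import NanGuardMode

x = T.vector('x')
# 0 / 0 evaluates to NaN, so calling f on zeros should make
# NanGuardMode raise an error when it inspects the output.
f = theano.function([x], x / x,
                    mode=NanGuardMode(nan_is_error=True,
                                      inf_is_error=True,
                                      big_is_error=True))
f(np.zeros(3, dtype=theano.config.floatX))
```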
OK. Thanks a lot!
Could you tell me which part of the code transfers data from the GPU to the CPU? I looked at the `inputs` and `outputs` properties of `fn`, but they are already `numpy.ndarray` objects, and among the other properties of `fn` I didn't find any in `CudaNdarray` format. Where can I find the `CudaNdarray`?
I have found one with a more complex computation graph.
In the nan_check() function, `node.out` is a `CudaNdarraySharedVariable`, which can be used to compile a Theano function, and the elements in `fn.inputs` (or `fn.outputs`) can be either `numpy.ndarray` or `CudaNdarray`.
If I use `node.out` to compile the Theano function, the compiled function requires inputs, which would actually cause data transfer between CPU and GPU. So, is there a way to use the elements in `fn.outputs` directly? I find that they already hold values, and they are stored on the GPU.
Hi, just to note that I more or less know how to do it now, after talking with Pascal, so there will be a PR shortly.
This can be done as in gh-1054: do the reduction on the GPU, and then much less data will be transferred.
The `CudaNdarray` object does not support many reductions, but we can compile a Theano function that takes a GPU object, does the reduction, and returns the result on the CPU so we can inspect it.
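For instance, something like this minimal sketch (the names `guard_input`, `f_gpumin`, and `contains_nan_gpu` are illustrative, and it assumes the old `theano.sandbox.cuda` backend):

```python
import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.cuda.type import CudaNdarrayType

# A symbolic float32 vector that lives on the GPU.
guard_input = CudaNdarrayType(broadcastable=(False,))('nan_guard')

# Under FAST_RUN optimizations the min reduction runs on the GPU, so
# only the resulting scalar crosses over to the CPU.
f_gpumin = theano.function([guard_input], T.min(guard_input),
                           mode='FAST_RUN')

def contains_nan_gpu(arr):
    # Flatten so any shape fits the vector-typed input, then use the
    # isnan(min(x)) trick: min propagates NaN through the reduction.
    return np.isnan(f_gpumin(arr.reshape(arr.size)))
```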