tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers
MIT License

tensorNd descriptor: CUDNN_STATUS_BAD_PARAMdim=4 #8

Closed Nqabz closed 7 years ago

Nqabz commented 7 years ago

When trying your VGG_deconv.py example, I am running into the following runtime error:

    RuntimeError: Could not set tensorNd descriptor: CUDNN_STATUS_BAD_PARAMdim=4
    Apply node that caused the error: GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})
    Toposort index: 18
    Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), <theano.gof.type.CDataType object at 0x7fa76f598748>, Scalar(float32), Scalar(float32)]
    Inputs shapes: [(0, 3, 226, 226), (64, 3, 3, 3), (0, 64, 224, 224), 'No shapes', (), ()]
    Inputs strides: [(153228, 51076, 226, 1), (27, 9, 3, 1), (3211264, 50176, 224, 1), 'No strides', (), ()]
    Inputs values: [b'CudaNdarray([])', 'not shown', b'CudaNdarray([])', <capsule object NULL at 0x7fa76bbeeb70>, 1.0, 0.0]
    Inputs name: ('image', 'kernel', 'output', 'descriptor', 'alpha', 'beta')

Any idea why there should be a need to set the tensorNd descriptor given that I am using images? Is there a workaround or patch you applied to bypass this?

tdeboissiere commented 7 years ago

It's been a while since I last used the code, which may not work anymore. Let me check. Also, can you send me a minimal working example of how you applied the VGG_deconv script?

Nqabz commented 7 years ago

Thanks for the prompt response. My first attempt was to replicate your result with the same images you posted with the code. That's when the error occurred.

I am using Keras + Theano 0.8.2

My end goal, however, is to use your deconvnet code on my own CNN model trained on 1-channel grayscale images.

It will be great to know if you were able to get the same example to work.

Many thanks.


tdeboissiere commented 7 years ago

I do have the same issue. Looking into it.

Nqabz commented 7 years ago

Thanks for the update. Given that you were able to generate the result with the same code, could this issue be due to updates on Theano modules?

There seems to be a related open issue here: https://github.com/Theano/Theano/pull/3716


tdeboissiere commented 7 years ago

None of that actually, although the issue had a pointer to what went wrong:

I removed some of the images in the Data folder to free up some space on my GitHub. There are only 6 images left but the code was expecting at least 32, which is presumably why your traceback shows an empty batch shape of (0, 3, 226, 226). I have just pushed an ugly hack to make it work and will update the repo later on.

    list_img = glob.glob("./Data/Img/*.jpg*")
    assert len(list_img) > 0, "Put some images in the ./Data/Img folder"
    if len(list_img) < 32:
        # Repeat the file list until it has at least 32 entries, then truncate to 32
        list_img = (int(32 / len(list_img)) + 2) * list_img
        list_img = list_img[:32]

Nqabz commented 7 years ago

Great, that did the trick with your example. Much appreciated. Perhaps a last question towards generalizing your code to grayscale images: any idea why I would be getting the following error during a backward pass from target layer convolution2d_4? (This is now using grayscale images with my own CNN below, not VGG.) My input data folder contains 32 grayscale images of size 144 x 144.

Model

    from keras.models import Sequential
    from keras.layers import Convolution2D, MaxPooling2D

    model = Sequential()
    model.add(Convolution2D(96, 3, 3, border_mode='valid', init='normal',
                            input_shape=(1, 144, 144), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(128, 3, 3, activation='relu', init='normal'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(256, 3, 3, activation='relu', init='normal'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(256, 3, 3, activation='relu', init='normal'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
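
For context, here is a minimal sketch of how I load the grayscale images into the (N, 1, 144, 144) Theano-ordered array this model expects; the path and rescaling below are just assumptions, not the repo's actual loader:

    import glob
    import numpy as np
    from PIL import Image

    # Assumed location and preprocessing; adjust to your own pipeline
    list_img = glob.glob("./Data/Img/*.jpg")
    imgs = [np.asarray(Image.open(f).convert("L"), dtype=np.float32) for f in list_img]
    X = np.stack(imgs)[:, np.newaxis, :, :] / 255.0   # shape (N, 1, 144, 144)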

Traceback error:

KerasDeconv.py", line 55, in _deconv
  X[d_switch[lname]] = 0 
IndexError: index 64 is out of bounds for axis 2 with size 64

Relevant code from _backward_pass (the call that raises the error is marked below):

    def _backward_pass(self, X, target_layer, d_switch, feat_map):
        ....
        for lname in self.lnames[:layer_index][::-1]:
            print("Deconvolving %s..." % lname)
            # Unpool, Deconv or do nothing
            if "maxpooling2d" in lname:
                p1, p2 = self[lname].pool_size
                uppool = K.function(
                    [self.x], K.resize_images(self.x, p1, p2, "th"))
                X_outl = uppool([X_outl])
            elif "convolution2d" in lname:
                X_outl = self._deconv(X_outl, lname, d_switch)  # <- raises the IndexError above
            elif "padding" in lname:
                pass
            else:
                raise ValueError(
                    "Invalid layer name: %s \n Can only handle maxpool and conv" % lname)
        return X_outl
Nqabz commented 7 years ago

I seem to have bypassed the error I was getting in the backward pass (it seemed to be due to the "X_outl" returned by the maxpooling2d branch). I swapped the unpooling lines in the deepest-to-shallowest loop as follows. Still looking at the result to see if there is any insight at all.

    # Iterate over layers (deepest to shallowest)
    for lname in self.lnames[:layer_index][::-1]:
        print("Deconvolving %s... inner loop" % lname)

        if "maxpooling2d" in lname:
            X_maxunp = K.pool.max_pool_2d_same_size(
                self[lname].input, self[lname].pool_size)
            unpool_func = K.function([self[self.lnames[0]].input], X_maxunp)
            X_outl = unpool_func([X])

            # p1, p2 = self[lname].pool_size
            # print("p1 and p2 are:", (p1, p2))
            # uppool = K.function(
            #     [self.x], K.resize_images(self.x, p1, p2, "th"))
            # X_outl = uppool([X_outl])

        elif "convolution2d" in lname:
            X_outl = self._deconv(X_outl, lname, d_switch)
        elif "padding" in lname:
            pass
        else:
            raise ValueError(
                "Invalid layer name: %s \n Can only handle maxpool and conv" % lname)

tdeboissiere commented 7 years ago

I guess you're on your own here since your error is mostly network specific. From what I see, it should be a matter of the number of filters (e.g. the code assumes VGG16's 64, 128, 256, 512 filters, and your network may not have that many, causing an index error somewhere).
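
As a purely illustrative sketch (hypothetical shapes, not the actual KerasDeconv arrays), this is the kind of mismatch that produces that exact IndexError: switch indices recorded for one feature-map shape being applied to a smaller array:

    import numpy as np

    # Hypothetical shapes for illustration only
    X = np.zeros((1, 96, 64, 64), dtype=np.float32)   # axis 2 has size 64
    switches = (np.array([0]), np.array([0]), np.array([64]), np.array([10]))
    X[switches] = 0   # IndexError: index 64 is out of bounds for axis 2 with size 64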

Nqabz commented 7 years ago

Indeed, I had to adapt both the backward pass and the forward pass to my network, along with other parts of the code that were customized for VGG16.

I am closing the issue.