hello2all / GTSRB_Keras_STN

German Traffic Sign Recognition Benchmark, Keras implementation with Spatial Transformer Networks
MIT License

Spatial Transformer not giving output #6

Closed mesakarghm closed 3 years ago

mesakarghm commented 3 years ago

I tried incorporating your implementation of the Spatial Transformer Network into my license plate recognition model. I put the STN layer immediately after the input layer, but its output is a completely black image.

I did change the keras functions to tf.keras. Is that what is causing the problem, or is it something else? Any help would be highly appreciated.

TensorFlow version: 1.15.2

Here is my locnet definition:

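# Imports assumed from the switch to tf.keras described above.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPool2D

def locnet(self):
    # Bias of the final layer: the 2x3 identity affine transform.
    b = np.zeros((2, 3), dtype='float32')
    b[0, 0] = 1
    b[1, 1] = 1
    # Zero kernel, so the initial output equals the bias for any input.
    W = np.zeros((64, 6), dtype='float32')
    weights = [W, b.flatten()]
    locnet = Sequential()
    locnet.add(Conv2D(16, (7, 7), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Conv2D(32, (5, 5), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Conv2D(64, (3, 3), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Flatten())
    locnet.add(Dense(128))
    locnet.add(Activation('relu'))
    locnet.add(Dense(64))
    locnet.add(Activation('relu'))
    # Six outputs: the flattened 2x3 affine matrix fed to the transformer.
    locnet.add(Dense(6, weights=weights))
    return locnet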
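
As a sanity check on the initial weights, here is a minimal NumPy sketch of what the final Dense(6) layer computes before any training. The `features` vector is a hypothetical stand-in for the 64 activations feeding that layer; it is not taken from the model. Because W is all zeros, the output should equal the bias, which is the identity affine transform.

```python
import numpy as np

# Same initial weights as in the locnet definition.
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1
W = np.zeros((64, 6), dtype='float32')

# Hypothetical stand-in for the 64 activations feeding the final Dense(6).
features = np.random.RandomState(0).rand(1, 64).astype('float32')

# A Dense layer without activation computes features @ W + bias.
theta = (features @ W + b.flatten()).reshape(2, 3)
print(theta)
```

So an untrained locnet should leave the image untransformed; a black output right at the start would point to something other than these initial weights.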

Then I just add the STN layer after the input layer like:

def _build(self):
    inputs = Input(self.input_shape)
    stn = SpatialTransformer(localization_net=self.locnet(), output_size=(24, 94))(inputs)

Edit: I found the problem, but I don't really know how to fix it. In the interpolate function within the Spatial Transformer, when area_a, area_b, area_c and area_d are calculated, my values come out as area_a = -area_b and area_c = -area_d. If anyone has an idea why this is happening or how to fix it, it would be really helpful.
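
To illustrate how the areas can cancel like this, here is a rough NumPy sketch. It assumes the interpolate function follows the usual bilinear-sampling pattern (floor the coordinate, take the next index, clip both to the image bounds, then form four area weights). The helper `bilinear_areas` and its arguments are hypothetical and not copied from the repository.

```python
import numpy as np

def bilinear_areas(x, y, max_w, max_h):
    """Four bilinear weights for sampling at (x, y), corner indices clipped to the image."""
    x0 = int(np.floor(x))
    x1 = x0 + 1
    y0 = int(np.floor(y))
    y1 = y0 + 1
    # Clipping keeps the corner indices inside the image.
    x0, x1 = np.clip([x0, x1], 0, max_w)
    y0, y1 = np.clip([y0, y1], 0, max_h)
    area_a = (x1 - x) * (y1 - y)
    area_b = (x1 - x) * (y - y0)
    area_c = (x - x0) * (y1 - y)
    area_d = (x - x0) * (y - y0)
    return area_a, area_b, area_c, area_d

# In-range sample: the four weights are non-negative and sum to 1.
inside = bilinear_areas(10.25, 5.5, max_w=93, max_h=23)
# y far outside the image: y0 and y1 both clip to max_h, so the weights cancel in pairs.
outside = bilinear_areas(10.25, 500.0, max_w=93, max_h=23)
print(inside, sum(inside))
print(outside, sum(outside))
```

Under this assumption, once y0 and y1 clip to the same row, area_a and area_b multiply the same pixel with opposite signs, and likewise area_c and area_d, so the interpolated value is zero and the output is black.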

mesakarghm commented 3 years ago

It turns out my locnet was predicting extremely high values, which were being clipped to max_h and max_w in the interpolate function. I solved it by using a sigmoid activation instead of relu in the locnet definition.
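
A small NumPy sketch of why the activation matters for the size of the predicted transform. The pre-activations `z` and the final-layer weights `W` are hypothetical random values, not taken from the trained model. With sigmoid, each hidden activation lies in (0, 1), so every output of the final Dense(6) is bounded by the column sum of |W| plus |b|; relu gives no such bound.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
z = rng.randn(1, 64) * 50.0   # hypothetical large pre-activations of the last hidden layer
W = rng.randn(64, 6) * 0.1    # hypothetical trained kernel of the final Dense(6)
b = np.array([1, 0, 0, 0, 1, 0], dtype='float64')  # identity-transform bias

theta_relu = relu(z) @ W + b        # grows with the size of z
theta_sigmoid = sigmoid(z) @ W + b  # bounded regardless of z

# Upper bound on |theta| that holds for any z when the hidden activation is sigmoid.
bound = np.abs(W).sum(axis=0) + np.abs(b)
print(np.abs(theta_relu).max(), np.abs(theta_sigmoid).max(), bound.max())
```

A bounded theta does not guarantee that every sampling coordinate stays inside the image, but it does rule out the extreme values that push the whole grid past the clipping limits.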