oarriaga / STN.keras

Implementation of spatial transformer networks (STNs) in keras 2 with tensorflow as backend.
MIT License

Output of spatial transformer network is in plain black color #18

Closed mesakarghm closed 3 years ago

mesakarghm commented 3 years ago

I tried using the Spatial Transformer layer from https://github.com/johnwangMK/spatial_transformer_networks/blob/master/src/spatial_transformer.py in my License Plate Recognition project. The only change in my copy is on line 137, inside def _transform, where I set num_channels to 3 instead of tf.shape(input_shape)[3]. The model didn't converge, so I returned the output of the STN along with the model outputs, and it is plain black, nothing else.

def locnet(self):
    b = np.zeros((2, 3), dtype='float32')
    b[0, 0] = 1
    b[1, 1] = 1
    W = np.zeros((64, 6), dtype='float32')
    weights = [W, b.flatten()]
    locnet = Sequential()

    locnet.add(Conv2D(16, (7, 7), padding='valid', input_shape=(48, 188, 3), kernel_initializer='glorot_normal'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Conv2D(32, (5, 5), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Conv2D(64, (3, 3), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))

    locnet.add(Flatten())
    locnet.add(Dense(128))
    locnet.add(Activation('relu'))
    locnet.add(Dense(64))
    locnet.add(Activation('relu'))
    locnet.add(Dense(6, weights=weights))

    return locnet

This is my locnet function, and I call the STN as:

self.input_shape = (48, 188, 3)
def _build(self):
    inputs = Input(self.input_shape)
    stn = SpatialTransformer(localization_net=self.locnet(),
                             output_size=(24, 94))(inputs)
    # ... followed by the licence plate recognition model, which works well
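For reference, the bias `b` set in `locnet` above flattens to `[1, 0, 0, 0, 1, 0]`, and because `W` is all zeros the final `Dense(6)` layer initially outputs exactly that vector for any input. The plain-Python sketch below assumes the usual STN convention that this vector is read as a 2x3 affine matrix applied to target coordinates normalized to [-1, 1]; `apply_affine` and `regular_grid` are made-up helper names, not part of the linked layer. Under that assumption the initial transform is the identity, so every sampling point starts inside the image.

```python
def apply_affine(theta, x, y):
    """Apply a flattened 2x3 affine matrix to one normalized point (x, y)."""
    a, b, tx, c, d, ty = theta
    return (a * x + b * y + tx, c * x + d * y + ty)


def regular_grid(height, width):
    """Normalized target coordinates in [-1, 1], listed row by row."""
    ys = [-1.0 + 2.0 * i / (height - 1) for i in range(height)]
    xs = [-1.0 + 2.0 * j / (width - 1) for j in range(width)]
    return [(x, y) for y in ys for x in xs]


# Same values as b.flatten() in locnet: the identity affine transform.
identity_theta = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

grid = regular_grid(3, 4)  # tiny grid, just for illustration
sampled = [apply_affine(identity_theta, x, y) for x, y in grid]

# With the identity parameters, each target point samples its own location.
print(sampled == grid)
```

If the output is black even at initialization, this suggests looking beyond the initial weights, for example at the coordinate handling inside the layer.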

After adding this, the model doesn't converge, and when I try to plot the output of the STN as:

image = np.squeeze(stn_out)[0]
image = cv2.resize(image, (100, 200))  # cv2.resize returns a new array
cv2.imshow("frame", image)

it just shows a plain black image.

Not only that, the output of the license plate model is the same for every image. So I'm guessing this is because of some mistake I'm making in the Spatial Transformer, but I don't know what.

TensorFlow version: 1.15.2

Edit: I found the problem, but I don't really know how to fix it. In the interpolate function within the Spatial Transformer, when area_a, area_b, area_c and area_d are calculated, my values come out as area_a = -area_b and area_c = -area_d. If anyone has any idea why this is happening or how to fix it, it'd be really helpful.
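One way these relations can arise, sketched in plain Python: this assumes the interpolate function uses the common bilinear formulation in which the four neighbouring pixel indices are clipped to the image bounds before the area weights are computed (worth checking against the actual file). `bilinear_weights` is a made-up helper for illustration. When a sampling point falls outside the image, both row indices clip to the same edge row, so `y1 - y` equals `-(y - y0)` and the weights cancel in pairs.

```python
import math


def bilinear_weights(x, y, width, height):
    """Corner weights for one sampling point given in pixel coordinates.

    Assumed formulation: take the four neighbouring pixel indices, clip
    them to the image bounds, then weight each corner by the opposite area.
    """
    x0 = math.floor(x)
    y0 = math.floor(y)
    x1 = x0 + 1
    y1 = y0 + 1
    # Clipping collapses y0 and y1 (or x0 and x1) onto the same edge
    # index when the point lies outside the image.
    x0 = min(max(x0, 0), width - 1)
    x1 = min(max(x1, 0), width - 1)
    y0 = min(max(y0, 0), height - 1)
    y1 = min(max(y1, 0), height - 1)
    area_a = (x1 - x) * (y1 - y)
    area_b = (x1 - x) * (y - y0)
    area_c = (x - x0) * (y1 - y)
    area_d = (x - x0) * (y - y0)
    return area_a, area_b, area_c, area_d


# A point inside a 188x48 image: four positive weights.
inside = bilinear_weights(10.25, 5.5, width=188, height=48)

# A point below the last row: y0 and y1 both clip to row 47.
outside = bilinear_weights(10.25, 60.5, width=188, height=48)

print("inside :", inside, "sum =", sum(inside))
print("outside:", outside, "sum =", sum(outside))
```

In this formulation the corners weighted by `area_a` and `area_b` then refer to the same pixel, as do those weighted by `area_c` and `area_d`, so the interpolated value is zero, which would show up as black. If that matches what happens here, the localization network is probably producing transforms that move the sampling grid off the image.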

mesakarghm commented 3 years ago

Changing the Activation function from ReLU to Sigmoid in the localization network did the job for me.
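One possible reading of why this helped (an assumption, not something verified in this thread): ReLU activations are unbounded, so the six affine parameters that the final `Dense` layer computes from them can grow large and move the sampling points far outside the image, whereas sigmoid keeps each activation between 0 and 1, which bounds that layer's output for fixed weights. The weights and pre-activations below are invented purely for illustration, and `affine_params` is a hypothetical helper.

```python
import math


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine_params(hidden, weights, bias):
    """theta[k] = sum_i hidden[i] * weights[i][k] + bias[k], as in a dense layer."""
    return [sum(h * w[k] for h, w in zip(hidden, weights)) + bias[k]
            for k in range(len(bias))]


bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]        # identity transform, as in locnet
weights = [[0.1] * 6 for _ in range(4)]      # hypothetical trained weights
pre_activations = [50.0, -3.0, 20.0, 80.0]   # hypothetical large pre-activations

theta_relu = affine_params([relu(v) for v in pre_activations], weights, bias)
theta_sigmoid = affine_params([sigmoid(v) for v in pre_activations], weights, bias)

# Indices 2 and 5 are the translation terms in normalized coordinates;
# magnitudes well beyond 1 shift the whole grid off the image.
print("relu   :", [round(t, 3) for t in theta_relu])
print("sigmoid:", [round(t, 3) for t in theta_sigmoid])
```

With sigmoid, each parameter can differ from its bias by at most the sum of the absolute weights feeding it, so the transform stays closer to the identity unless the weights themselves become large.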