Closed: offchan42 closed this issue 5 years ago
The implementation you mentioned is probably faster, since it does the decoding with TensorFlow and not with NumPy. It should also work to use the decoder layer in my implementation.
In your case, I would recommend a custom architecture. Separable convolution is a good starting point. Reducing the depth of the architecture should also make it faster, since depth cannot be parallelized. Reducing the number of features is also worth taking into consideration. Both may affect detection performance and require training from scratch.
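To see why separable convolution is a good starting point for a faster model, here is a back-of-the-envelope weight count (plain Python, no framework needed) comparing a standard 3x3 convolution with a depthwise-separable one; the layer sizes (64 channels in and out) are illustrative, not taken from any specific model:

```python
# Rough cost comparison: a standard k x k convolution vs. a depthwise-separable
# one (depthwise k x k followed by a pointwise 1x1), counting weights only.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # depthwise part + pointwise part
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 64)             # 36864 weights
separable = separable_conv_params(3, 64, 64)  # 576 + 4096 = 4672 weights
print(standard, separable, standard / separable)
```

For a 3x3 kernel with 64 channels each way, the separable version needs roughly 8x fewer weights (and multiply-adds), which is the main reason architectures like MobileNet use it throughout.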
Which implementation you use is up to you...
It seems that SSD7 from the other repo is lightweight enough that I can configure it to fit my needs. The only concern left is to test speed on mobile.
I was curious and implemented the SSD7 model from keras_ssd7.py:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, MaxPooling2D
from ssd_model import multibox_head
def SSD7(input_shape=(128, 128, 3), num_classes=2, softmax=True):
    source_layers = []
    x = input_tensor = Input(shape=input_shape)

    # Seven conv blocks (filters, kernel size); each block's output is
    # collected as a prediction source before the next 2x2 pooling step.
    for i, (filters, kernel) in enumerate(
            [(32, 5), (48, 3), (64, 3), (64, 3), (48, 3), (48, 3), (32, 3)]):
        if i > 0:
            x = MaxPooling2D(pool_size=2)(x)
        x = Conv2D(filters, kernel, strides=1, padding='same',
                   kernel_initializer='he_normal')(x)
        x = BatchNormalization(axis=3, momentum=0.99)(x)
        x = Activation('elu')(x)
        source_layers.append(x)

    num_priors = [3] * 7
    normalizations = [-1] * 7
    output_tensor = multibox_head(source_layers, num_priors, num_classes,
                                  normalizations, softmax)
    model = Model(input_tensor, output_tensor)
    model.num_classes = num_classes

    # parameters for prior boxes
    model.image_size = input_shape[:2]
    model.source_layers = source_layers
    model.aspect_ratios = [[1, 2, 1/2]] * 7
    return model
That's all. ReLU and depthwise convolutions should make it cheaper, and regularisation is done in the training notebook... If you want, add the layer names.
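As a sanity check on the architecture above: with a 128x128 input and one source layer appended before each of the six pooling steps, the feature maps the multibox head predicts from are halved six times. This small plain-Python snippet (my own check, not part of the repo) counts the resulting prior boxes at 3 per cell:

```python
# Feature-map side lengths for a 128x128 input: the seven source layers sit
# before each 2x2 pooling, so their sizes are 128, 64, 32, 16, 8, 4, 2.
sizes = [128 // 2**i for i in range(7)]
priors_per_cell = 3
total_boxes = priors_per_cell * sum(s * s for s in sizes)
print(sizes, total_boxes)
```

Most of the boxes come from the earliest (largest) feature maps, so dropping the first source layer is one of the cheapest ways to shrink the head if small objects are not needed.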
For me, I want to train a very lightweight/fast object detection model for recognizing a single solid object, e.g. a PlayStation joystick. I tried transfer learning with the TensorFlow Object Detection API using SSDLite MobileNetV2, but it's not fast enough, since that architecture is sized to predict many classes. I only want to predict one class, a rigid object that won't deform or change shape at all.
That's why I'm thinking of defining a somewhat smaller MobileNetV2 and training SSD from scratch (as I think it's not possible to reuse the weights from the bigger model), so that I can achieve faster inference on a mobile phone. Maybe later I will convert the model to TF Lite. For example, I want my model to run as fast as in this paper: https://arxiv.org/abs/1907.05047
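For the TF Lite step, the conversion itself is short with the standard TF 2.x API. A hedged sketch, assuming `model` is your trained Keras model (the quantization flag is optional but usually helps on mobile):

```python
import tensorflow as tf

# `model` is assumed to be a trained tf.keras Model (e.g. the SSD7 above).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optional: default optimizations enable post-training quantization,
# which typically shrinks the file and speeds up on-device inference.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open('ssd7.tflite', 'wb') as f:
    f.write(tflite_model)
```

Note that the decoding/NMS stage discussed earlier may not convert cleanly; a common workaround is to export only the raw box/class outputs and do the decoding on the CPU.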