matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Initializing with random weights for training from scratch #377

Open ha5463 opened 6 years ago

ha5463 commented 6 years ago

We would appreciate it if anyone could guide us on how to initialize the weights for training from scratch. We are planning to replace the Conv2D layers with SeparableConv2D layers, so we can't use the previous ".h5" file for this purpose.

Thank you

JonathanCMitchell commented 6 years ago

@ha5463 You should use a MobileNet base and initialize it with ImageNet weights, which are freely available. MobileNet uses SeparableConv2D layers.

ha5463 commented 6 years ago

@JonathanCMitchell Thank you for the reply. Since we do not have a MobileNet base here, it would be great if you could point us to a repository that provides one.

JonathanCMitchell commented 6 years ago

@ha5463 It is available in keras.applications here
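
For reference, loading a MobileNet backbone with ImageNet weights from keras.applications looks roughly like this (a minimal sketch; the input shape is an arbitrary example):

```python
from keras.applications.mobilenet import MobileNet

# include_top=False drops the classification head so the network can be
# used as a feature-extraction backbone; weights='imagenet' downloads the
# pre-trained ImageNet weights.
base = MobileNet(include_top=False, weights='imagenet',
                 input_shape=(224, 224, 3))
base.summary()
```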

ha5463 commented 6 years ago

@JonathanCMitchell, as you suggested I looked into MobileNet. The network is quite good, but we still need ResNet with SeparableConv2D layers. After replacing some layers in model.py (in this repo itself), I got the error: Layer #18 (named "res2b_branch2a") expects 3 weight(s), but the saved weights have 2 element(s). Can you please help me with this?

For reference, the complete log is as follows:

```
ValueError                                Traceback (most recent call last)
in ()
      3
      4 if init_with == "imagenet":
----> 5     model.load_weights(model.get_imagenet_weights(), by_name=True)
      6 elif init_with == "coco":
      7     # Load weights trained on MS COCO, but skip layers that

~/Mask_RCNN/model.py in load_weights(self, filepath, by_name, exclude)
   2034
   2035         if by_name:
-> 2036             topology.load_weights_from_hdf5_group_by_name(f, layers)
   2037         else:
   2038             topology.load_weights_from_hdf5_group(f, layers)

~/venv/lib/python3.6/site-packages/keras/engine/topology.py in load_weights_from_hdf5_group_by_name(f, layers, skip_mismatch, reshape)
   3451                     ' weight(s), but the saved weights' +
   3452                     ' have ' + str(len(weight_values)) +
-> 3453                     ' element(s).')
   3454     # Set values.
   3455     for i in range(len(weight_values)):

ValueError: Layer #18 (named "res2b_branch2a") expects 3 weight(s), but the saved weights have 2 element(s).
```
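
The mismatch comes from the layer structure itself: a Conv2D layer stores two weight arrays (kernel and bias), while a SeparableConv2D layer stores three (depthwise kernel, pointwise kernel, and bias), so the saved ResNet weights can never match by count. A minimal sketch demonstrating this (layer names are arbitrary):

```python
from keras.layers import Input, Conv2D, SeparableConv2D
from keras.models import Model

inp = Input(shape=(64, 64, 3))
conv = Conv2D(8, 3, name='plain_conv')(inp)
sep = SeparableConv2D(8, 3, name='sep_conv')(inp)
model = Model(inp, [conv, sep])

# Conv2D: [kernel, bias] -> 2 arrays.
print(len(model.get_layer('plain_conv').get_weights()))  # 2
# SeparableConv2D: [depthwise_kernel, pointwise_kernel, bias] -> 3 arrays,
# hence "expects 3 weight(s), but the saved weights have 2 element(s)".
print(len(model.get_layer('sep_conv').get_weights()))    # 3
```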
waleedka commented 6 years ago

If I understand correctly, you're trying to use the same ResNet50 network but change the regular convolutions to separable convolutions. I don't think you can use the provided weights in this case. Separable convolutions have a different structure, so the weights from the regular convolutions wouldn't be that useful anyway. I think your best option is to train from scratch.

ha5463 commented 6 years ago

@waleedka sir, thank you for understanding and pointing out the exact problem we are trying to overcome: training the new model (ResNet50 with SeparableConv2D layers) from random weights. We have not been able to initialize this new model with random weights and start training from scratch. If you know of any resource we could refer to, it would be very helpful.

Thank you

JonathanCMitchell commented 6 years ago

@ha5463 Did you comment out the line L470 that has model.load_weights? I would modify it to model.load_weights(<path>) instead of model.load_weights(<path>, by_name=True), because the by_name argument will fail when you try to load custom layers. You should also keep track of the actual layer names in layer_regex inside model.py. Your SeparableConv2D layers should follow the same layer-naming pattern as ResNet, but to be sure they are loading, put a breakpoint before fit_generator runs and check the trainable property on your layers.
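
A quick sketch of that sanity check (assuming model is this repo's MaskRCNN instance, whose underlying Keras model is exposed as model.keras_model):

```python
# Print each layer's name and trainable flag before training starts,
# e.g. at a breakpoint just before the fit_generator call.
for layer in model.keras_model.layers:
    print(layer.name, layer.trainable)
```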

You should also use a trainable BatchNormalization layer instead of the BatchNorm layer defined in this repo, which overrides keras.layers.BatchNormalization.

waleedka commented 6 years ago

> You should also use a trainable BatchNormalization layer instead of the BatchNorm layer defined in this repo, which overrides keras.layers.BatchNormalization.

As of the latest update (a day or two ago), there is a config setting, TRAIN_BN, that makes it easy to enable/disable batch normalization training without updating the code.
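
A sketch of how that setting would be used (the config class name here is a placeholder; the import path is config.py at the repo root at the time of this issue):

```python
from config import Config

class MyConfig(Config):
    NAME = "my_config"   # placeholder experiment name
    TRAIN_BN = True      # train batch-norm layers instead of freezing them
                         # (the repo defaults this to False because batch
                         # sizes are typically small)
```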

@ha5463 If you want to start with random weights, then simply comment out the line that loads the weights (@JonathanCMitchell links to it in the comment above). By default, if you don't load any weights, you start with random weights.
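
In a training script that looks roughly like this (a sketch; config, MODEL_DIR, the datasets, and the epoch count are the usual placeholders from this repo's examples):

```python
import model as modellib  # this repo's model.py

# Build the model in training mode and skip any load_weights() call:
# Keras then initializes every layer with its default random initializer.
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# model.load_weights(...)  # intentionally omitted: train from scratch

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,        # arbitrary example
            layers='all')     # train all layers, not just the heads
```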

CMCDragonkai commented 6 years ago

@waleedka Is calling an instance of BatchNorm with training=False the same as constructing KL.BatchNormalization with the trainable=False parameter? The Keras documentation doesn't mention training=False, but it does mention trainable=False for freezing layers.

If so, how do the parameters interact with each other? Does calling it with training=? always override what it's constructed with?
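
For reference, the two flags the question contrasts look like this in Keras 2.x (a sketch of the two mechanisms, not an answer to how they interact):

```python
from keras.layers import Input, BatchNormalization
from keras.models import Model

inp = Input(shape=(16,))

bn = BatchNormalization()
bn.trainable = False           # attribute flag: exclude gamma/beta from
                               # the weight updates performed by fit()

out = bn(inp, training=False)  # call-time flag: run the layer in inference
                               # mode, i.e. normalize with the stored moving
                               # mean/variance instead of batch statistics

model = Model(inp, out)
```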