titu1994 / Keras-NASNet

"NASNet" models in Keras 2.0+ with weights
MIT License
200 stars, 64 forks

How to load the pretrained model? #15

Open zuoxiang95 opened 6 years ago

zuoxiang95 commented 6 years ago

Hello @titu1994, I am using your code to train on my own dataset, and I want to start from the pretrained model that you provide in nasnet.py. The problem is that my dataset has 361 categories, while the pretrained model has 1000. How do I modify it? Looking forward to your reply! :)

zuoxiang95 commented 6 years ago

I built a model to load the pretrained weights like this:

    model = NASNetLarge((img_rows, img_cols, img_channels), use_auxiliary_branch=True, include_top=True)

But I get this error:

    Traceback (most recent call last):
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1567, in _create_c_op
        c_op = c_api.TF_FinishOperation(op_desc)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1 and 128. Shapes are [1,1,2688,128] and [128,2016,1,1]. for 'Assign_1524' (op: 'Assign') with input shapes: [1,1,2688,128], [128,2016,1,1].

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 70, in <module>
        model = NASNetLarge((img_rows, img_cols, img_channels), use_auxiliary_branch=True, include_top=True)
      File "/home/zuoxiang/Keras-NASNet/nasnet.py", line 407, in NASNetLarge
        default_size=331)
      File "/home/zuoxiang/Keras-NASNet/nasnet.py", line 320, in NASNet
        model.load_weights(weights_file)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/network.py", line 1180, in load_weights
        f, self.layers, reshape=reshape)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/saving.py", line 929, in load_weights_from_hdf5_group
        K.batch_set_value(weight_value_tuples)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2430, in batch_set_value
        assign_op = x.assign(assign_placeholder)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 615, in assign
        return state_ops.assign(self._variable, value, use_locking=use_locking)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 283, in assign
        validate_shape=validate_shape)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
        use_locking=use_locking, name=name)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
        op_def=op_def)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1734, in __init__
        control_input_ops)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1570, in _create_c_op
        raise ValueError(str(e))
    ValueError: Dimension 0 in both shapes must be equal, but are 1 and 128. Shapes are [1,1,2688,128] and [128,2016,1,1]. for 'Assign_1524' (op: 'Assign') with input shapes: [1,1,2688,128], [128,2016,1,1].

Do you know what's wrong? Thank you very much!
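
As a quick sanity check on the two shapes reported in the error, here is a minimal pure-Python sketch. It only tests whether one shape is a reordering of the other's axes, which is what a pure channels-first versus channels-last layout difference would look like. The helper name `is_axis_permutation` is made up for illustration; it is not part of Keras, TensorFlow, or this repository.

```python
def is_axis_permutation(shape_a, shape_b):
    """Return True if shape_b contains the same axis sizes as shape_a, in any order."""
    return sorted(shape_a) == sorted(shape_b)


# The two shapes copied from the error message.
first_shape = (1, 1, 2688, 128)
second_shape = (128, 2016, 1, 1)

# A kernel that differs only by axis order would pass this check.
print(is_axis_permutation((1, 1, 2016, 128), second_shape))  # True

# The shapes from the error do not: 2688 and 2016 are different sizes.
print(is_axis_permutation(first_shape, second_shape))  # False
```

Because 2688 and 2016 differ, no axis reordering can reconcile the two shapes. That suggests, but does not prove, that the weights are being matched to a different layer than the one they were saved from, rather than merely stored in a different data format.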

titu1994 commented 6 years ago

You must have used an odd input shape here. Can you provide the full script with all variables?

zuoxiang95 commented 6 years ago

Here are all my variables:

    lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.5), cooldown=0, patience=5, min_lr=0.5e-5)
    csv_logger = CSVLogger('NASNet-objction-classfication.csv')
    model_checkpoint = ModelCheckpoint(weights_file, monitor='val_predictions_acc', save_best_only=True, save_weights_only=True, mode='max')
    batch_size = 128
    nb_classes = 361
    nb_epoch = 200  # should be 600
    data_augmentation = True

    # input image dimensions
    img_rows, img_cols = 331, 331
    img_channels = 3
zuoxiang95 commented 6 years ago

When I set use_auxiliary_branch=False and include_top=False and add the code below to my script, the model trains successfully. But another problem is that I can only set the batch size to 16; otherwise it runs out of memory (OOM). My GPU is a P40.


    base_model = NASNetLarge((img_rows, img_cols, img_channels), use_auxiliary_branch=False, include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(dropout)(x)
    predictions = Dense(nb_classes, activation='softmax', kernel_regularizer=l2(weight_decay), name='predictions')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    model.summary()
titu1994 commented 6 years ago

Weird. I don't get this error. I am using TF with the channels-last data format. I'm guessing you are using the same, so I don't understand the cause.

zuoxiang95 commented 6 years ago

Yes, I am using the generator function in imagenet_validation.py.

zuoxiang95 commented 6 years ago

Hello @titu1994, what batch size did you use when you trained NASNet Large?

Sbakkalii commented 4 years ago

@titu1994 Same issue here. I think there is a problem when loading the auxiliary branch weights.