matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

New classes, but overfitting #2938

Open ydzat opened 1 year ago

ydzat commented 1 year ago

I originally had 18 classes, used scannet_frames_25k as the training set, and trained with Adam plus a learning-rate decay strategy (#289) to obtain a model, which I then used as a pre-trained network. Learning Decay:

    def scheduler(self, epoch):
        # Step decay: 1e-3 after epoch 50, 1e-4 after epoch 80.
        if 50 < epoch <= 80:
            K.set_value(self.keras_model.optimizer.lr, 0.001)
        if epoch > 80:
            K.set_value(self.keras_model.optimizer.lr, 0.0001)
        return K.get_value(self.keras_model.optimizer.lr)
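The schedule above can also be stated as a plain function, which makes it easy to sanity-check the breakpoints. This is a dependency-free sketch: the base rate of 0.01 is an assumed placeholder, since in the code above the pre-step-down rate comes from whatever the optimizer was configured with. In standard Keras, such a function could be attached via a keras.callbacks.LearningRateScheduler callback.

```python
def step_decay(epoch, base_lr=0.01):
    """Step-decay schedule: base rate through epoch 50,
    1e-3 through epoch 80, 1e-4 afterwards.

    base_lr is an assumed placeholder, not a value from the issue.
    """
    if epoch > 80:
        return 0.0001
    if epoch > 50:
        return 0.001
    return base_lr
```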

The hyperparameters used for training were:

BATCH_SIZE = 8
NUM_CLASSES = 1 + 18
STEPS_PER_EPOCH = 2000

and train() as follows:

        # Training - Stage 1
        # Train the network heads only
        print("Training network heads")
        model.train(dataset_train, dataset_val,
                    learning_rate=config.LEARNING_RATE,
                    epochs=60,    # default = 40
                    layers='heads',
                    augmentation=augmentation)
        # Training - Stage 2
        # Finetune layers from ResNet stage 4 and up
        print("Fine tune Resnet stage 4 and up")
        model.train(dataset_train, dataset_val,
                    learning_rate=config.LEARNING_RATE,
                    epochs=120,
                    layers='4+',
                    augmentation=augmentation)

        # Training - Stage 3
        # Fine tune all layers
        print("Fine tune all layers")
        model.train(dataset_train, dataset_val,
                    learning_rate=config.LEARNING_RATE / 10,
                    epochs=160,
                    layers='all',
                    augmentation=augmentation)

With the above code, I got a pre-trained network mask_rcnn_sc.h5.

Now I use the following code to add 8 new classes. Train dataset: 105 images; val dataset: 21 images. The images in the dataset also contain some of the old classes. To keep the categories in annotations.json consistent, I manually appended the 8 new categories to the categories section of the json, following the format of that section in the pre-trained network's annotations.json. Learning Decay 1:

    def scheduler(self, epoch):
        # Constant 1e-4 through epoch 50.
        if epoch <= 5:
            K.set_value(self.keras_model.optimizer.lr, 0.0001)
        if 5 < epoch <= 50:
            K.set_value(self.keras_model.optimizer.lr, 0.0001)
        return K.get_value(self.keras_model.optimizer.lr)

Learning Decay 2:

    def scheduler(self, epoch):
        # Linear warm-up over epochs 1-5, then linear decay.
        if epoch <= 5:
            K.set_value(self.keras_model.optimizer.lr, 0.000001 + (epoch - 1) * 0.01249975)
        if 5 < epoch <= 50:
            K.set_value(self.keras_model.optimizer.lr, 0.05 - (epoch - 5) * 0.001225)
        return K.get_value(self.keras_model.optimizer.lr)
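Decay 2 is a linear warm-up from 1e-6 at epoch 1 up to 0.05 at epoch 5, followed by a linear ramp-down of 0.001225 per epoch. Restating it as a plain function (same constants as above, no Keras dependency) makes the shape of the schedule easy to verify:

```python
def warmup_then_decay(epoch):
    """Decay 2 restated: 1e-6 at epoch 1, rising linearly to 0.05
    at epoch 5, then decaying by 0.001225 per epoch."""
    if epoch <= 5:
        return 0.000001 + (epoch - 1) * 0.01249975
    return 0.05 - (epoch - 5) * 0.001225
```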

Hyperparameters:

BATCH_SIZE = 8
NUM_CLASSES = 1 + 18 + 8
STEPS_PER_EPOCH = 13
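STEPS_PER_EPOCH = 13 corresponds to roughly one pass over the 105 training images at batch size 8, using floor division (the final partial batch is dropped; rounding up would give 14):

```python
import math

train_images = 105  # from the issue
batch_size = 8

# One pass over the training set, two ways of rounding:
steps_floor = train_images // batch_size           # final partial batch dropped
steps_ceil = math.ceil(train_images / batch_size)  # final partial batch included
```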

train():

        model.train(dataset_train, dataset_val,
                    learning_rate=config.LEARNING_RATE,
                    epochs=10,    # default = 40
                    layers='heads')  # model.train() expects 'heads', not 'head'
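As an aside on the annotations.json edit described above: appending categories can be scripted so that the new ids continue after the existing maximum, which keeps the 18 old class ids (and hence the old class indices) stable. A hedged sketch, with illustrative category names not taken from the issue:

```python
def add_categories(ann, new_names):
    """Append new categories to a COCO-style annotation dict.

    New ids continue after the current maximum so existing
    category ids are left unchanged.
    """
    next_id = max(c["id"] for c in ann["categories"]) + 1
    for name in new_names:
        ann["categories"].append(
            {"id": next_id, "name": name, "supercategory": "object"})
        next_id += 1
    return ann
```

In practice this would be wrapped in json.load / json.dump to read and rewrite annotations.json.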

Training produces the new model mask_rcnn_scplus.h5.

Inference on the new classes meets expectations; on the old classes, however, the model exhibits catastrophic forgetting and largely fails on categories it previously handled well.

I don't understand where the problem is. Please help!

thanhhung0112 commented 1 year ago

Do you train in Colab?

ydzat commented 1 year ago

No, I train on a computer in my lab (Linux + JupyterHub).