hughw19 / NOCS_CVPR2019

[CVPR2019 Oral] Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation on Python3, Tensorflow, and Keras
https://geometry.stanford.edu/projects/NOCS_CVPR2019/

Training process #37

Closed Saafke closed 3 years ago

Saafke commented 3 years ago

Hi He,

I had a question about the length of the training process. In the paper you mention:

In the first stage of training, we freeze the ResNet50 weights and only train the layers in the heads, the RPN and FPN for 10K iterations. In the second stage, we freeze ResNet50 layers below level 4 and train for 3K iterations. In the final stage, we freeze ResNet50 layers below level 3 for another 70K iterations. When switching to each stage, we decrease the learning rate by a factor of 10.

From this it seems you perform only 10 + 3 + 70 = 83K iterations. At a batch size of 2, that means the network sees 83K × 2 = 166K images, i.e. less than one full pass over the data. This is a bit confusing to me, as your training dataset is larger than that, at 275K images.
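
Making that arithmetic explicit (a quick sketch using the numbers above):

# Paper schedule: 10K + 3K + 70K iterations, batch size 2
iterations = (10 + 3 + 70) * 1000       # 83K iterations total
batch_size = 2                          # GPU_COUNT * IMAGES_PER_GPU
images_seen = iterations * batch_size   # 166K images
dataset_size = 275000                   # training set size
print(images_seen / dataset_size)       # ~0.6, i.e. less than one epoch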

However, in the code I can see that you train for 100 + 130 + 400 = 630 epochs at 1000 steps per epoch, i.e. 630K iterations.

GPU_COUNT = 1
IMAGES_PER_GPU = 2
# Use a small epoch since the data is simple
STEPS_PER_EPOCH = 1000
#print("Training network heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=100,                                        <==========   100K
            layers_name='heads')

# Training - Stage 2
# Finetune layers from ResNet stage 4 and up
print("Training Resnet layer 4+")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE/10,
            epochs=130,                                        <==========   130K
            layers_name='4+')

# Training - Stage 3
# Finetune layers from ResNet stage 3 and up
print("Training Resnet layer 3+")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE/100,
            epochs=400,                                        <==========   400K
            layers_name='all') 
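
For reference, in the upstream Matterport Mask R-CNN code these layers_name values are (as far as I can tell) mapped to regexes over Keras layer names, and only matching layers are left trainable, roughly:

# Sketch of Matterport-style layer selection (from the upstream repo,
# possibly adapted in this one): each name maps to a regex, and only
# layers whose names match are set trainable.
layer_regex = {
    "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",  # heads, RPN, FPN only
    "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "all": ".*",                                  # every layer
}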

So I was wondering, how many iterations did you do for the experiments in the paper, and how many would you recommend doing for good performance?
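
Digging into the upstream Matterport Mask R-CNN code, it looks like the epochs argument to model.train() is a cumulative target (training resumes from the current epoch and stops at epochs), not a per-call count. If so, the sketch below, under that assumption, would put the real total at 400K iterations rather than 630K:

# Sketch, assuming Matterport-style cumulative epoch targets
# (model.train(..., epochs=N) trains *until* epoch N, resuming
# from wherever the previous call stopped):
STEPS_PER_EPOCH = 1000
stage_targets = [100, 130, 400]                          # the epochs= values above
total_iterations = stage_targets[-1] * STEPS_PER_EPOCH
print(total_iterations)                                  # 400000, i.e. 400K, not 630K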

hughw19 commented 3 years ago

Please follow our code. The iteration numbers stated in the paper may be wrong. We basically adopt the original Matterport code's training protocol without too many customizations.

Best, He


WW-0 commented 3 years ago

What kind of equipment do you use for training, and how much GPU memory is needed at a minimum?

hughw19 commented 3 years ago

What kind of equipment do you use for training, and how much GPU memory is needed at a minimum?

We use an Nvidia GeForce Titan Xp. 12 GB should be fine.
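
If GPU memory is a constraint, one option with this TF1/Keras stack (a generic sketch, not something from this repo) is to let TensorFlow allocate memory on demand instead of reserving the whole card at startup:

import tensorflow as tf
import keras.backend as K

# Allocate GPU memory as needed rather than grabbing it all up front
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=tf_config))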