about trainable_params - Githubissues

ohhhyeahhh / SiamCAR

SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking (CVPR 2020, Oral)

https://openaccess.thecvf.com/content_CVPR_2020/html/Guo_SiamCAR_Siamese_Fully_Convolutional_Classification_and_Regression_for_Visual_Tracking_CVPR_2020_paper.html


about trainable_params #18

Open kongbia opened 4 years ago

kongbia commented 4 years ago

ModelBuilder has an extra layer, self.down = nn.ConvTranspose2d(256 * 3, 256, 1, 1), but when train.py adds the parameters to the optimizer, this layer does not seem to be included.

trainable_params = []
trainable_params += [{'params': filter(lambda x: x.requires_grad,
                                       model.backbone.parameters()),
                      'lr': cfg.BACKBONE.LAYERS_LR * cfg.TRAIN.BASE_LR}]

if cfg.ADJUST.ADJUST:
    trainable_params += [{'params': model.neck.parameters(),
                          'lr': cfg.TRAIN.BASE_LR}]

trainable_params += [{'params': model.car_head.parameters(),
                      'lr': cfg.TRAIN.BASE_LR}]

optimizer = torch.optim.SGD(trainable_params,
                            momentum=cfg.TRAIN.MOMENTUM,
                            weight_decay=cfg.TRAIN.WEIGHT_DECAY)

twotwo2 commented 4 years ago

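Thank you very much for pointing out this problem. We did forget to add the parameters of self.down to the optimizer. Surprisingly, our tracker still achieved good accuracy. We will fix this later. Thanks again for the correction.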
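The omission acknowledged above can be illustrated with a minimal sketch. It is not the repository's fix: ToyModel is a hypothetical stand-in for ModelBuilder (only the attribute names backbone, car_head, and down follow the issue), and the helper names build_param_groups and params_missing_from_optimizer, the layer sizes, and the learning rates are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for ModelBuilder; only the attribute names follow the issue."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3)
        self.car_head = nn.Conv2d(8, 2, 3)
        self.down = nn.ConvTranspose2d(8 * 3, 8, 1, 1)


def build_param_groups(model, base_lr, backbone_lr_scale, include_down=True):
    """Build per-module parameter groups, optionally including `down`."""
    groups = [
        {'params': [p for p in model.backbone.parameters() if p.requires_grad],
         'lr': backbone_lr_scale * base_lr},
        {'params': list(model.car_head.parameters()), 'lr': base_lr},
    ]
    if include_down:
        groups.append({'params': list(model.down.parameters()), 'lr': base_lr})
    return groups


def params_missing_from_optimizer(model, optimizer):
    """Return names of trainable parameters that no optimizer group contains."""
    in_optimizer = {id(p) for g in optimizer.param_groups for p in g['params']}
    return [name for name, p in model.named_parameters()
            if p.requires_grad and id(p) not in in_optimizer]


model = ToyModel()

# Groups built without `down`, mirroring the situation described in the issue.
opt_without = torch.optim.SGD(
    build_param_groups(model, 0.005, 0.1, include_down=False), momentum=0.9)
missing_before = params_missing_from_optimizer(model, opt_without)

# Groups built with `down` included.
opt_with = torch.optim.SGD(
    build_param_groups(model, 0.005, 0.1, include_down=True), momentum=0.9)
missing_after = params_missing_from_optimizer(model, opt_with)

print(missing_before)
print(missing_after)
```

A check like params_missing_from_optimizer can be run once after the optimizer is built to catch layers that would otherwise keep their initial weights for the whole run.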

davinca commented 4 years ago

@twotwo2 Did the experimental results drop after self.down was added to the optimizer?

Scr-w commented 3 years ago

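Hello, I found that in the optimizer state saved with my trained weights, the params of this group are empty: trainable_params += [{'params': filter(lambda x: x.requires_grad, model.backbone.parameters()), 'lr': cfg.BACKBONE.LAYERS_LR * cfg.TRAIN.BASE_LR}]. What could be the reason? Thank you very much, I look forward to your reply.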

Scr-w commented 3 years ago

Hello, should requires_grad in the else branch below also be True? I look forward to your reply, thank you very much.

if current_epoch >= cfg.BACKBONE.TRAIN_EPOCH:
    for layer in cfg.BACKBONE.TRAIN_LAYERS:
        for param in getattr(model.backbone, layer).parameters():
            param.requires_grad = True
        for m in getattr(model.backbone, layer).modules():
            if isinstance(m, nn.BatchNorm2d):
                m.train()
else:
    for param in model.backbone.parameters():
        param.requires_grad = False  # False
    for m in model.backbone.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()

kongbia commented 3 years ago

> Hello, I found that in the optimizer state saved with my trained weights, the params of this group are empty [...]. What could be the reason?

The backbone is not trained during the first 10 epochs, so the params of that group are empty. The backbone parameters only start being trained after epoch 10.
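The behavior described in this reply can be reproduced with a small sketch. The two-layer backbone and the helper backbone_group are hypothetical (the real model uses a much larger backbone), and the learning rate is an arbitrary value. The point is only that filtering on requires_grad yields an empty group while the backbone is frozen.

```python
import torch.nn as nn

# Hypothetical two-layer backbone used only for illustration.
backbone = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4))


def backbone_group(backbone, lr):
    """Collect only the parameters that currently require gradients."""
    return {'params': [p for p in backbone.parameters() if p.requires_grad],
            'lr': lr}


# Warm-up phase: every backbone parameter is frozen, so the filter keeps nothing.
for p in backbone.parameters():
    p.requires_grad = False
frozen_group = backbone_group(backbone, lr=0.0005)

# After the warm-up: parameters are unfrozen and the group is rebuilt.
for p in backbone.parameters():
    p.requires_grad = True
unfrozen_group = backbone_group(backbone, lr=0.0005)

# Conv2d and BatchNorm2d each contribute a weight and a bias.
print(len(frozen_group['params']), len(unfrozen_group['params']))  # prints "0 4"
```

Because the group is built from whatever requires gradients at that moment, the optimizer has to be rebuilt after the backbone is unfrozen for the newly trainable parameters to be picked up.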

Scr-w commented 3 years ago

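Hello, thank you very much for your reply. One more question: the network runs out of memory after training for 10 epochs. Should I then use checkpoint_e10.pth as the pretrained model (rather than putting it in RESUME), set START_EPOCH in config.yaml to 11, and continue training? I look forward to your reply.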


Scr-w commented 3 years ago

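Hello, I found that if training stops at epoch 10 because of the out of memory error and I then resume from checkpoint_e10.pth through RESUME, an error is raised because the params in checkpoint_e10.pth are empty: loaded state dict has a different number of parameter groups. I look forward to your reply.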

Scr-w commented 3 years ago

Sorry, to follow up on my previous comment, the error raised is ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
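The ValueError reported here can be reproduced outside the repository. The sketch below is an assumption about the mechanism, not a trace of the actual training script: head and backbone are hypothetical stand-ins, and make_optimizer is an illustrative helper. An optimizer state saved while the backbone group was empty is loaded into an optimizer whose backbone group is populated.

```python
import torch
import torch.nn as nn

head = nn.Linear(4, 2)       # stand-in for the always-trained head
backbone = nn.Linear(4, 4)   # stand-in for the backbone


def make_optimizer(train_backbone):
    """Build an SGD optimizer whose backbone group is empty while frozen."""
    backbone_params = list(backbone.parameters()) if train_backbone else []
    return torch.optim.SGD(
        [{'params': backbone_params, 'lr': 0.0005},
         {'params': list(head.parameters()), 'lr': 0.005}],
        momentum=0.9)


# State saved during the warm-up phase (empty backbone group).
saved_state = make_optimizer(train_backbone=False).state_dict()

# Optimizer built after the backbone is unfrozen (populated backbone group).
resumed = make_optimizer(train_backbone=True)

try:
    resumed.load_state_dict(saved_state)
    error_message = None
except ValueError as exc:
    # Both optimizers have two groups, but the group sizes differ.
    error_message = str(exc)

print(error_message)
```

Both optimizers have the same number of groups, so the mismatch is in the group sizes, which is consistent with the second error message quoted in the thread.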

kongbia commented 3 years ago

This out of memory error is a bug that has been there since the earliest version of pysot. You can change the batch size and then resume from the previous epoch.

Scr-w commented 3 years ago

I see, that makes sense. Thank you very much for your reply.


Scr-w commented 3 years ago

Hello, I found that if I continue training from epoch 11, I get the error ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group. Have you run into this? I look forward to your reply.

Scr-w commented 3 years ago

Hello, when I test with the SiamCAR-GOT.pth and SiamCAR_LaSOT.pth weights, I get this error: size mismatch for down.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 256, 1, 1]). However, model_general.pth tests normally. Have you run into this? I look forward to your reply, thank you very much.
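The two shapes in this error message match the weight layouts of two different PyTorch layers: nn.Conv2d stores weights as (out_channels, in_channels, kH, kW), while nn.ConvTranspose2d stores them as (in_channels, out_channels, kH, kW). The thread does not confirm how those checkpoints were saved, so the sketch below only shows the layout difference. Treating the checkpoint as coming from a Conv2d layer is an assumption.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(256 * 3, 256, 1, 1)
deconv = nn.ConvTranspose2d(256 * 3, 256, 1, 1)

print(tuple(conv.weight.shape))    # (256, 768, 1, 1), the shape in the checkpoint
print(tuple(deconv.weight.shape))  # (768, 256, 1, 1), the shape in the current model

# For a 1x1 kernel with stride 1, swapping the first two weight axes
# converts the Conv2d layout into the ConvTranspose2d layout.
with torch.no_grad():
    deconv.weight.copy_(conv.weight.permute(1, 0, 2, 3))
    deconv.bias.copy_(conv.bias)
    x = torch.randn(1, 256 * 3, 5, 5)
    same_output = torch.allclose(conv(x), deconv(x), atol=1e-4)

print(same_output)
```

If the assumption holds, either defining down as nn.Conv2d when loading those checkpoints or permuting the stored weight would make the shapes agree. Which one matches the authors' setup is not stated in the thread.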

Scr-w commented 3 years ago

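Hello, thank you very much for your reply. Does the batch_size need to stay the same for the first 10 epochs and the last 10 epochs of training? To keep GPU utilization high, I used different batch_size values for the first 10 and the last 10 epochs, and the performance of the network I reproduced is far below what the paper reports. I look forward to your reply, thanks.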