microsoft / Relation-Aware-Global-Attention-Networks

We design an effective Relation-Aware Global Attention (RGA) module for CNNs to globally infer the attention.
MIT License

KeyError: 'bn1.num_batches_tracked' #4

Open xiaopanchen opened 4 years ago

xiaopanchen commented 4 years ago

The line state_dict[i].copy_(param_dict[key]) raises an error (I am using torchvision 0.4.0): KeyError: 'bn1.num_batches_tracked'

ZhiZZhang commented 4 years ago

The line state_dict[i].copy_(param_dict[key]) raises an error (I am using torchvision 0.4.0): KeyError: 'bn1.num_batches_tracked'

This is a PyTorch version issue. With a newer version you can simply skip this key; it has little impact on the experimental results.

andreazuna89 commented 4 years ago

Hi. I am not able to reach your reported performance on CUHK (the mAP I get is around 40%). I ran into the same problem, and the same error occurs for similar layers (e.g. layer1.0.bn1.num_batches_tracked, layer1.0.bn2.num_batches_tracked). Are you sure we can skip these parameters without losing performance? In the requirements you suggest PyTorch version == 0.4.0, but that version is no longer available, and the code only runs with a newer PyTorch, which triggers the problem above. Can you help solve this? Thanks a lot.

ZhiZZhang commented 4 years ago

Hi. I am not able to reach your reported performance on CUHK (the mAP I get is around 40%). I ran into the same problem, and the same error occurs for similar layers (e.g. layer1.0.bn1.num_batches_tracked, layer1.0.bn2.num_batches_tracked). Are you sure we can skip these parameters without losing performance? In the requirements you suggest PyTorch version == 0.4.0, but that version is no longer available, and the code only runs with a newer PyTorch, which triggers the problem above. Can you help solve this? Thanks a lot.

What I suggest is to skip the parameters named "*.num_batches_tracked", not the layers themselves! These parameters were added in PyTorch versions after 0.4.0, but the pre-trained model from the PyTorch official website (as attached in this repo) does not include them. So if you have to use a newer PyTorch, skipping them when loading the pre-trained model is the only solution I can find at the moment. In theory this should not affect the subsequent re-id training much, but I am not sure about its practical effect.
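To illustrate the idea, here is a minimal sketch of key-skipping when loading old-format weights (the function name copy_pretrained_params and its structure are hypothetical, not the repo's actual loading code, which loads the backbone layer by layer as in the helpers posted further down):

    import torch

    def copy_pretrained_params(model, checkpoint_path):
        # Copy matching parameters from an old-format checkpoint into the model,
        # skipping keys such as '*.num_batches_tracked' that the model defines
        # (PyTorch > 0.4.0) but the old checkpoint does not contain.
        param_dict = torch.load(checkpoint_path, map_location='cpu')
        state_dict = model.state_dict()
        for name in state_dict:
            if name not in param_dict:
                continue  # e.g. bn1.num_batches_tracked is absent from the old checkpoint
            state_dict[name].copy_(param_dict[name])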

HongweiZhang97 commented 4 years ago

Hi, when loading with a newer PyTorch version I skipped this parameter. On CUHK03 my training result matches the paper's mAP, but rank-1 is 1.4 percentage points lower. Could this be caused by skipping it?

ZhiZZhang commented 4 years ago

Hi, when loading with a newer PyTorch version I skipped this parameter. On CUHK03 my training result matches the paper's mAP, but rank-1 is 1.4 percentage points lower. Could this be caused by skipping it?

Possibly. Another possibility is the difference between single-GPU and multi-GPU training.

HongweiZhang97 commented 4 years ago

Hi, thanks for the explanation! I did observe an mAP drop when testing on multiple GPUs, but that is a separate issue. I will try to work out the impact of this parameter later. Thanks for your help!


sky186 commented 4 years ago

@ZhiZZhang Hello, I would like to embed this module into training tasks the way the SE block is used, so that it directly improves other tasks as well, so I have a question about your hyperparameters: opt = adam, epoch = 300, batch = 64, lr_scheduler = LRScheduler(base_lr=0.0008, step=[80, 120, 160, 200, 240, 280, 320, 360], factor=0.5, warmup_epoch=20, warmup_begin_lr=0.000008). Are these the best settings from your experiments? The number of epochs is quite large, so after how many epochs did you reach the best accuracy? When these two modules are added to a network, are there training hyperparameters that are particularly harmful or particularly helpful? I am using fastreid with its project hyperparameters and embedded the RGA-SC structure directly into an MGN-IBN-a network, training for 60 epochs with an initial learning rate of 0.0035 and CosineAnnealingLR (not step decay). The results so far are clearly poor: on DukeMTMC I only get 71% mAP.

Is there a particular rationale behind your hyperparameter choices, or were they tuned empirically? After the RGA-SC module is embedded, other models will have their own optimal hyperparameters for training, and the two usually differ.
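(For reference, a rough sketch of the learning-rate curve those scheduler parameters describe; the linear warmup shape is an assumption here and may not match the repo's LRScheduler exactly:)

    def lr_at_epoch(epoch,
                    base_lr=0.0008,
                    steps=(80, 120, 160, 200, 240, 280, 320, 360),
                    factor=0.5,
                    warmup_epoch=20,
                    warmup_begin_lr=0.000008):
        # Linear warmup from warmup_begin_lr to base_lr over the first warmup_epoch
        # epochs, then multiply the learning rate by `factor` at each milestone in `steps`.
        if epoch < warmup_epoch:
            return warmup_begin_lr + (base_lr - warmup_begin_lr) * epoch / warmup_epoch
        n_decays = sum(1 for s in steps if epoch >= s)
        return base_lr * factor ** n_decays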

jpainam commented 4 years ago

Hi, I used the exact packages you defined in your requirements.txt, and yet I couldn't find bn1.num_batches_tracked in the loaded weights. I decided to skip the parameters as you said, but the results are really far from the ones reported in your paper. I got:

labeled cuhk03 dataset
Evaluated with "feat_" features and "cosine" metric:
Mean AP: 71.6%
CMC Scores
  top-1          77.1%
  top-5          89.5%
  top-10         94.1%
  top-20         96.4%
Evaluated with "feat" features and "cosine" metric:
Mean AP: 66.0%
CMC Scores
  top-1          69.6%
  top-5          85.6%
  top-10         91.4%
  top-20         95.1%

Meanwhile, your paper reports top-1: 81.1 and mAP: 77.4. This is a huge gap. Can you release your checkpoints so we can try them?

PhilChina commented 3 years ago

Hi, when loading with a newer PyTorch version I skipped this parameter. On CUHK03 my training result matches the paper's mAP, but rank-1 is 1.4 percentage points lower. Could this be caused by skipping it?

Hello, could you share how you did this part (skipping this parameter when loading with a newer PyTorch version)? It would be best if you could paste the relevant code.

jpainam commented 3 years ago

@PhilChina I think you can skip the params using this code

    # Skip parameters that are missing from the old pre-trained checkpoint
    # (e.g. *.num_batches_tracked) by catching the KeyError; requires `import torch`.
    def load_partial_param(self, state_dict, model_index, model_path):
        param_dict = torch.load(model_path)
        for i in state_dict:
            try:
                # checkpoint keys are prefixed with the layer name, e.g. 'layer1.'
                key = 'layer{}.'.format(model_index) + i
                state_dict[i].copy_(param_dict[key])
            except KeyError:
                continue
        del param_dict

    def load_specific_param(self, state_dict, param_name, model_path):
        param_dict = torch.load(model_path)
        for i in state_dict:
            try:
                key = param_name + '.' + i
                state_dict[i].copy_(param_dict[key])
            except KeyError:
                continue
        del param_dict

This is what I did; it should work.
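An alternative, assuming the model's parameter names line up exactly with the checkpoint's keys (which is not the case for this repo's layer-by-layer loading, hence the helpers above), is to load non-strictly so that keys absent from the checkpoint, such as the num_batches_tracked buffers, simply keep their default values. A minimal sketch, where model and PRETRAINED_PATH are placeholders:

    import torch

    # strict=False tolerates keys missing from the checkpoint (e.g. *.num_batches_tracked),
    # leaving those buffers at their defaults, and also ignores unexpected checkpoint keys.
    state = torch.load(PRETRAINED_PATH, map_location='cpu')
    missing, unexpected = model.load_state_dict(state, strict=False)
    print('not found in checkpoint:', missing)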

PhilChina commented 3 years ago

OK, thank you very much @jpainam