lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Apache License 2.0
1.64k stars 178 forks

Results differ between runs even with a fixed random seed #311

Open BigworldNebula opened 1 month ago

BigworldNebula commented 1 month ago

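Hello, after fixing the random seed I still get different results on every run, sometimes differing by about 2 points. The code is as follows: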

    seed = seed + get_rank()

    # seed init.
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

    # torch seed init.
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

Below are the results at epoch 37 from two separate training runs. The gap between them is fairly large:

    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.181
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.351
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.159
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.046
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.232
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.230
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.327
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.388
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.231
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.441
    best_stat: {'epoch': 33, 'coco_eval_bbox': 0.1819428970001259}

    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.169
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.328
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.146
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.067
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.206
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.292
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.374
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.177
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.423
    best_stat: {'epoch': 37, 'coco_eval_bbox': 0.16941847629586607}

Thanks in advance for your help!

lyuwenyu commented 1 month ago

Is the gap really that large? That doesn't seem reasonable. How much do two runs differ if you don't fix the seed?

BigworldNebula commented 1 month ago

> Is the gap really that large? That doesn't seem reasonable. How much do two runs differ if you don't fix the seed?

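Hello, without fixing the seed the results can also differ by about 2 points, so fixing the seed seems to have no effect. I later added two more lines to the seed-fixing code, but the results still sometimes differ by about 2 points, and training feels very unstable. The code is placed in main() between dist.init_distributed() and cfg = YAMLConfig(), and I am training on a single GPU.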

    seed = seed + get_rank()
    # seed init.
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    # torch seed init.
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
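
The two cuDNN flags above only affect cuDNN. PyTorch's reproducibility notes list further possible sources of run-to-run variation: CUDA operations without a deterministic implementation, the cuBLAS workspace configuration, and the random state of DataLoader workers. Also, assigning PYTHONHASHSEED from inside a running script does not change hashing in the current interpreter, because Python reads that variable at startup. The following is a minimal sketch of a stricter setup, not the RT-DETR implementation. The names seed_everything and seed_worker are hypothetical, and the warn_only argument assumes PyTorch 1.11 or newer.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int, strict: bool = False) -> torch.Generator:
    """Hypothetical helper (not part of RT-DETR): seed the common RNGs and
    request deterministic kernels. Returns a generator for DataLoader shuffling."""
    # cuBLAS needs this for deterministic matmuls on CUDA >= 10.2. It only
    # takes effect if it is set before the first cuBLAS call.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices

    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

    # strict=True raises an error when an op has no deterministic implementation.
    # strict=False only warns, which helps to find out which ops are affected.
    torch.use_deterministic_algorithms(True, warn_only=not strict)

    generator = torch.Generator()
    generator.manual_seed(seed)
    return generator


def seed_worker(worker_id: int) -> None:
    """worker_init_fn for DataLoader: derive NumPy/random seeds in each worker."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```

If this sketch is used, the returned generator and seed_worker would be passed to the loader as DataLoader(..., worker_init_fn=seed_worker, generator=generator). Running once with strict=False and reading the warnings is one way to see whether any operation in the model falls back to a nondeterministic kernel.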

BigworldNebula commented 1 month ago

> Is the gap really that large? That doesn't seem reasonable. How much do two runs differ if you don't fix the seed?

Hello, I ran some more tests and the problem is still not solved. I'm not sure whether it is a problem with the framework.

I traced the output layer by layer and found that self.short in presnet.py produces different outputs across two runs (both with the seed fixed). Looking further, self.short is instantiated as ConvNormLayer(ch_in, ch_out, 1, stride). When padding is not specified (i.e. the default, padding=0), the two runs give different outputs for self.short. With padding=1, i.e. ConvNormLayer(ch_in, ch_out, 1, stride, padding=1), the two outputs are identical. With kernel_size=3, i.e. ConvNormLayer(ch_in, ch_out, 3, stride), the two outputs are also identical.

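Strangely, though, when I copied the ConvNormLayer class (from common.py) into a separate .py file and tested it on its own, the results were identical whether padding was set to 1 or 0 (again with the seed fixed).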

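    import torch
    from icecream import ic  # ic() is the debug-print helper from the icecream package

    # ConvNormLayer is the class copied from common.py
    ch_in = ch_out = 64
    i = torch.randn((2, ch_in, 4, 5))
    short = ConvNormLayer(ch_in, ch_out, 1, 1, padding=0, bias=False)
    # short = ConvNormLayer(ch_in, ch_out, 1, 1, padding=1, bias=False)
    ic(short(i))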
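
The layer-by-layer comparison described above can be automated with forward hooks. The sketch below is one possible way to do it, not code from the RT-DETR repository. The helper names record_outputs and first_divergence are hypothetical. It records the output of every leaf module during one forward pass and reports the first layer whose output is not bit-identical between two recordings.

```python
import torch
import torch.nn as nn


def record_outputs(model: nn.Module, x: torch.Tensor) -> dict:
    """Run one forward pass and keep a CPU copy of every leaf module's output."""
    outputs, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only plain tensor outputs are recorded in this sketch.
            if isinstance(output, torch.Tensor):
                outputs[name] = output.detach().cpu().clone()
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(x)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return outputs


def first_divergence(run_a: dict, run_b: dict):
    """Return the name of the first recorded layer whose outputs differ, or None."""
    for name, tensor in run_a.items():
        if name not in run_b or not torch.equal(tensor, run_b[name]):
            return name
    return None
```

To compare two separate training processes, each process could save its dictionary with torch.save and a third script could load both and call first_divergence. Note also that the standalone snippet above, as written, runs on the CPU (torch.randn creates a CPU tensor unless the default device was changed, and the layer is never moved to a GPU), so it may not exercise the same kernels as the training run.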

lyuwenyu commented 1 month ago

I'm not sure what is going on in your case either. With this codebase I can reproduce results on COCO -.-

    import random
    import numpy as np
    import torch

    def setup_seed(seed: int, deterministic=False):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        # memory will be large when setting deterministic to True
        if torch.backends.cudnn.is_available() and deterministic:
            torch.backends.cudnn.benchmark = False
            torch.backends.cudnn.deterministic = True

ZZQ565697818 commented 1 month ago

> I'm not sure what is going on in your case either. With this codebase I can reproduce results on COCO -.-
>
>     def setup_seed(seed: int, deterministic=False):
>         random.seed(seed)
>         np.random.seed(seed)
>         torch.manual_seed(seed)
>         # memory will be large when setting deterministic to True
>         if torch.backends.cudnn.is_available() and deterministic:
>             torch.backends.cudnn.benchmark = False
>             torch.backends.cudnn.deterministic = True

Hello, is this code in src/misc/dist.py used to fix the random seed?

    def set_seed(seed):
        # fix the seed for reproducibility
        seed = seed + get_rank()
        torch.manual_seed(seed)
        np.random.seed(seed)
        random.seed(seed)
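
Judging only from the snippet quoted above, the function seeds torch, NumPy, and Python's random module with the base seed plus the process rank. Whether and where the training script calls it is not shown in this thread. The sketch below is a self-contained version for experimentation. The get_rank shown here is an assumed stand-in that returns 0 when torch.distributed is not initialized, and the return value of set_seed is an addition made for illustration.

```python
import random

import numpy as np
import torch
import torch.distributed as dist


def get_rank() -> int:
    """Assumed stand-in: rank 0 when torch.distributed is not initialized."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return 0


def set_seed(seed: int) -> int:
    """Seed torch, NumPy and random with a per-rank offset; return the seed used."""
    seed = seed + get_rank()
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    return seed
```

Under these assumptions, single-GPU training always has rank 0, so the offset changes nothing. In multi-GPU training each process gets a different but repeatable seed, so random augmentations differ across ranks while staying the same from one launch to the next.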