BigworldNebula opened this issue 1 month ago
Is the gap really that large? That doesn't seem right. How much do two runs differ if you don't fix the seed?
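Hi, author. Even without a fixed seed, two runs sometimes differ by about 2 AP points, so fixing the seed seems to have no effect at all. I then added two more lines to the seed-fixing function, and the results can still differ by about 2 points; training seems quite unstable. The call is placed in `main()` between `dist.init_distributed()` and `cfg = YAMLConfig()`, and I am training on a single GPU.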
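```python
import os
import random

import numpy as np
import torch

# get_rank() is the rank helper from the project's distributed utilities
seed = seed + get_rank()
# seed init.
random.seed(seed)
np.random.seed(seed)
# Only affects subprocesses started afterwards; the running interpreter's hash seed is fixed at startup
os.environ['PYTHONHASHSEED'] = str(seed)
# torch seed init.
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```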
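`cudnn.deterministic` only constrains which cuDNN algorithms are picked. PyTorch also offers `torch.use_deterministic_algorithms(True)`, which raises an error (or only warns, with `warn_only=True`, PyTorch ≥ 1.11) whenever an op without a deterministic implementation runs; that can help locate the layer that diverges between runs. A minimal sketch, assuming the helper name `enable_full_determinism` (hypothetical) and that it is called at the very start of `main()`: per the PyTorch reproducibility notes, CUDA ≥ 10.2 also needs `CUBLAS_WORKSPACE_CONFIG` set before any cuBLAS work starts.

```python
import os
import random

import numpy as np
import torch


def enable_full_determinism(seed: int, warn_only: bool = False) -> None:
    """Hypothetical helper: seed every RNG and request deterministic kernels."""
    # Needed for deterministic cuBLAS matmuls on CUDA >= 10.2;
    # set it before any CUDA/cuBLAS work happens in this process.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Raise (or warn, if warn_only=True) when a nondeterministic op is used
    torch.use_deterministic_algorithms(True, warn_only=warn_only)
```

Running a single training iteration with `warn_only=False` would then stop at the first op that has no deterministic implementation, if there is one.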
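Hi, author. I ran more tests and the problem is still not solved; I'm not sure whether it comes from the framework.
I traced the output of each layer step by step and found that `self.short` in presnet.py produces different outputs in two runs (with the seed fixed in both). Digging further, it depends on how the `ConvNormLayer` it instantiates is configured:
- `ConvNormLayer(ch_in, ch_out, 1, stride)` with no padding specified (i.e. the default padding=0): the two outputs of `self.short` differ;
- `ConvNormLayer(ch_in, ch_out, 1, stride, padding=1)`: the two outputs are identical;
- `ConvNormLayer(ch_in, ch_out, 3, stride)`, i.e. kernel_size=3: the two outputs are also identical.

Strangely, though, when I copied the `ConvNormLayer` class (from common.py) into a standalone .py file and tested it there, the outputs were identical whether padding was 1 or 0 (with the seed fixed in both cases):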
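```python
import torch
from icecream import ic

# ConvNormLayer is copied from common.py into this standalone file (definition omitted)
ch_in = ch_out = 64
i = torch.randn((2, ch_in, 4, 5))
short = ConvNormLayer(ch_in, ch_out, 1, 1, padding=0, bias=False)
# short = ConvNormLayer(ch_in, ch_out, 1, 1, padding=1, bias=False)
ic(short(i))
```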
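The two-run comparison described above can be automated in a single process: reseed, rebuild the layer, feed the same input, and compare the outputs bit for bit with `torch.equal`. A minimal sketch, assuming a `Conv2d` + `BatchNorm2d` pair as a hypothetical stand-in for `ConvNormLayer` (its exact definition in common.py is not reproduced here). Note that the standalone test above uses stride 1, while `self.short` in the backbone is built with `stride`; passing `device="cuda"` exercises the GPU path, which is the only place cuDNN algorithm selection applies.

```python
import torch
import torch.nn as nn


def make_short(ch_in: int, ch_out: int, kernel_size: int, stride: int, padding: int) -> nn.Module:
    # Hypothetical stand-in for ConvNormLayer: bias-free conv followed by BatchNorm
    return nn.Sequential(
        nn.Conv2d(ch_in, ch_out, kernel_size, stride, padding=padding, bias=False),
        nn.BatchNorm2d(ch_out),
    )


def run_once(seed: int, device: str = "cpu", **layer_kwargs) -> torch.Tensor:
    torch.manual_seed(seed)  # same weights and same input on every call
    layer = make_short(**layer_kwargs).to(device)
    x = torch.randn(2, layer_kwargs["ch_in"], 4, 5, device=device)
    return layer(x).detach().cpu()


def is_reproducible(seed: int = 0, device: str = "cpu", **layer_kwargs) -> bool:
    # Bit-for-bit comparison of two independent runs
    first = run_once(seed, device, **layer_kwargs)
    second = run_once(seed, device, **layer_kwargs)
    return torch.equal(first, second)


if __name__ == "__main__":
    for padding in (0, 1):
        ok = is_reproducible(ch_in=64, ch_out=64, kernel_size=1, stride=2, padding=padding)
        print(f"kernel=1, stride=2, padding={padding}: reproducible={ok}")
```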
I'm not sure what is going on in your case either; running COCO with this codebase, I can reproduce the results. -.-
```python
import random

import numpy as np
import torch


def setup_seed(seed: int, deterministic=False):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # memory will be large when setting deterministic to True
    if torch.backends.cudnn.is_available() and deterministic:
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
```
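`setup_seed` seeds the main process only. If the training DataLoader uses `num_workers > 0`, the PyTorch reproducibility notes additionally recommend a `worker_init_fn` that reseeds NumPy and `random` inside each worker, plus an explicit `torch.Generator` for the shuffling order. A minimal sketch, assuming a toy `TensorDataset` in place of the real COCO dataset; `seed_worker` and `make_loader` are illustrative names, not functions from this codebase.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Inside a worker, torch.initial_seed() is base_seed + worker_id;
    # reuse it for NumPy and Python's random module
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed: int, batch_size: int = 4, num_workers: int = 0) -> DataLoader:
    g = torch.Generator()
    g.manual_seed(seed)  # controls the shuffling order and the workers' base seed
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,
        generator=g,
    )


if __name__ == "__main__":
    data = TensorDataset(torch.arange(16))
    order_a = [batch[0].tolist() for batch in make_loader(data, seed=0)]
    order_b = [batch[0].tolist() for batch in make_loader(data, seed=0)]
    print(order_a == order_b)  # same seed, same shuffling order
```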
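Hi, is this code in src/misc/dist.py meant to fix the random seed?

```python
import random
import numpy as np
import torch

def set_seed(seed):
    # get_rank() comes from the distributed helpers in src/misc/dist.py
    seed = seed + get_rank()
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
```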
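Adding `get_rank()` gives each distributed process a different but still fixed seed (rank 0 keeps the base seed), presumably so that processes do not draw identical random augmentations; on the single-GPU setup described earlier the rank is 0 and the offset changes nothing. A self-contained sketch, assuming a hypothetical `_rank()` fallback that returns 0 when `torch.distributed` is not initialized (the project's own `get_rank()` may be implemented differently):

```python
import random

import numpy as np
import torch
import torch.distributed as dist


def _rank() -> int:
    # Hypothetical stand-in for get_rank(): 0 unless a process group is running
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return 0


def set_seed(seed: int) -> int:
    # Offset by rank so each process gets its own, still reproducible, seed
    seed = seed + _rank()
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    return seed  # returned only to make the effective seed visible
```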
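Hi, author. After fixing the random seed I still get different results on every run, sometimes about 2 AP points apart. The code is as follows:
Below are the results at epoch 37 from two separate training runs; the gap is fairly large: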
IoU metric: bbox

| Metric | IoU | Area | maxDets | Run 1 | Run 2 |
|---|---|---|---|---|---|
| AP | 0.50:0.95 | all | 100 | 0.181 | 0.169 |
| AP | 0.50 | all | 100 | 0.351 | 0.328 |
| AP | 0.75 | all | 100 | 0.159 | 0.146 |
| AP | 0.50:0.95 | small | 100 | 0.000 | 0.000 |
| AP | 0.50:0.95 | medium | 100 | 0.046 | 0.067 |
| AP | 0.50:0.95 | large | 100 | 0.232 | 0.206 |
| AR | 0.50:0.95 | all | 1 | 0.230 | 0.199 |
| AR | 0.50:0.95 | all | 10 | 0.327 | 0.292 |
| AR | 0.50:0.95 | all | 100 | 0.388 | 0.374 |
| AR | 0.50:0.95 | small | 100 | 0.000 | 0.000 |
| AR | 0.50:0.95 | medium | 100 | 0.231 | 0.177 |
| AR | 0.50:0.95 | large | 100 | 0.441 | 0.423 |

best_stat (Run 1): `{'epoch': 33, 'coco_eval_bbox': 0.1819428970001259}`
best_stat (Run 2): `{'epoch': 37, 'coco_eval_bbox': 0.16941847629586607}`
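To size the gap, the per-metric differences can be computed directly from the two epoch-37 logs above (values copied from them; the short metric keys such as `AP50` are labels chosen here). The headline AP@[0.50:0.95] gap is 0.012, i.e. about 1.2 AP points, and the best-epoch values differ by 0.1819 − 0.1694 ≈ 0.0125, about 1.25 points. The medium-area metrics move in opposite directions (Run 2 is better on AP_medium), which is consistent with ordinary run-to-run noise rather than one run being uniformly worse.

```python
# Epoch-37 values copied from the two COCO evaluation logs
run1 = {"AP": 0.181, "AP50": 0.351, "AP75": 0.159, "AP_medium": 0.046, "AP_large": 0.232,
        "AR100": 0.388, "AR_medium": 0.231, "AR_large": 0.441}
run2 = {"AP": 0.169, "AP50": 0.328, "AP75": 0.146, "AP_medium": 0.067, "AP_large": 0.206,
        "AR100": 0.374, "AR_medium": 0.177, "AR_large": 0.423}


def deltas_in_points(a: dict, b: dict) -> dict:
    # Difference a - b per metric, scaled to "points" (x100) and rounded to 0.1
    return {k: round((a[k] - b[k]) * 100, 1) for k in a}


if __name__ == "__main__":
    for name, d in deltas_in_points(run1, run2).items():
        print(f"{name:>10}: {d:+.1f}")
```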
Thanks for the answer, author!