Open lorinczszabolcs opened 2 years ago
Yes, I think so. Just like you said, it's about the number of optimizer steps (optimize_iter_steps). But for an epoch-based method like EpochBasedRunner in mmcv, I don't think more epochs need to be added, since within one epoch a samples_per_gpu=4 run goes through 4x more iterations than a samples_per_gpu=16 run, which compensates for the accumulation factor.
Hope it helps.
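As a rough sanity check of that counting argument, here is a minimal sketch with made-up numbers (dataset size and GPU count are hypothetical, not from any config in this thread):

```python
# Hypothetical numbers, only to illustrate the per-epoch counting argument above.
dataset_size = 64_000  # samples seen per epoch

def optimizer_steps_per_epoch(samples_per_gpu, cumulative_iters=1, num_gpus=1):
    """Iterations per epoch divided by the gradient-accumulation factor."""
    iters_per_epoch = dataset_size // (samples_per_gpu * num_gpus)
    return iters_per_epoch // cumulative_iters

# Plain large-batch run: 64000 / 16 = 4000 optimizer steps per epoch.
print(optimizer_steps_per_epoch(samples_per_gpu=16))                     # 4000
# Accumulated run: 64000 / 4 = 16000 iterations, but an optimizer step
# happens only every 4 iterations -> still 4000 steps per epoch.
print(optimizer_steps_per_epoch(samples_per_gpu=4, cumulative_iters=4))  # 4000
```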
I see, thank you for the quick help.
Maybe a warning / note could be added about this property, since I first assumed that setting samples_per_gpu=4 and cumulative_iters=4 would essentially be equal to just having samples_per_gpu=16. Even though the docstring says "almost equals", it gave the impression that, apart from issues caused by using gradient accumulation together with BN, it would be the same. Alternatively, the implementation could be changed to actually result in equivalent trainings.
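For reference, the setup being discussed is configured through mmcv's GradientCumulativeOptimizerHook. A minimal config sketch, using the values from this thread and assuming a recent mmcv 1.x (not a complete training config):

```python
# Config fragment (sketch): replace the default OptimizerHook with the
# gradient-cumulative variant, so gradients from 4 forward/backward passes
# of 4 samples each are accumulated before a single optimizer step.
data = dict(samples_per_gpu=4)  # per-GPU batch size
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=4,         # effective batch size ~ 4 * 4 = 16
)
```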
Hi!
Let's say there is a model that runs with samples_per_gpu=16 for 40k iterations. In case the model does not fit in memory, one would use gradient accumulation: samples_per_gpu=4 and cumulative_iters=4. Is it necessary to run this version for 160k iterations (4x the original) to get the same number of optimization steps as in the original case? My current understanding is that if we run only for 40k iterations with cumulative_iters=4, we end up with only 40k/4 = 10k optimization steps at an effective batch size of 16, which is not the same as having 40k optimization steps with the original batch size of 16. Thanks for your help in advance!
All the best, Szabi
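The step counting in the question can be reproduced with a minimal, plain PyTorch-style accumulation loop (hypothetical tiny model and random data, only to count optimizer steps):

```python
import torch

# Hypothetical model/optimizer, only used to count optimizer steps.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

total_iters = 40_000
cumulative_iters = 4
optimizer_steps = 0

optimizer.zero_grad()
for it in range(total_iters):
    x = torch.randn(4, 8)                        # samples_per_gpu=4
    loss = model(x).mean() / cumulative_iters    # scale so summed grads average out
    loss.backward()                              # gradients accumulate across iterations
    if (it + 1) % cumulative_iters == 0:
        optimizer.step()                         # one parameter update per 4 iterations
        optimizer.zero_grad()
        optimizer_steps += 1

print(optimizer_steps)  # 10000, not 40000 -> 160k iterations needed to match 40k steps
```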