First of all, thanks for making this awesome work public.
I don't understand the purpose of the accumulation steps. Why is the loss backpropagated only every n steps for the 1d, 2d, and bbox errors? Did this come from empirical testing?
I'm also not sure this is a correct approach on the software side: by the time the delayed backward runs, the parameters will most likely already have changed from the joint-loss backprop, which is evident from the runtime errors in torch>1.4.0.
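For comparison, here is a minimal sketch of the standard gradient-accumulation pattern (toy model, data, and `accum_steps` are my own assumptions, not from this repo): `backward()` is called every step so the graph is consumed while it is still valid, and only `optimizer.step()` is deferred. Deferring `backward()` itself until after the parameters have been updated is what triggers the in-place/version-counter runtime error in newer torch.

```python
import torch

# Hypothetical toy setup for illustration only.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4

for step in range(8):
    x = torch.randn(2, 4)
    y = torch.randn(2, 1)
    # Scale so the accumulated gradient matches the mean over accum_steps.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad every step
    if (step + 1) % accum_steps == 0:
        opt.step()       # parameters change only here
        opt.zero_grad()  # reset accumulated gradients
```

In this version the parameters are guaranteed not to change between building a graph and backpropagating through it, which is why it stays safe across torch versions.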