First of all, thanks for making this awesome work public.
I don't understand the purpose of the accumulation steps. Why is the loss backpropagated only every n steps for the 1d, 2d, and bbox errors? Did this come from empirical testing?
I'm also not sure this is a correct approach on the software side: by the time the delayed backward runs, the parameters will most likely already have changed from the joint-loss backprop, which is evident from the runtime errors in torch>1.4.0.
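For comparison, here is a minimal sketch of the standard gradient-accumulation pattern (toy model, data, and `accum_steps` are my own assumptions, not from this repo): `backward()` is called every step so the graph is consumed while it is still valid, and only `optimizer.step()` is deferred. Deferring `backward()` itself until after the parameters have been updated is what triggers the in-place/version-counter runtime error in newer torch.

```python
import torch

# Hypothetical toy setup for illustration only.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4

for step in range(8):
    x = torch.randn(2, 4)
    y = torch.randn(2, 1)
    # Scale so the accumulated gradient matches the mean over accum_steps.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad every step
    if (step + 1) % accum_steps == 0:
        opt.step()       # parameters change only here
        opt.zero_grad()  # reset accumulated gradients
```

In this version the parameters are guaranteed not to change between building a graph and backpropagating through it, which is why it stays safe across torch versions.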