The generator losses can be accumulated into l_g_total and a single scaler.scale(l_g_total).backward() run on the sum, as was done until recently. Gradients from the individual loss terms accumulate rather than overwrite one another, so scaler.scale(...).backward() only needs to be called on each loss individually when the losses are not summed.
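A minimal sketch of the two strategies, assuming a PyTorch AMP setup; the model and loss names (l_g_pix, l_g_gan, l_g_total) are illustrative, not taken from the actual training code. The scaler is disabled so the sketch runs on CPU; in real mixed-precision training it would be enabled and the tensors moved to CUDA.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Disabled here so the sketch runs on CPU; enable for real AMP training.
scaler = torch.cuda.amp.GradScaler(enabled=False)

def generator_grads(summed):
    """Return the weight gradient after backprop, either from a single
    backward pass on the summed loss or from one pass per loss."""
    model.zero_grad()
    out = model(x)
    # Two hypothetical generator loss terms.
    l_g_pix = nn.functional.l1_loss(out, target)
    l_g_gan = nn.functional.mse_loss(out, target)
    if summed:
        # Accumulate the losses and run a single scaled backward pass.
        l_g_total = l_g_pix + l_g_gan
        scaler.scale(l_g_total).backward()
    else:
        # One scaled backward per loss; gradients accumulate in .grad
        # rather than overwriting one another.
        scaler.scale(l_g_pix).backward(retain_graph=True)
        scaler.scale(l_g_gan).backward()
    return model.weight.grad.clone()

g_summed = generator_grads(True)
g_separate = generator_grads(False)
```

Comparing g_summed and g_separate shows the two approaches produce the same gradients, which is why the single summed backward pass is safe.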