Improve training speed for both the audio and vision experiments by accumulating the module-wise losses and by using a single optimizer for all modules.
Calling .detach() on the features passed between modules is enough to ensure that no gradients leak from one module into another.
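
A minimal sketch of the idea, assuming PyTorch. The module stack, the `local_loss` objective, and the `train_step` helper are hypothetical placeholders for illustration, not the code used in the audio or vision experiments:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stack of locally trained modules; layer sizes are arbitrary.
modules = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])

# One optimizer over the parameters of all modules.
optimizer = torch.optim.SGD(modules.parameters(), lr=0.1)


def local_loss(features: torch.Tensor) -> torch.Tensor:
    # Placeholder for the real module-wise objective.
    return features.pow(2).mean()


def train_step(x: torch.Tensor) -> torch.Tensor:
    optimizer.zero_grad()
    total_loss = x.new_zeros(())
    features = x
    for module in modules:
        # Detach the input so this module's loss cannot reach earlier modules.
        features = module(features.detach())
        total_loss = total_loss + local_loss(features)
    # One backward pass and one optimizer step cover all modules.
    total_loss.backward()
    optimizer.step()
    return total_loss.detach()


loss = train_step(torch.randn(4, 8))
```

Because each module only ever sees a detached input, the summed loss gives every module the same gradient it would get from its own loss alone, so one backward pass and one optimizer step can stand in for per-module ones.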