Faster training by accumulating module-wise losses and using single optimizer

loeweX / Greedy_InfoMax

Code for the paper: Putting An End to End-to-End: Gradient-Isolated Learning of Representations

https://arxiv.org/abs/1905.11786

MIT License


Faster training by accumulating module-wise losses and using single optimizer #18

Closed: loeweX closed this 3 years ago

loeweX commented 3 years ago

Improve training speed for both the audio and vision experiments by accumulating the module-wise losses and by using a single optimizer for all modules.

Calling .detach() on the features passed from one module to the next is enough to ensure that no gradients flow between modules, so the summed loss can be backpropagated in a single pass.
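A minimal sketch of the idea, not the code from this PR: the two `nn.Linear` modules, the `heads`, the squared-output placeholder loss, and the helper names `training_step` and `gradient_isolation_check` are all hypothetical stand-ins for the repository's encoder modules and their InfoNCE losses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the gradient-isolated modules.
modules = nn.ModuleList([nn.Linear(8, 8), nn.Linear(8, 8)])
# Hypothetical per-module loss heads (placeholder for the real module-wise losses).
heads = nn.ModuleList([nn.Linear(8, 1), nn.Linear(8, 1)])

# A single optimizer over the parameters of all modules.
optimizer = torch.optim.Adam(
    list(modules.parameters()) + list(heads.parameters()), lr=1e-3
)


def training_step(x):
    """Accumulate the module-wise losses, then do one backward pass and one step."""
    total_loss = 0.0
    h = x
    for module, head in zip(modules, heads):
        h = module(h)
        total_loss = total_loss + head(h).pow(2).mean()  # placeholder loss
        h = h.detach()  # cut the graph so the next module's loss stops here
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()


def gradient_isolation_check(x):
    """Return True if the second module's loss gives the first module no gradient."""
    for p in modules.parameters():
        p.grad = None
    h0 = modules[0](x)
    h1 = modules[1](h0.detach())
    heads[1](h1).pow(2).mean().backward()
    return all(p.grad is None for p in modules[0].parameters())
```

Because each module's input is detached, every term of the summed loss only reaches the parameters of its own module, so one `backward()` call on the sum should give the same per-module gradients as separate backward passes with separate optimizers.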