Closed gunshi closed 4 years ago
The code computes the gradients and their dot products, which should help as a starting point, but it does not include anything like the method in the paper you mentioned. Projected gradient descent here solves the optimization problem we define in the paper, nothing more than that. The paper you cited is not doing projected gradient descent; they apply some gradient projections, but as part of a different heuristic. I would recommend reaching out to the authors of the paper you cited.
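To make the distinction concrete: projected gradient descent in the usual sense alternates a gradient step with a projection back onto the feasible set of a constrained optimization problem. Here is a minimal sketch on a toy problem (the objective and constraint set are illustrative assumptions, not this repo's actual formulation):

```python
import numpy as np

def project_unit_ball(x):
    """Euclidean projection onto the feasible set {x : ||x|| <= 1}."""
    n = np.linalg.norm(x)
    return x / n if n > 1.0 else x

def projected_gradient_descent(grad_f, project, x0, lr=0.1, steps=200):
    """Take a gradient step, then project the iterate back onto the set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = project(x - lr * grad_f(x))
    return x

# Toy problem: minimize ||x - c||^2 subject to ||x|| <= 1.
c = np.array([2.0, 0.0])
x_star = projected_gradient_descent(lambda x: 2.0 * (x - c),
                                    project_unit_ball,
                                    x0=np.zeros(2))
# Converges to the boundary point of the ball closest to c, i.e. [1, 0].
```

This is different in kind from gradient surgery, which projects one task's gradient against another task's gradient rather than projecting iterates onto a constraint set.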
Hi, thanks for the open-source code! I'm looking to implement something like https://openreview.net/forum?id=HJewiCVFPB (Gradient Surgery for Multi-Task Learning) using/on top of this repo, but I'm a little confused about whether the projected gradient descent method mentioned in the README is already something similar.
Could you add citations and sources for any additional algorithms and tricks this repo implements beyond the main paper? That would be very helpful! (Alternatively, some pointers on what/where to modify to get this working would also help.)
PS: For some quick context, the paper I mentioned just involves projecting each task's gradient onto the normal plane of any other task's gradient it conflicts with, before applying updates, to avoid gradient conflicts.
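The projection I mean can be sketched in a few lines of NumPy. This is my own minimal sketch of the idea (the function name `pcgrad` and the flat-vector gradients are assumptions, not code from this repo or the paper's release):

```python
import numpy as np

def pcgrad(grads, rng=None):
    """Sketch of gradient surgery: for each task gradient, remove the
    component that conflicts with (points against) another task's gradient
    by projecting onto that gradient's normal plane."""
    rng = rng or np.random.default_rng(0)
    grads = [np.asarray(g, dtype=float) for g in grads]
    out = []
    for i in range(len(grads)):
        g = grads[i].copy()
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # the paper visits the other tasks in random order
        for j in others:
            dot = g @ grads[j]
            if dot < 0.0:  # negative dot product = conflicting gradients
                g -= dot / (grads[j] @ grads[j]) * grads[j]
        out.append(g)
    return out

# Two conflicting task gradients:
g1, g2 = pcgrad([[1.0, 0.0], [-1.0, 1.0]])
# After surgery, each result is orthogonal to the gradient it conflicted with.
```

The surgery leaves non-conflicting gradients untouched; only the component pointing against another task is dropped before the per-task gradients are summed into the update.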
Thanks