lihong2303 / AGM

[ICCV2023] The repo for "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".
MIT License

Grad Clipping #6

Open kkontras opened 1 year ago

kkontras commented 1 year ago

Hi,

I noticed that you have a custom process that enables gradient clipping based on the gradients of the weights of a particular layer. Could you explain the motivation for this decision?

Furthermore, did you also use it in the OGM experiments? According to my experiments, it seems to be a key component of AGM, and I am curious what its contribution is. Let me know if you have a clear view on this.

Thanks!

lihong2303 commented 1 year ago

Gradient clipping is used to avoid exploding gradients. In our experiments, OGM-GE also uses the same gradient clipping. More information can be found in the paper.
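A common way to guard against exploding gradients is to rescale all gradients whenever their combined (global) L2 norm exceeds a threshold. The sketch below illustrates that general technique in plain Python, with lists of floats standing in for tensors. It is not taken from the AGM repository; the function name `clip_by_global_norm` and the `max_norm` threshold are illustrative assumptions. In PyTorch the same idea is provided by `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(
    grads: List[List[float]], max_norm: float
) -> Tuple[List[List[float]], float]:
    """Illustrative sketch (not the AGM code): global-norm gradient clipping.

    grads is a list of per-parameter gradient vectors. Returns the clipped
    gradients and the global norm measured before clipping.
    """
    # Global L2 norm over every entry of every parameter's gradient.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm == 0.0 or total_norm <= max_norm:
        # Already within bounds: return an unchanged copy.
        return [list(grad) for grad in grads], total_norm
    # Scale every gradient by the same factor, which preserves the update
    # direction and brings the global norm down to max_norm.
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads], total_norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0, 0.0], [4.0]], max_norm=1.0)
    print(norm, clipped)
```

Because every gradient is scaled by one shared factor, the step direction is unchanged and only its length is capped; clipping each parameter separately would instead alter the direction.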

kkontras commented 1 year ago

Hi,

Thank you for the fast response. Could you also explain the motivation for checking the gradients of the parameters of the last layer?
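To make the question concrete, here is a hypothetical sketch of clipping that is *enabled* by one layer's gradients. This is an assumption about what such a process might look like, not the code in the AGM repository: the L2 norm of a monitored layer's gradient (here a hypothetical last layer named `fc.weight`) decides whether clipping is switched on for the step, and if it is, all gradients are clipped by their global norm. The names `gated_clip`, `trigger_norm`, and `max_norm` are illustrative.

```python
import math
from typing import Dict, List, Tuple


def l2_norm(vec: List[float]) -> float:
    """L2 norm of a single gradient vector."""
    return math.sqrt(sum(v * v for v in vec))


def gated_clip(
    grads: Dict[str, List[float]],
    monitored: str,
    trigger_norm: float,
    max_norm: float,
) -> Tuple[Dict[str, List[float]], bool]:
    """Hypothetical sketch (not the AGM code): clip only if one layer's gradient is large.

    grads maps parameter names to gradient vectors; `monitored` names the layer
    whose gradient norm gates clipping. Assumes trigger_norm >= 0.
    Returns the (possibly clipped) gradients and whether clipping was applied.
    """
    if l2_norm(grads[monitored]) <= trigger_norm:
        # Monitored layer looks healthy: skip clipping for this step.
        return {name: list(g) for name, g in grads.items()}, False
    # Clipping enabled: rescale all gradients by their global L2 norm.
    # total > 0 here because the monitored norm exceeded trigger_norm >= 0.
    total = math.sqrt(sum(l2_norm(g) ** 2 for g in grads.values()))
    scale = min(1.0, max_norm / total)
    return {name: [v * scale for v in g] for name, g in grads.items()}, True


if __name__ == "__main__":
    grads = {"encoder.weight": [3.0], "fc.weight": [0.0]}
    print(gated_clip(grads, "fc.weight", trigger_norm=1.0, max_norm=1.0))
```

With `{"encoder.weight": [3.0], "fc.weight": [0.0]}` and a trigger of 1.0, nothing is clipped: the monitored layer's gradient is zero even though the encoder's gradient is large. That is the main behavioral difference from unconditional global-norm clipping, and it is why the choice of monitored layer matters.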

kkontras commented 1 year ago

Hi, I didn't find any reference to gradient clipping in the paper. Could you indicate where it is?