lihong2303 / AGM

[ICCV2023] The repo for "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".
MIT License

Grad Clipping #6

Open kkontras opened 8 months ago

kkontras commented 8 months ago

Hi,

I noticed that you enable gradient clipping through a custom procedure based on the gradients of the weights of the perpendicular layer. Could you motivate this decision?

Furthermore, did you also use it in the OGM experiments? According to my experiments, it seems to be a key point in AGM, and I am curious what its contribution is. Let me know if you have a clear view.

Thanks!

lihong2303 commented 8 months ago

Gradient clipping is used to avoid gradient explosion. In our experiments, OGM-GE also uses the same gradient clipping. More information can be found in the paper.
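For readers unfamiliar with the technique, here is a minimal sketch of generic clipping by global L2 norm, the usual way to guard against exploding gradients. It is not the repository's code: the condition AGM uses to enable clipping is not shown in this thread, and the function name `clip_grad_norm` and the toy gradient values are assumptions made for illustration. In PyTorch the equivalent built-in is `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_grad_norm(grads, max_norm):
    """Rescale gradient vectors in place so their global L2 norm is at most max_norm.

    grads is a list of lists of floats (one inner list per parameter tensor).
    Returns the global norm measured before clipping.
    Hypothetical helper for illustration; not taken from the AGM repository.
    """
    # Global norm: L2 norm over all gradient entries of all parameters.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm > max_norm:
        # A single shared scale keeps the gradient direction unchanged.
        scale = max_norm / total_norm
        for vec in grads:
            for i in range(len(vec)):
                vec[i] *= scale
    return total_norm


# Toy gradients (assumed values): global norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in grads for g in vec))
```

Because every entry is multiplied by the same factor, only the step size shrinks; gradients whose norm is already below `max_norm` pass through untouched.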

kkontras commented 8 months ago

Hi,

Thank you for the fast response. Could you also motivate the decision to check the gradients of the parameters of the last layer?

kkontras commented 8 months ago

Hi, I didn't find any reference to gradient clipping in the paper. Could you indicate where it is?