kkontras opened this issue 1 year ago
Gradient clipping is used to avoid exploding gradients. In our experiments, OGM-GE also uses the same gradient clipping. More information can be found in the paper.
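For readers unfamiliar with the technique, here is a minimal sketch of clipping by global L2 norm, the common form of gradient clipping against exploding gradients. It uses plain Python lists in place of tensors; in PyTorch the equivalent built-in is `torch.nn.utils.clip_grad_norm_`. The function name and the `max_norm` value are illustrative assumptions, not taken from the repository.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    # Global norm over every gradient entry of every parameter.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm or total_norm == 0.0:
        # Already within bounds: return an unmodified copy.
        return [list(vec) for vec in grads], total_norm
    # Shrink all gradients by the same factor so the direction is preserved.
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm

clipped, norm = clip_by_global_norm([[3.0, 0.0], [0.0, 4.0]], max_norm=1.0)
print(norm)  # 5.0
```

Scaling every gradient by one shared factor keeps the update direction unchanged and only limits its magnitude, which is why this form is usually preferred over clipping each entry independently.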
Hi,
Thank you for the fast response. Could you also explain the motivation for checking the gradients of the parameters of the last layer?
Hi, I didn't find any reference to gradient clipping in the paper. Could you indicate where it is?
Hi,
I noticed that you have a custom process that decides whether to enable gradient clipping, based on the gradients of the weights of the perpendicular layer. Could you explain the motivation for this decision?
Furthermore, did you also use it in the OGM experiments? According to my experiments, it seems to be a key factor in AGM, and I am curious what its contribution is. Let me know if you have a clear view on this.
Thanks!
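To make the question concrete, below is a hypothetical sketch of what "gated" clipping of this kind could look like: clipping is switched on only when the gradient norm of one monitored layer exceeds a threshold. This is not the AGM implementation; the layer names, the `threshold` rule, and the helper `maybe_clip` are illustrative assumptions.

```python
import math

def l2_norm(vec):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in vec))

def maybe_clip(grads_by_layer, monitor_layer, threshold, max_norm):
    """Hypothetical gating rule: clip all gradients by global norm,
    but only if the monitored layer's gradient norm exceeds `threshold`.

    Returns (gradients, whether clipping was applied).
    """
    monitored = l2_norm(grads_by_layer[monitor_layer])
    if monitored <= threshold:
        # Gate closed: leave every gradient untouched.
        return grads_by_layer, False
    total = math.sqrt(sum(g * g for vec in grads_by_layer.values() for g in vec))
    if total == 0.0:
        return grads_by_layer, False
    # Gate open: standard global-norm clipping over all layers.
    scale = min(1.0, max_norm / total)
    return {name: [g * scale for g in vec] for name, vec in grads_by_layer.items()}, True

# Hypothetical layer names, for illustration only.
grads = {"encoder": [0.5, 0.5], "fusion_head": [6.0, 8.0]}
out, was_clipped = maybe_clip(grads, "fusion_head", threshold=5.0, max_norm=1.0)
print(was_clipped)  # True
```

Under a rule like this, whether clipping fires at all depends on a single layer, which could plausibly make it matter much more in one method than another; that is the kind of contribution the question above asks about.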