Hi @cjyiiiing
What I did was based on the following steps:
1) Hook the backpropagated gradients of every layer for all iterations of one epoch in the later training stage (e.g., epochs 60~80).
2) Average those gradients over the iterations.
3) Sort the layers by their averaged gradient results.
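A rough sketch of the three steps above, for readers asking about the code. This is not Yuyuan's implementation. `GradientMagnitudeTracker` is a hypothetical helper. The per-iteration "magnitude" of a layer is assumed to be the mean absolute value of its gradient; the thread doesn't say which magnitude was used.

```python
from collections import defaultdict


class GradientMagnitudeTracker:
    """Hypothetical helper: collects per-layer gradient magnitudes over the
    iterations of one epoch, then averages and ranks them."""

    def __init__(self):
        # layer name -> list of per-iteration magnitudes
        self._history = defaultdict(list)

    def record(self, layer_grads):
        """Step 1: store one iteration's gradients.

        layer_grads maps a layer name to a flat list of that layer's gradient
        values for this iteration. ASSUMPTION: magnitude = mean absolute value.
        """
        for name, grads in layer_grads.items():
            magnitude = sum(abs(g) for g in grads) / len(grads)
            self._history[name].append(magnitude)

    def averaged(self):
        # Step 2: average each layer's magnitude over the recorded iterations.
        return {name: sum(vals) / len(vals) for name, vals in self._history.items()}

    def ranked(self):
        # Step 3: sort layers from largest to smallest averaged magnitude.
        return sorted(self.averaged().items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tracker = GradientMagnitudeTracker()
    # Two toy "iterations" with made-up gradient values.
    tracker.record({"conv1": [0.2, -0.4], "fc": [0.01, -0.03]})
    tracker.record({"conv1": [0.1, -0.1], "fc": [0.05, 0.01]})
    for name, avg in tracker.ranked():
        print(f"{name}: {avg:.4f}")
```

In PyTorch, one way to feed `record` is to read each parameter's `.grad` after `loss.backward()`, e.g. `{n: p.grad.flatten().tolist() for n, p in model.named_parameters() if p.grad is not None}`. You could also register hooks with `Tensor.register_hook`. On real models it's cheaper to compute the magnitude on the tensor (`p.grad.abs().mean().item()`) than to convert the gradient to a list. The thread doesn't say which of these approaches the author used.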
I hope it helps.
Cheers, Yuyuan
Could you please provide this part of the code?
Hi. How did you draw Figure 5 in the paper? I'm not sure I clearly understand what the "average gradient magnitudes" are. Could you explain that figure in more detail? Thanks!