Hi! Thanks for providing this resource! I think I found a slight error in one of your "goodies."
In your implementation of the max norm constraint, you take the norm across dimension 0 of your tensor, i.e. taking the norm of each column.
In the original paper that introduces the max norm constraint, the authors describe max norm as "constraining the norm of the incoming weight vector at each hidden unit to be upper bounded by a fixed constant c" (Srivastava et al., 2014).
Therefore, if layer $L$ has $n$ hidden units, each with $k$ inputs, we want to take $n$ norms of $k$-dimensional weight vectors. The weight parameter of a linear layer is stored as a two-dimensional tensor of shape (out_features x in_features). In terms of the above variables, this is an $n \times k$ tensor; therefore, we want to take the norm of each row. To do this in PyTorch, we need to take the norm across dimension 1.
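To make the fix concrete, here is a minimal sketch of a row-wise max norm constraint (the function name `max_norm_` and the constant `c = 3.0` are my own choices for illustration, not names from your code):

```python
import torch

def max_norm_(weight: torch.Tensor, c: float = 3.0) -> None:
    """Rescale rows of `weight` in place so each row's L2 norm is at most c.

    Each row of a Linear layer's weight (shape: out_features x in_features)
    is the incoming weight vector of one hidden unit, so dim=1 gives one
    norm per hidden unit -- the quantity the paper constrains.
    """
    with torch.no_grad():
        # one norm per row, i.e. per hidden unit
        norms = weight.norm(p=2, dim=1, keepdim=True)
        # shrink only the rows whose norm exceeds c; leave the rest unchanged
        desired = norms.clamp(max=c)
        weight.mul_(desired / (norms + 1e-12))

# example: 4 hidden units, each with 10 inputs
layer = torch.nn.Linear(10, 4)
with torch.no_grad():
    layer.weight.mul_(100.0)  # inflate the weights so the constraint binds
max_norm_(layer.weight, c=3.0)
# every row norm is now at most 3.0
```

Note that `keepdim=True` keeps the norms as a column vector so the division broadcasts correctly across each row.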