Hi, I find that when sparsifying GRU_A during training, the sparsified weights (shape (384, 1152)) form vertical strips of 1s. However, when I dump the model, the weight matrix (also shape (384, 1152)) shows horizontal strips of non-zero values.
Why is this happening? This might be a silly question, but it troubled me the entire day.
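One hypothesis worth checking is that the two views hold the same pattern in transposed layouts. Keras stores a GRU recurrent kernel as (input, 3*units) and computes `h @ W`, while C-style matrix-vector code often stores the matrix output-major. The NumPy sketch below classifies a 0/1 pattern as vertical or horizontal strips and shows that transposing flips the result. The mask here is hypothetical, built from 8-column blocks, and is not the LPCNet sparsification code.

```python
import numpy as np

def strip_orientation(m: np.ndarray) -> str:
    """Return 'vertical' if every column is constant (all 0 or all 1),
    'horizontal' if every row is constant, and 'neither' otherwise."""
    m = m.astype(bool)
    cols_constant = np.all(m == m[0:1, :])  # compare every row to the first row
    rows_constant = np.all(m == m[:, 0:1])  # compare every column to the first column
    if cols_constant and not rows_constant:
        return "vertical"
    if rows_constant and not cols_constant:
        return "horizontal"
    return "neither"

# Hypothetical mask in Keras layout (input, 3*units) = (384, 1152):
# keep every 10th block of 8 columns, over the full height.
n_in, n_out = 384, 1152
keep = np.arange(n_out // 8) % 10 == 0
mask = np.repeat(keep, 8)[np.newaxis, :].repeat(n_in, axis=0).astype(np.float32)

print(strip_orientation(mask))    # vertical
print(strip_orientation(mask.T))  # horizontal
```

If the dumped matrix, viewed in the training layout, matches the mask's non-zero pattern, the difference is probably only a transpose or a plotting-axis convention, not a change in which weights were pruned.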
Thanks!