Closed RyanTang1 closed 6 years ago
Hello, I'm trying to reproduce L1 regularization based on your implementation, but I noticed that you still apply a threshold even with L1 regularization. Doesn't L1 produce a sparse matrix even without a threshold? When I omitted the threshold from my L1 regularization, the weight matrix did not become sparse. So I'm wondering whether L1 regularization alone actually cannot produce a sparse matrix without thresholding.

It can push the weights very close to zero, but it cannot fix them at exactly zero, because of the ever-present fluctuating weight updates (the L1 gradient is always -1 or +1). I suspect that L1 regularization without thresholding pushes many weights to near-zero values; you can then remove the small weights by thresholding after training, and it should not affect accuracy.
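To illustrate the point above, here is a minimal NumPy sketch (hypothetical, not the code from this repository): plain gradient descent with an L1 subgradient drives irrelevant weights close to zero, but the constant ±lambda nudge from the sign term keeps them oscillating around zero rather than landing exactly on it, so a small post-training threshold is what actually zeroes them out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data: only the first 2 of 10 features matter.
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + 0.01 * rng.normal(size=200)

w = rng.normal(size=10)
lr, lam = 0.01, 0.1  # learning rate and L1 strength (illustrative values)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(y)    # data-loss gradient
    w -= lr * (grad + lam * np.sign(w))  # L1 subgradient: lam * sign(w), i.e. +/- lam

# The irrelevant weights are tiny but essentially never *exactly* zero,
# because every step adds a +/- lr*lam nudge from the sign term.
exact_zeros = int(np.sum(w == 0.0))

# Thresholding after training recovers the true sparsity pattern.
w_sparse = np.where(np.abs(w) < 1e-2, 0.0, w)
sparse_zeros = int(np.sum(w_sparse == 0.0))
```

In this run `exact_zeros` is 0 while `sparse_zeros` is 8, matching the suggestion above: L1 concentrates weights near zero, and the threshold converts "near zero" into "exactly zero" without touching the large, accuracy-relevant weights. (Proximal methods such as ISTA would produce exact zeros directly, but plain SGD with the raw L1 gradient does not.)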