Open XA23i opened 1 year ago
I am wondering why, in your paper, you use the latent full-precision weights to calculate the information entropy rather than the binarized weights. Considering the latent weights seems to make no sense.

Hello XA23i,

I'm not the author of this research, but I believe the reasoning relates to the binarization of the weights with the sign function. That process means the latent weights maintain a specific statistical distribution throughout training; according to the authors, this final distribution follows a Laplacian pattern. Computing the information entropy of that latent distribution then lets us manipulate it toward higher or lower entropy.
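To make the two quantities in this discussion concrete, here is a minimal sketch (not from the paper; all values and names are illustrative assumptions): the discrete Shannon entropy of the sign-binarized weights versus the differential entropy of a Laplace distribution fitted to the latent full-precision weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical latent full-precision weights; a Laplacian distribution
# is assumed here, as the discussion above suggests.
w = rng.laplace(loc=0.0, scale=0.1, size=10_000)

# Binarize with the sign function.
w_bin = np.sign(w)
w_bin[w_bin == 0] = 1  # map exact zeros to +1 so weights are strictly {-1, +1}

# Discrete Shannon entropy of the binarized weights (Bernoulli over {-1, +1}).
p = np.mean(w_bin == 1)
h_bin = -(p * np.log2(p) + (1 - p) * np.log2(1 - p)) if 0 < p < 1 else 0.0

# Differential entropy of a Laplace(0, b) fit to the latent weights:
# 1 + ln(2b) nats, with b estimated by maximum likelihood as mean(|w|).
b = np.mean(np.abs(w))
h_latent = 1.0 + np.log(2.0 * b)

print(f"binary entropy of sign(w): {h_bin:.4f} bits (max 1 bit at p = 0.5)")
print(f"Laplace differential entropy of latent w: {h_latent:.4f} nats")
```

Note that the binarized weights can carry at most 1 bit of entropy per weight (maximized when the sign split is balanced), whereas the latent distribution's entropy depends continuously on the Laplace scale, which may be why the authors work with the latent weights.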