paulbrodersen / entropy_estimators

Estimators for the entropy and other information theoretic quantities of continuous distributions
GNU General Public License v3.0

Support for computing entropy of a Tensor #10

Closed: willleeney closed this issue 3 years ago

willleeney commented 3 years ago

I would like to be able to use this to compute the continuous entropy of a tensor so that I do not have trouble with backpropagation.

I am trying to use continuous.get_h() for this, but the kdtree = cKDTree(x) call throws an error because the tensor needs to be converted to numpy first.

Any advice or help would be much appreciated.

paulbrodersen commented 3 years ago

I am not sure what you are trying to accomplish; I am assuming some sort of regression optimised with backprop, using the continuous entropy estimate as a cost function. If that is the case, I am not sure it is a theoretically sound approach. The Kozachenko-Leonenko estimator implemented by get_h is non-differentiable, since it uses k-nearest-neighbour distances between data points to estimate the entropy. A small change in the data values can turn the k-th nearest neighbour of a data point into its (k+1)-th or (k-1)-th nearest neighbour, so the entropy estimate does not vary smoothly, let alone differentiably, with the data.
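
If you only need the entropy value (for monitoring, not for backprop), the original error can be avoided by converting the tensor to a numpy array first. A minimal sketch, assuming the package is importable as `entropy_estimators.continuous` as in your snippet and that `x` is a PyTorch tensor:

```python
import torch
from entropy_estimators import continuous

x = torch.randn(1000, 2)                        # example data, shape (n_samples, n_dims)
h = continuous.get_h(x.detach().cpu().numpy())  # detaching breaks the graph, so no gradients flow
```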

willleeney commented 3 years ago

Yes, what I am trying to achieve is similar to what you have described. I had realised that the get_h function was non-differentiable because of the k-nearest-neighbour distances, so I was wondering if you knew of a way to estimate those distances that would be differentiable. Are you saying that this is theoretically impossible? Unfortunately, this was the only accurate way of estimating the entropy that I could find. It is very annoying, because I could convert the rest of the function to torch to make it differentiable.

paulbrodersen commented 3 years ago

Yeah, I don't think it's a valid approach, as I can't see a way to approximate the estimator with a differentiable function. There are, however, analytic solutions for the entropy of probability distributions from the exponential family. If your data points can be reasonably approximated by one of those distributions, then you have yourself a differentiable function. I have implemented the multivariate normal case (get_h_mvn, IIRC), but I am reasonably sure that there are closed-form solutions for all distributions in that family. If you can't find the formulae online, you will probably find them in Cover & Thomas, Elements of Information Theory.
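
For reference, the multivariate normal case has the closed form H = ½ log((2πe)^d det(Σ)), which is straightforward to write differentiably in PyTorch. A minimal sketch (not part of this repository; `mvn_entropy` and the ridge term are just my own illustrative choices):

```python
import math
import torch

def mvn_entropy(x, eps=1e-6):
    """Differentiable entropy (in nats) of a multivariate normal fitted to x,
    where x is a tensor of shape (n_samples, n_dims)."""
    d = x.shape[1]
    cov = torch.cov(x.T) + eps * torch.eye(d)   # small ridge keeps the covariance non-singular
    _, logdet = torch.linalg.slogdet(cov)       # slogdet is more stable than log(det(...))
    return 0.5 * (d * math.log(2 * math.pi * math.e) + logdet)
```

The ridge and the use of slogdet are purely for numerical stability and are additions of my own here.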

All of that being said, if it's a regression problem, what is wrong with RMS as a cost function?

willleeney commented 3 years ago

The data points could potentially be approximated by a multivariate normal, so I implemented a differentiable version of get_h_mvn. However, the data points in question are close to zero, so the determinant tends to 0, which means the log calculation returns nan.

It's not actually a regression problem; it's more of an unsupervised clustering problem, so RMS isn't appropriate.

paulbrodersen commented 3 years ago

Entropy is a measure of dispersion, so if there is next to no spread in the data, you should expect the estimate to degenerate: for the differential entropy of a multivariate normal, the determinant of the covariance goes to zero and its log diverges, which is exactly the nan you are seeing.

How do you have floating point targets if it is a clustering problem? Also, how is the algorithm unsupervised if you are using backprop to train it? If you want any more help, you will have to explain the problem in more detail.

willleeney commented 3 years ago

My starting point was the paper 'Unsupervised Deep Embedding for Clustering Analysis', but it relies on initial high-confidence targets obtained with k-means as a starting point. My problem is that the cluster centres that my model learns do not disperse enough, hence the original paper's reliance on the high-confidence targets. My idea was to minimise the inverse of the entropy across the centroids, alongside a balance of clustering methods, to encourage the centroids to disperse.

Is this enough of an explanation, or do you need more information? Thank you for the advice.

paulbrodersen commented 3 years ago

Yeah, that paper and your idea sketch help a lot. It's not a bad idea for a fix, but I don't understand why the problem occurs in the first place. Specifically, I am a bit confused as to why there is no dispersion in the initialised cluster centers. It suggests that the representations in the last layer of the autoencoder are not very distinct. Have you checked that the autoencoder works well? If it does, maybe you need to reduce the number of units in the last layer to force each unit to have a wider range of activities. Enforcing sparsity in the output of the last layer should also help.
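
To make the sparsity suggestion concrete, here is a rough, entirely hypothetical sketch (the layer sizes, names, and weight are placeholders, not taken from your model):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU())   # stand-in encoder
decoder = nn.Linear(16, 64)                              # stand-in decoder
batch = torch.randn(32, 64)                              # stand-in data batch

codes = encoder(batch)                                    # last-layer activations
reconstruction_loss = nn.functional.mse_loss(decoder(codes), batch)
sparsity_penalty = codes.abs().mean()                     # L1 penalty pushes activations towards zero
loss = reconstruction_loss + 0.01 * sparsity_penalty      # the weight 0.01 is arbitrary
```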

Personally, I would start troubleshooting there. If you think that you have exhausted everything on that front, then maximising the entropy seems like a reasonable approach. However, I am unsure that the approaches we have discussed so far are appropriate, as they all assume a large number of samples. If I understand correctly, you are only interested in the dispersion of the cluster centers, which presumably will be few (how many clusters do you have?). I would use a simpler measure of dispersion: maybe something like the square root of the sum of distances between cluster centers.
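
A minimal sketch of what I mean, in PyTorch (the function name and the weighting are just placeholders):

```python
import torch

def dispersion(centroids):
    """Square root of the sum of pairwise Euclidean distances between cluster centers.

    centroids : tensor of shape (n_clusters, n_dims).
    """
    return torch.sqrt(torch.pdist(centroids).sum())   # pdist returns all i < j distances

# Example: subtract the (weighted) dispersion from the clustering loss to
# encourage the centers to spread out, e.g. loss = cluster_loss - 0.1 * dispersion(centroids).
```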

willleeney commented 3 years ago

Thank you for the troubleshooting ideas; I had been thinking along those lines myself and am in the process of implementing a beta-VAE to get better representations. The sparsity suggestion is something I had not thought of, though.

Yes, you understand perfectly: I am only interested in the dispersion of the cluster centres, as you say. I think that using a simpler measure to assess dispersion is the correct solution to the problem here. I was going to use cosine similarity, but maybe Euclidean distance would be better suited. I will try these out, but I am sure this is the solution to the original issue.

Thank you so much for your help; it has been very useful to discuss this with you!

paulbrodersen commented 3 years ago

Anytime, and good luck. Let me know if you get it to work!