Open mahlzahn opened 11 months ago
Add argument `normalized`
Could you expand a bit on the motivation, or provide some references and/or applications?
Fix floating point issue. The current implementation fails to properly calculate the entropy of highly correlated variables because of limited floating-point resolution.
Much appreciated.
Speed-up of MVN entropy estimate for 1D variables by using the variance instead of the covariance matrix calculation.
Did you time it? Since both implementations ultimately rely on LAPACK/OpenBLAS, I would be shocked if the difference was substantial (> 1.5x).
~2 times in my tests. As I am running entropy for thousands of variables or pairs, I’d say it matters (a bit) ;)
Alright, I hate the increase in code complexity but we don't leave factors of two on the table.
When you have time, could you expand a bit on the motivation for the normalization, or provide some references and/or applications? I don't want to support something even I don't understand. ;-)
1. Add argument `normalized`

Add an argument `normalized` to the `get_h_mvn` function, which then returns the entropy of the normalized MVN distribution. The distribution is normalized such that each variance is 1, so the covariance matrix becomes equal to the matrix of Pearson correlation coefficients. Thus, the entropy becomes invariant under (some) linear transformations (scalar multiplication).

An example calculates the entropy `H` and the normalized entropy `H’` for two distributions `a` and `b`, a third `c = 5a + 10`, etc.

Thus, the normalized entropy of an MVN random variable `X` with dimension `d` and uncorrelated components is equal to

$$H'(X) = \frac{d}{2} \log(2 \pi e).$$

This is also the maximum normalized entropy for a `d`-dimensional variable. It is lower if the components are correlated, e.g., in the case of a rotated 2D MVN random variable (see table above).

2. Fix floating point issue
The current implementation fails to properly calculate the entropy of highly correlated variables because of limited floating-point resolution. I fixed this by returning `-inf` if the determinant of the matrix of Pearson correlation coefficients equals 0 and `nan` if the determinant is close to 0 (|det(…)| < 10⁻¹³). The last three columns of the above table demonstrate the new behaviour. The entropy of `[a a+b/1e5]` is `-7.99`, of `[a a+b/1e9]` is `nan`, and of `[a a]` is `-inf`, indicating that the second one cannot be calculated.

3. Speed-up of MVN entropy estimate for 1D variables

… by using the variance instead of the covariance matrix calculation.
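For readers who want to see how the three changes fit together, here is a minimal sketch. It is not the code of this PR or the library's `get_h_mvn`: the function name `h_mvn_sketch` and the parameter `tol` are hypothetical, and the sketch assumes natural logarithms and the sample covariance with `ddof=1`.

```python
import numpy as np


def h_mvn_sketch(x, normalized=False, tol=1e-13):
    """Entropy (in nats) of a multivariate normal fitted to the samples x.

    Hypothetical illustration only. x has shape (n_samples, d) or (n_samples,).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = x.shape[1]
    const = 0.5 * d * np.log(2 * np.pi * np.e)

    if d == 1:
        # 1D shortcut: use the variance instead of building a covariance matrix.
        # After normalization the variance is 1, so only the constant remains.
        var = 1.0 if normalized else np.var(x, ddof=1)
        return const + 0.5 * np.log(var)

    if normalized:
        # Normalizing each component to unit variance turns the covariance
        # matrix into the matrix of Pearson correlation coefficients.
        det = np.linalg.det(np.corrcoef(x, rowvar=False))
        if det == 0:
            return -np.inf  # perfectly correlated components
        if abs(det) < tol:
            return np.nan  # determinant below float resolution
    else:
        det = np.linalg.det(np.cov(x, rowvar=False))
    return const + 0.5 * np.log(det)


rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = 5 * a + 10  # affine transformation of a

print(h_mvn_sketch(a), h_mvn_sketch(c))              # differ
print(h_mvn_sketch(a, True), h_mvn_sketch(c, True))  # agree
print(h_mvn_sketch(np.column_stack([a, a + b / 1e5]), True))
```

The sketch keeps an explicit determinant so that the two thresholds described above map directly onto the code. A log-determinant routine such as `numpy.linalg.slogdet` would be an alternative design, but it is not what the description above proposes.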