pafoster / pyitlib

A library of information-theoretic methods for data analysis and machine learning, implemented in Python and NumPy.
MIT License
90 stars 17 forks

Entropy calculation for multivariate data? #3

Closed thgngu closed 5 years ago

thgngu commented 5 years ago

I was having trouble computing (conditional) mutual information, I(X;Y) and I(X;Y|Z), since I don't have much of a statistics/information theory background. Then I found pyitlib. It beats all other methods I have tried in both speed and accuracy.

But now I have a problem where each sample of X and Y is a scalar, while each sample of Z is a vector. I have tried reading the documentation, but pyitlib doesn't seem to support that, does it? I think it all starts with the entropy_joint() calculation.

Would you please point me to some reference on how to solve this problem?

Thanks

pafoster commented 5 years ago

Thank you for your feedback!

To clarify, are you looking to obtain a scalar output (meaning that the observations in Z are vector-valued)? In that case, one possible approach would be to map each vector to an integer. Here are a few lines of numpy which accomplish that, assuming that samples are indexed by the 0th axis (admittedly the approach is a hack, in that it works by converting to strings):

import numpy as np
Z = np.array(((1, 2), (1, 1), (1, 1), (2, 2)))  # 4 samples, each with 2 components
# Convert each row to its string representation
Z = np.apply_along_axis(np.array_str, axis=1, arr=Z)
# Rank each string against the sorted strings to obtain integer labels
Z = np.searchsorted(np.sort(Z), Z)
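As a possible alternative to the string-based hack above (this is a sketch of my own, not part of the original reply; it assumes NumPy >= 1.13, where np.unique gained the axis argument), np.unique with return_inverse assigns each distinct row a consecutive integer label directly:

```python
import numpy as np

Z = np.array(((1, 2), (1, 1), (1, 1), (2, 2)))  # 4 samples, each with 2 components
# Find the distinct rows; return_inverse gives, for each original row,
# the index of its matching distinct row -- identical rows get identical labels
_, labels = np.unique(Z, axis=0, return_inverse=True)
print(labels.ravel().tolist())  # [1, 0, 0, 2]
```

The resulting 1-D integer array can then be passed to pyitlib in place of the vector-valued Z.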

As an aside, you would need to decide whether the chosen estimation method is adequate for your purposes: when Z has many components, I would expect the zero-frequency problem to become substantial.
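To illustrate the zero-frequency concern (a sketch of my own, with arbitrary sample sizes chosen for illustration): as the number of components grows, the number of possible vector values quickly exceeds the number of samples, so most configurations are never observed and their empirical frequencies are zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_components = 100, 10
# 100 samples of a vector with 10 binary components
Z = rng.integers(0, 2, size=(n_samples, n_components))
n_observed = len(np.unique(Z, axis=0))  # distinct vectors actually seen
n_possible = 2 ** n_components          # 1024 possible configurations
print(n_observed, "of", n_possible, "configurations observed")
```

At most 100 of the 1024 possible configurations can appear, so plug-in frequency estimates assign zero probability to the vast majority of the outcome space.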


pafoster commented 5 years ago

Following my earlier response, I am now closing this issue. Feel free to open a new issue for any further queries.