Add base argument to entropy, MI and PMI estimators and simplify PMI calculation

mahlzahn commented 12 months ago

Modifications

add base argument to estimators to allow for entropy, MI and PMI measures to be in e.g., bits instead of nats
simplify PMI MVN calculation by re-using estimates from entropy function
tiny adjustments of documentation, e.g., I(X,Y) -> I(X;Y)
add .gitignore

Comments

The base argument is not strictly necessarily needed as get_mi_mvn(x, y) / np.log(2) and get_mi_mvn(x, y, base=2) yield the same output in the end.
I did not add the base argument to the Imin and PID estimators, yet. Is this mathematically correct to do so for these, too?

paulbrodersen commented 12 months ago

Hi, thanks for the PR! Just a heads up that I am completely swamped at the moment due to an ambitious deadline in 10 days time, and that I might need a week (or two) to review, give feedback, and merge. But I do appreciate the PR and I will review it as soon as possible.

paulbrodersen commented 11 months ago

Hi, thanks for being patient with me. I was in the process of submitting a paper and the journal editor requested a number of changes after we had addressed all reviewer requests. This is rather unusual and hence unexpected, which is why it took me so long to free my desk again.

Now to your PR:

add base argument to estimators to allow for entropy, MI and PMI measures to be in e.g., bits instead of nats

I think you are absolutely correct that the library should support different bases. I do have some opinions though on the implementation, and would like to request a few changes before I merge the PR.

I would rename base to logarithmic_base, simply because it is more specific and can't be understood to be anything else than the base of the logarithm.
I strongly prefer explicit arguments (at least nowadays), so I would prefer the logarithmic_base parameter to initialized to np.e instead of None, i.e:

def func(arg, logarithmic_base=np.e):
    ...

This would also simplify the conversion function to:

def convert_to_log_with_base(value, base):
    if base != np.e:
        return value / np.log(base)
    else:
        return value

I did not add the base argument to the Imin and PID estimators, yet. Is this mathematically correct to do so for these, too?

It should be as they are also in units of nats.

simplify PMI MVN calculation by re-using estimates from entropy function

All those changes look good to me.

tiny adjustments of documentation, e.g., I(X,Y) -> I(X;Y)

The corrections are much appreciated.

add .gitignore

Sure, why not.

Let me know what you think about these suggestions, and if you want to make the changes.

mahlzahn commented 11 months ago

Don’t worry. I know very well what you’re talking about ;)

I followed the implementation of scipy.stats.entropy. And I don’t know if there are any drawbacks of giving np.e directly as default argument. But if you prefer func(..., logarithmic_base=np.e) to func(..., base=None), I will adjust the PR for that.

paulbrodersen commented 11 months ago

I followed the implementation of scipy.stats.entropy.

That is a strong argument for leaving it as it is. I will merge now then. Thank you for the PR.

paulbrodersen / entropy_estimators

Add base argument to entropy, MI and PMI estimators and simplify PMI calculation #19

Modifications

Comments